  1. Apr 2024
    1. Author Response

      The following is the authors’ response to the current reviews.

      Response to Reviewer 1:

      • We agree with the reviewer’s overall assessment of this manuscript.

      • Because multiple secreted proteins differ between the control and experimental groups, some of them could be causal and others correlative in enhancing compensatory glucose production in response to elevated glycosuria. In future studies we will determine the causal factors that trigger the increase in glucose production.

      • Yes, we will correct the typographical errors in a revised version of this manuscript.

      Response to Reviewer 2:

      • We agree with the reviewer's comment about potential sex differences that we may have missed in this study. We will therefore include this limitation in the Discussion section of the revised manuscript.

      • The reviewer’s statement ‘The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout’ is incorrect. In the cited publication, the Methods explicitly state that ‘All of the experiments, except those using a diet-induced obesity mouse model or noted otherwise, were completed within 14 days of inducing the Glut2 deficiency.’ Please see Figures 5h-l and 6 in that previous publication, which demonstrate that not all experiments were completed within 14 days of inducing renal Glut2 deficiency. Following the reviewer’s advice, we will include the timeline of the experiments (in some cases extending 4 months beyond the induction of glycosuria) in all figure legends of the present manuscript. In addition, in a separate, unpublished project we have measured glycosuria up to 1 year after inducing renal Glut2 deficiency. Therefore, the glycosuria observed in renal Glut2 KO mice is not temporary.

      • In our previous response, we had already stated which control group was used in this study (please see our response to Reviewer 2’s point 3). As mentioned there, we used Glut2-loxp/loxp mice as the control group. This is also stated multiple times in the figure legends of our previous paper, which reported the phenotype of renal Glut2 KO mice and is cited in this manuscript; we therefore did not repeat the information. Following the reviewer’s advice, we will nevertheless include it in the revised manuscript.

      • We refer the reviewer to Figure 1, which shows an increase in glucose production in renal Glut2 KO mice, and to Figure 3, which demonstrates that afferent renal denervation reduces blood glucose levels by 50%. Ablation of the afferent renal nerves does reduce blood glucose levels in renal Glut2 KO mice. Therefore, the word ‘promote’ in the title accurately reflects the role of the afferent renal nerves in contributing to about a 50% increase in blood glucose levels in renal Glut2 KO mice. Regarding the reviewer's comment on changes in Crh gene expression, please see Figure 3: ablation of the renal afferent nerves decreases hypothalamic Crh gene expression and other mediators of the HPA axis by 50%. Therefore, the afferent renal nerves contribute to regulating blood glucose levels, at least in part, through the HPA axis, which is widely known to change blood glucose levels. Words such as ‘required’ or ‘necessary’ in the title would have implied a causal role and could have been misleading; we therefore purposely used ‘promote’ to accurately reflect the findings of this study.

      • Because we observed an increase in hepatic glucose production in renal Glut2 KO mice (Fig. 1), which was reduced by 50% after selective afferent renal denervation (Fig. 3), the graphical abstract suggests a kidney-brain-liver neural connection or an endocrine factor(s) to account for these changes in blood glucose levels, as also described in the Discussion section. We can add a question mark (‘?’) to the graphical abstract to show that further studies are needed to validate these proposed mechanisms; however, we cannot simply remove the arrow as the reviewer advised.

      • Per the reviewer’s advice, in the methods we will include the dilutions used for each assay.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It would be helpful to the reader to specify in Figure 1a-c whether data were directly measured or calculated.

      We have now clarified this in the Methods section of the revised manuscript. Glucose production was directly measured, and the fractional contribution of each tissue was then calculated from these data. We have also cited a research paper that describes the method in more detail.

      The methods section would be strengthened by clarifying the order in which experiments were performed, the age of the mice at each time point, and whether different cohorts were used for different techniques.

      We have added further details to the Methods section with appropriate citations. For in-depth protocols, we have cited our previous publications.

      It would be helpful to explain or provide a reference for how the post-mortem background activity measurement was performed.

      We have included this explanation in the revised manuscript.

      Similarly, details regarding the collection of blood for ACTH and corticosterone measurement are needed for the reader to evaluate whether the results are confounded by stress at the time of collection.

      We have added these details to the Methods section.

      I recommend stating, if accurate, that you used mixed-sex groups because your previous study found no sex differences in the phenotype of renal Glut2 KO mice.

      Yes, we have included these details in the revised manuscript.

      The sentence on line 239 is difficult to follow. Also, line 287 contains a contraction.

      We have revised the sentence per the reviewer’s advice.

      A graphical abstract would be helpful, bearing in mind conclusive vs suggestive findings.

      Yes, we have included the graphical abstract with the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments to the Authors

      (1) The Methods also need to specify more of the critical details of the ELISAs, including the dilution factors used, and whether the values reported are dilution-corrected. Also, there is no description of how insulin was measured.

      We have included these details in the Methods section. The assay dilutions followed the manufacturers’ instructions.

      (2) The Methods do not sufficiently describe how Crh mRNA was quantified in the hypothalamus. Presumably, they examined only the paraventricular nucleus? How many sections were used for in situ hybridization? How were the brains processed? What thickness of section was used? When were the brains collected?

      We have included these details in the Methods section and cited our previous publications for in-depth protocols. Some of the information is also available in the figure legends.

      (3) The number of mice that were used for plasma proteomics is not indicated.

      The number of mice is indicated using individual symbols or points presented on the bar graphs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study addresses the long-term effect of warming and altered precipitation on microbial growth, as a proxy for understanding the impact of global warming. While the methods are compelling and the evidence supporting the claims is solid, additional analysis of the data would strengthen the study, which should be of broad interest to microbial ecologists and microbiologists.

      We sincerely appreciate your assessment and thoughtful comments, which are valuable and very helpful for improving our manuscript. We have carefully considered all comments and made extensive, thorough corrections and additional analyses of the data, which we hope will meet with your approval.

      Reviewer #1 (Public Review):

      Changes in warming and precipitation regimes significantly influence both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception, and although research shows that they can adapt to warming, their population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript describes an integrated, trait-based understanding of these dynamics using qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecology community and may help advance our understanding of how microbial systems will respond to global change. Very few studies have used qSIP to investigate the growth rates of bacterial populations in multifactorial manipulation experiments on the Qinghai-Tibet Plateau, and the large quantity of data in this study will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

      We appreciate the encouragement and positive comments.

      Specific comments:

      (1) Please name some of the microbial groups with the highest growth rates.

      We have added the sentence “The members in Solirubrobacter and Pseudonocardia genera had high growth rates under changed climate regimes” to the Abstract (Line 57-58).

      (2) L47-48, consider changing "microbial growth and death" to "microbial eco-physiological processes (e.g., growth and death)", and changing "such eco-physiological traits" to "such processes".

      Done (Line 47 and 48).

      (3) L50-51, the authors estimated bacterial growth in alpine meadow soils of the Tibetan Plateau after warming and altered precipitation manipulation in situ. Actually, the soil samples were collected and incubated in the laboratory rather than in the field, unlike the previous experiment conducted by Purcell et al. (2021, Global Change Biology). "In situ" would lead me to believe that the qSIP incubation was conducted in the field, so I think the term in situ is inappropriate here. [https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15911]

      Agreed. We have deleted “in situ”.

      (4) L52, what does "interactive global change factors" mean?

      We have revised this sentence to “the growth of major taxa was suppressed by the single and combined effects of temperature and precipitation” (Line 52-53).

      (5) L61, in my opinion, "Microbial diversity" belongs to the category of species composition, rather than ecosystem functional services. Please revise it.

      Agreed. We have deleted it.

      (6) L69, consider changing "further" to "thus".

      Done (Line 70).

      (7) L82, delete "The evidence is overwhelming that".

      Done.

      (8) L85-90, these two sentences have similar meanings, please express them concisely.

      We have deleted the sentence “Altered precipitation, particularly drought or heavy precipitation events, also tends to negatively influence soil processes and biodiversity”.

      (9) L91, the effect of drought on soil microorganisms is lacking here.

      We have added the sentence “Reduced precipitation affects soil processes notably by directly stressing soil organisms, and also altering the supply of substrates to microbes via dissolution, diffusion, and transport” to the Introduction (Line 87-89).

      (10) L102, "Growth" should be highlighted here, as changes in relative abundance can also be classified as population dynamics. The use of the term "population dynamics" will eliminate the highlight of this study in calculating the growth rate of microbial species in in-situ soil based on qSIP. Consider changing "population dynamics" to "population-growth responses" or something like that.

      Done (Line 98).

      (11) L105, please note that this citation focuses on plant physiological characteristics.

      We have revised the reference (Line 102).

      (12) L115, "soil temperature, water availability" should be considered as a direct impact of climate change, rather than an indirect impact on microorganisms.

      We have deleted them.

      (13) L134-135, please clarify the interaction types between which climate factors.

      We have deleted this sentence.

      (14) L135-138, suggest modifying or deleting this sentence. The results in this study are already eco-physiological data and do not need to be further "understood and predicted".

      We have deleted this sentence.

      (15) L150, "The experimental design has been described in previously". I think this refers to another study and not the actual incubations in this study. Also in L198, suggest a change to "Incubation conditions were similar to those previously described". So, it's clear it's not the same experiment.

      We have revised these sentences to “has been described previously in (Ma et al., 2017)” (Line 136) and “according to a previous publication” (Line 194).

      Reference:

      Ma, Z., Liu, H., Mi, Z., Zhang, Z., Wang, Y., Xu, W. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      (16) L188, change "pre-wet soil samples" to "pre-wet samples" and change "soil samples for 48h incubation" to "incubation samples". What does "pre-wet" mean? Does it represent soil pre-cultivation?

      Done. The pre-wet samples, i.e., the soil samples before incubation (T = 0 d), were used to estimate the initial microbial composition. "pre-wet" does not mean soil pre-cultivation. We have added the description “A portion of the air-dried soil samples was taken as the pre-wet treatment (i.e., before incubation without H2O addition)” in MATERIALS AND METHODS (Line 174-175).

      (17) Unify the time unit of incubation (hour or day). Consider changing "48 h" to "2 d" in Materials and Methods.

      Done.

      (18) L247, what version of RDP Classifier was used?

      We used the RDP v16 database for taxonomic annotation and have added this information to the revision (Line 246).

      (19) L270, "average molecular weights".

      Done (Line 268).

      (20) L272-275, based on the preceding description, it appears that the culture period was limited to 48 hours. Please confirm it.

      We apologize for this mistake. We have revised it (Line 273).

      (21) L297, switch the order of the first two sentences of this paragraph.

      Done (Line 297).

      (22) L331, change "smaller-than-additive" to "smaller than their expected additive effect".

      Done (Line 331).

      (23) L374 and 381, I struggle to see why "larger combined effects" than single-factor effects would represent a higher degree of antagonism; I think it should be "smaller combined effects".

      Agreed. We have revised it according to this suggestion (Line 369 and 374).
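      The distinction drawn in points (22) and (23), that an antagonistic interaction is one whose combined effect is smaller than the expected additive effect, can be sketched as below. This is a hypothetical helper, not the authors' analysis: effects are assumed to be growth-rate differences from the ambient control, both single-factor effects are assumed to have the same sign, and the fixed `tolerance` is an illustrative stand-in for the statistical test (for example, a confidence interval around the deviation) that a real analysis would use.

```python
def classify_interaction(effect_t: float, effect_p: float, effect_tp: float,
                         tolerance: float = 0.0) -> str:
    """Classify a two-factor interaction against the additive null model.

    effect_t, effect_p: single-factor effects (treatment minus control).
    effect_tp: observed effect of the combined treatment.
    tolerance: hypothetical threshold below which the deviation from
        additivity is treated as zero.
    Assumes effect_t and effect_p have the same sign.
    """
    expected = effect_t + effect_p             # additive expectation
    deviation = abs(effect_tp) - abs(expected)
    if abs(deviation) <= tolerance:
        return "additive"
    # Smaller in magnitude than expected -> antagonism; larger -> synergy.
    return "antagonistic" if deviation < 0 else "synergistic"


# Warming and drought each lower growth; compare their joint effect.
print(classify_interaction(-0.10, -0.08, -0.12))                  # antagonistic
print(classify_interaction(-0.10, -0.08, -0.25))                  # synergistic
print(classify_interaction(-0.10, -0.08, -0.18, tolerance=0.01))  # additive
```

      Comparing magnitudes rather than signed values keeps the rule readable when, as here, all manipulations suppress growth; effects of opposite sign would need a separate convention.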

      (24) L375, remove "than that of drought and warming".

      Done.

      (25) L405, simplify the expression, change "between different warming and rainfall regimes" to "between climate regimes"

      We have deleted this sentence.

      (26) L406-408, species are already on the phylogenetic tree and they can not "clustered at the phylogenetic branches", but the functional traits of microbes can. Please revise it.

      We have revised this sentence to “Overall, the most incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic clustering (i.e., species clustered at the phylogenetic branches; NTI > 0, P < 0.05)” (Line 402-404).
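      The nearest taxon index (NTI) used above can be illustrated with the minimal sketch below. It assumes a pairwise phylogenetic distance matrix and a simple null model that draws the same number of taxa at random from the whole pool; it is not the authors' pipeline, and dedicated phylogenetic community-ecology packages offer several alternative null models. Positive values indicate that the taxa are more closely related than expected by chance (clustering), matching the NTI > 0 criterion quoted above.

```python
import numpy as np


def mntd(dist: np.ndarray, members) -> float:
    """Mean nearest-taxon distance among the taxa indexed by `members`."""
    sub = dist[np.ix_(members, members)].astype(float)
    np.fill_diagonal(sub, np.inf)          # a taxon is not its own neighbor
    return float(sub.min(axis=1).mean())


def nti(dist: np.ndarray, members, n_null: int = 999, seed: int = 0) -> float:
    """Nearest taxon index: -(observed MNTD - null mean) / null SD.

    Null model (an assumption): draw len(members) taxa at random,
    without replacement, from all taxa in `dist`.
    """
    rng = np.random.default_rng(seed)
    observed = mntd(dist, members)
    null = np.array([
        mntd(dist, rng.choice(dist.shape[0], size=len(members), replace=False))
        for _ in range(n_null)
    ])
    return float(-(observed - null.mean()) / null.std(ddof=1))


# Toy phylogeny: two clades of five taxa; distance 0.1 within, 1.0 between.
clade = np.repeat([0, 1], 5)
dist = np.where(clade[:, None] == clade[None, :], 0.1, 1.0)
np.fill_diagonal(dist, 0.0)

print(nti(dist, [0, 1, 2]) > 0)   # True: taxa from one clade are clustered
print(nti(dist, [0, 5]) < 0)      # True: one taxon per clade is overdispersed
```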

      (27) L409, the same as above, and consider removing "The incorporators subjected to".

      We have revised this sentence to “The incorporators whose growth subjected to the additive interaction of warming × drought also showed significant phylogenetic clustering (P < 0.05)” (Line 404-406).

      (28) L412, consider changing "incorporators subjected to the synergistic interaction" to "the synergistic growth responses under multifactorial changes".

      We have revised the sentence to “incorporators whose growth is influenced by the synergistic interaction showed phylogenetically random distribution under both climate scenarios (P > 0.05)” (Line 407-409).

      (29) L505-506, please add a reference for this sentence.

      Done (Line 488).

      (30) L511-514, It should be noted that the production of MBC does not necessarily imply a net change in the C pool size. The accelerated growth rates may result in expedited turnover of MBC, rather than an increase in carbon sequestration.

      Thanks. We have deleted this sentence.

      (31) Language precision. In the Discussion section, additional caveats should be introduced for some of the claims the authors are making. For instance, at L518, the authors should clarify that "in this study, the bacterial growth in alpine grassland may be influenced by antagonistic interactions between multiple climatic factors after a decadal-long experiment", because other studies may show different results owing to their focus on different ecosystem functions and environmental conditions. Softening of the language is therefore recommended (lines are noted below); this will not change the outcomes of this study but will support a more precise interpretation.

      We have revised the sentence to “In this study, a decade-long experiment revealed that bacterial growth in alpine meadows is primarily influenced by the antagonistic interaction between T × P” (Line 497-499).

      (32) PICRUSt analysis is a good way to connect species and their functions, especially PICRUSt2, which updated the reference database and optimized the algorithm to improve prediction accuracy (Douglas et al., 2020, Nature Biotechnology). However, the link between microbial taxonomy and microbial metabolism is still not straightforward, especially in diverse microbial communities such as soils. The authors should introduce caveats in the Discussion showing that they know the limitations of their methods. For context, as a reader who studies metabolism in soils, I was somewhat disappointed when the PICRUSt data were introduced without proper caveats. It might also be helpful to introduce these caveats briefly in the last paragraph of the Results. They are necessary to avoid overstating the authors' findings and to make sure the reader knows that the authors understand the very clear limitations of these methods. [https://www.nature.com/articles/s41587-020-0548-6]

      Thanks. We have introduced caveats in DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542).

      Reference:

      Douglas, G., Maffei, V., Zaneveld, J., Yurgel, S., Brown, J., Taylor, C. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology, 38, 685-688.

      (33) Although the author has explained the potential causes for the negative effects of different climate change factors (i.e., warming, drought, and wet) on microbial growth, there seems to be a lack of a summary assertion and an extension on how climate change affects microbial growth and related ecosystem functions. It is recommended to make a general summary of the results in the last part of Discussion.

      We have added a general summary in the last paragraph of DISCUSSION, that is “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction. This suggests the development of multifactor manipulation experiments in precise prediction of future ecosystem services and feedbacks under climate change scenarios” (Line 552-558).

      (34) L546, please add the taxonomic information for "OTU 14".

      Done (Line 533).

      (35) L800, change "The phylogenetic tree" to "A phylogenetic tree".

      Done (Line 762).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to describe the effect of different temperature and precipitation regimes on microbial growth responses in an alpine grassland ecosystem using quantitative 18O stable isotope probing. It was found that all climate manipulations had negative effects on microbial growth, and that single-factor manipulations exerted larger negative effects as compared to combined-factor manipulations. The degree of antagonism between factors was analyzed in detail, as well as the differential effect of these divergent antagonistic responses on microbial taxa that incorporated the isotope. Finally, a hypothetical functional profiling was performed based on taxonomic affiliations. This work gives additional evidence that altered warming and precipitation regimes negatively impact microbial growth.

      Strengths:

      A long-term experiment with a thorough experimental design under apparently field conditions is a plus for this work, making the results potentially generalisable to the alpine grassland ecosystem. Also, the implementation of a qSIP approach to determine microbial growth ensures that only active members of the community are assessed. Finally, particular attention was given to the interaction between factors, and a robust approach was implemented to quantify the weight of the combined-factor manipulations on microbial growth.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      The methodology does not mention whether the samples taken for the incubations were rhizosphere soil, bulk soil or a mixture of both. If the samples were rhizosphere soil, I wonder how the plants were affected by the infrared heaters, and whether the resulting shade (also present in the controls with dummy heaters) affected the plants and root exudates in the plots compared with plants outside the blocks. If the samples were bulk soil, are the results generalisable to a grassland ecosystem? In my opinion, more information is needed on the origin of the soil samples and how they were taken.

      The samples taken for the incubations can be considered as a mixture of rhizosphere and bulk soils. During soil sampling, we did not use conventional rhizosphere soil collection methods. However, there is a certain proportion of fragmented roots in the soil samples we collected, indicating that soil properties are influenced by plants. We have added this description in MATERIALS AND METHODS (Line 158).

      To minimize the impact of physical shading on the plants, each sampling point was as far away from infrared heaters as possible. We have added this information of soil collection in MATERIALS AND METHODS, that is “In each plot, three soil cores of the topsoil (0-5 cm in depth) were randomly collected and combined as a composite sample, which can be considered as a mixture of rhizosphere and bulk soils. Each sampling point was as far away from infrared heaters as possible to minimize the impact of physical shading on the plants. The fresh soil samples were shipped to the laboratory and sieved (2-mm) to remove root fragments and stones.” (Line 157-162).

      Previous studies based on our field experiment assessed the effects of warming and altered precipitation on soil microbial communities (Zhang et al., 2016), the temporal stability of plant community biomass (Ma et al., 2017), and shifts in plant species composition and grassland primary production (Liu et al., 2018). These studies guided the experimental design and execution.

      Reference:

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ma, Z.Y., Liu, H.Y., Mi, Z.R. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      Liu, H.Y., Mi, Z.R., Lin, L. et al. (2018). Shifting plant species composition in response to climate change stabilizes grassland primary production. Proceedings of the National Academy of Sciences, 115, 4051-4056.

      The qSIP calculations reported in the methodology for this work are rather superficial, and the reader must be experienced in this technique to understand how the incorporators were identified and their growth quantified. For instance, the GC content of taxa was calculated for reads clustered into OTUs, and the text does not discuss the validity of such an approach at the genus level.

      We have added the description of qSIP calculations in Supplementary Materials.

      This GC content calculation can be applied at the genus level (Koch et al., 2018). The GC content of each bacterial taxon (G_i) was calculated from its mean density in the unlabeled treatments (W_LIGHT,i) (Hungate et al., 2015), rather than from OTU sequence information. We have revised the sentence in MATERIALS AND METHODS, that is “the number of 16S rRNA gene copies per OTU taxon (e.g., genus or OTU) in each density fraction was calculated by multiplying the relative abundance (acquisition by sequencing) by the total number of 16S rRNA gene copies (acquisition by qPCR)” (Line 255-258).
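      The per-fraction copy numbers and the GC-content step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the constants (1.646057, 0.083506, 0.496 and 307.691) are those of the qSIP equations in Hungate et al. (2015) as we recall them and should be checked against that paper before reuse, and the example densities and abundances are made up.

```python
import numpy as np


def taxon_copies(rel_abundance, total_copies) -> np.ndarray:
    """16S copies of one taxon per fraction: relative abundance (sequencing)
    multiplied by total 16S copies in that fraction (qPCR)."""
    return np.asarray(rel_abundance, float) * np.asarray(total_copies, float)


def weighted_density(densities, copies) -> float:
    """Copy-number-weighted mean buoyant density of a taxon (g/mL)."""
    copies = np.asarray(copies, float)
    return float(np.sum(np.asarray(densities, float) * copies) / copies.sum())


def gc_fraction(w_light: float) -> float:
    """GC content (as a fraction) from the taxon's unlabeled density."""
    return (w_light - 1.646057) / 0.083506


def mol_weight_light(gc: float) -> float:
    """Average molecular weight of unlabeled DNA (g/mol per nucleotide)."""
    return 0.496 * gc + 307.691


# Hypothetical unlabeled (16O) gradient with five fractions.
densities = [1.690, 1.700, 1.710, 1.720, 1.730]
rel_abund = [0.01, 0.02, 0.03, 0.02, 0.01]
total_16s = [1e6, 2e6, 3e6, 2e6, 1e6]

w_light = weighted_density(densities, taxon_copies(rel_abund, total_16s))
gc = gc_fraction(w_light)
print(round(w_light, 3))   # 1.71 (copy profile is symmetric about 1.710)
print(round(gc, 3), round(mol_weight_light(gc), 2))
```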

      Reference:

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The selection of the V4-V5 region over the V3-V4 region for quantifying 16S rRNA gene copies should be justified in the text. Classic studies determined a decade ago that primer pairs amplifying V3-V4 are most suitable for assessing soil bacterial communities. Hungate et al. (2015) worked with the V3-V4 region when establishing the qSIP method. The number of unassigned OTUs may be related to the selection of this region.

      Both primer sets (V3-V4 and V4-V5 regions) are widely used across various sample types and are highly similar in how they represent total microbial community composition (Fadeev et al., 2021; Zhang et al., 2018).

      A previous study at our Field Research Station of Alpine Grassland Ecosystem used V4-V5 primer pairs to investigate the effect of warming and altered precipitation on overall bacterial community composition (Zhang et al., 2016).

      Another reason for choosing the V4-V5 primer set in this study was to integrate and compare our data with those of two previous qSIP studies (Ruan et al., 2023; Guo et al., submitted), both of which focused on the growth responses of active species to global change and used V4-V5 primer pairs.

      We have added an explanation about primer selection as “The V4-V5 primer pairs were chosen to facilitate integration and comparison with data from previous studies (Ruan et al., 2023; Zhang et al., 2016)” (Line 213-215).

      Reference:

      Fadeev, E., Cardozo-Mino, M.G., Rapp, J.Z. et al. (2021). Comparison of Two 16S rRNA Primers (V3–V4 and V4–V5) for Studies of Arctic Microbial Communities. Frontiers in Microbiology, 12.

      Zhang, J.Y., Ding, X., Guan, R. et al. (2018). Evaluation of different 16S rRNA gene V regions for exploring bacterial diversity in a eutrophic freshwater lake. Science of The Total Environment, 618, 1254-1267.

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ruan, Y., Kuzyakov, Y., Liu, X. et al. (2023). Elevated temperature and CO2 strongly affect the growth strategies of soil bacteria. Nature Communications, 14, 1-12.

      Guo, J.J., Kuzyakov, Y., Li, L. et al. (2023). Bacterial growth acclimation to long-term nitrogen input in soil. The ISME Journal, Submitted.

      The report of sequence preprocessing and processing does not comply with state-of-the-art standards. More information on how the sequences were handled is needed, given that a significant part of the manuscript relies on their taxonomic classification. Also, the OTU approach used for an almost species-dependent analysis (GC contents) should be replaced or complemented with an ASV or sub-OTU approach, using denoisers such as DADA2 or deblur. Functional prediction tools underestimate gene frequencies, including those of genes with biogeochemical significance for soil carbon and nitrogen cycling.

      (1) We have complemented the information about sequence processing as “The raw sequences were quality-filtered using the USEARCH v.11.0 (Edgar, 2010). In brief, the paired-end sequences were merged and quality filtered with “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences < 370 bp and total expected errors > 0.5 were removed. Next, “fastx_uniques” command was implemented to remove redundant sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence.” (Line 238-245).
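      The length and expected-error thresholds quoted above can be illustrated with the short sketch below. A read's expected number of errors is the sum of its per-base error probabilities, 10^(-Q/10). The sketch only mirrors the stated criteria (reads shorter than 370 bp or with more than 0.5 expected errors are removed); it is not the USEARCH implementation, and the Phred+33 quality encoding is an assumption.

```python
def expected_errors(quality: str, phred_offset: int = 33) -> float:
    """Expected number of errors in a read: sum of 10**(-Q/10) over bases."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def passes_filter(seq: str, quality: str,
                  min_len: int = 370, max_ee: float = 0.5) -> bool:
    """Keep reads of at least `min_len` bp with expected errors <= `max_ee`."""
    return len(seq) >= min_len and expected_errors(quality) <= max_ee


# 'I' encodes Q40 (error probability 1e-4); '5' encodes Q20 (1e-2).
print(passes_filter("A" * 400, "I" * 400))   # True:  EE = 0.04
print(passes_filter("A" * 400, "5" * 400))   # False: EE = 4.0
print(passes_filter("A" * 360, "I" * 360))   # False: shorter than 370 bp
```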

      (2) We have complemented the zero-radius OTU (ZOTU) analysis by the unoise3 command in USEARCH (https://drive5.com/usearch/manual/pipe_otus.html), as shown in Fig. S1-S2. The results showed that overall growth responses of soil bacteria to warming and precipitation changes were similar based on OTU and ZOTU analyses, i.e., warming and altered precipitation tend to negatively affect the growth of grassland bacteria and the prevalence of antagonistic interactions of T × P. The similarity of results between the different methods is reflected at the overall community level, the phylum level, the genus level and the species (i.e., OTU or ZOTU) level (Fig. S1 and S2).

      Author response image 1.

      The growth responses of grassland bacteria to warming and altered precipitation based on ZOTU analysis. The results of growth rates at the community level (A), the phylum level (B), and the ZOTU level (C and D) were similar to those based on OTU analysis. C the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. D the proportions of species growth influenced by different interaction types of T × P. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Author response image 2.

      The growth responses of grassland bacteria at the genus level to warming and altered precipitation based on OTU analysis (A and C) and ZOTU analysis (B and D). A and B the single and combined factor effects of climate factors on growth in genera, by comparing with those in T0nP. C and D the proportions of genera whose growth was influenced by different interaction types of T × P.

      (3) Agreed. We have added a caveat about the limitations of functional prediction tools at the end of the DISCUSSION: “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542). This caveat makes the limitations of these methods clear to the reader and ensures that our findings are not overstated.

      Reference:

      Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020) PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 38, 685–688.

      Reviewer #2 (Recommendations For The Authors):

      General suggestions:

      Regarding the qSIP method, it would be helpful to see the differences in density vs. 16S rRNA gene abundance for the most responsive bacterial groups in the different treatments, given that only 7 fractions were used to resolve the entire change in bacterial growth.

      We have selected three representative bacterial taxa (OTU1 belonging to Bradyrhizobium, OTU14 belonging to Solirubrobacter, OTU15 belonging to Pseudoxanthomonas), which had high growth rates in the climate change treatments. The peaks in the 18O treatment are shifted "backwards" (greater weighted average buoyant density) compared with the 16O treatment, indicating that these taxa assimilated 18O into their DNA during growth.

      Author response image 3.

      The distribution of 16S rRNA gene abundance of three representative bacterial taxa (OTU1- Bradyrhizobium, OTU14-Solirubrobacter, and OTU15-Pseudoxanthomonas) in different buoyant density fractions. Values represent mean and the error bars represent standard deviation.

      Seven density fractions per sample were selected for sequencing because together they contained more than 99% of the 16S rRNA gene copies of each sample (please see the figure below). The DNA concentrations of the other fractions were too low to construct sequencing libraries.

      Author response image 4.

      Relative abundance of 16S rRNA gene copies in each fraction. The fractions with densities between 1.703 and 1.727 g ml⁻¹ were selected because together they contained more than 99% of the gene copies of each sample. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent means; error bars represent standard deviations.
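      The >99% selection criterion amounts to a simple proportion, which can be checked as in the sketch below. It is an illustration only: it assumes per-fraction buoyant densities and total 16S rRNA gene copies (from qPCR) are available as parallel lists, and the function name and toy numbers are hypothetical, not values from this study.

```python
from typing import Sequence


def share_in_window(densities: Sequence[float], copies: Sequence[float],
                    lo: float, hi: float) -> float:
    """Share of a sample's 16S rRNA gene copies in fractions with lo <= density <= hi."""
    if len(densities) != len(copies):
        raise ValueError("densities and copies must have the same length")
    total = sum(copies)
    inside = sum(c for d, c in zip(densities, copies) if lo <= d <= hi)
    return inside / total


# Hypothetical gradient (not data from this study): densities in g/mL,
# total 16S rRNA gene copies per fraction from qPCR.
densities = [1.695, 1.699, 1.703, 1.707, 1.711, 1.715, 1.719, 1.723, 1.727, 1.731]
copies = [5, 20, 1e4, 5e4, 1.2e5, 9e4, 4e4, 8e3, 1e3, 10]

share = share_in_window(densities, copies, 1.703, 1.727)
print(f"share of gene copies in window: {share:.4f}")
```

      In this toy gradient, seven fractions fall inside the 1.703-1.727 g ml⁻¹ window and hold well over 99% of the copies, so the three outer fractions contribute little to the weighted density.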

      With such a dataset, additional multivariate analysis would help to better interpret the ecological framework.

      Thanks for the suggestion. Interpreting the ecological framework is meaningful for understanding microbial responses to environmental changes.

      The main objective of this study is to investigate the growth responses of soil microbes in alpine grasslands to temperature and precipitation changes, and the interaction between these climate factors. Our results, as well as the complementary analyses (based on ZOTUs; see Author response images 1 and 2), indicate that warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and that antagonistic T × P interactions are prevalent.

      We have emphasized our research objectives and main conclusions in the revised manuscript: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau” (Line 112-114);

      “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction” (Line 552-556).

      The extended interaction analysis and its conclusions should be shortened to summarize the most relevant findings. In my opinion, it becomes a bit redundant.

      We have shortened the discussion of the interaction analysis by deleting less relevant content.

      Below are some, but not all, of the passages that have been deleted or revised in the Discussion:

      (1) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive”;

      (2) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)”;

      (3) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” (Line 499-503);

      (4) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” (Line 464-466).

      I strongly suggest a functional analysis based on shotgun sequencing or RNAseq approaches. With this approach this work would be able to answer who is growing under altered T and Precipitation regimes and what are those that are growing doing.

      Thanks for the suggestion. Metagenomic sequencing is a popular approach to evaluating the potential functions of environmental microbial communities. However, two main reasons limit the application of metagenomic or metatranscriptomic sequencing in this study: 1) most of the fractionated samples in the SIP experiment have low DNA concentrations and do not meet the requirements of library construction for sequencing; 2) metagenomics and metatranscriptomics usually have relatively low sensitivity to rare species, which reduces the diversity of detected active species.

      This study focused on active microbial taxa and their growth in response to multifactorial climate change. We have added this perspective to the DISCUSSION: “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      Minor suggestions:

      L121. _As

      We have deleted this sentence and relocated the hypotheses to the last paragraph of the INTRODUCTION, as suggested by Reviewer #3.

      Line150. Described previously in.

      Done (Line 136).

      Line 500. Check whether it is better to use the word acclimatization (coordinated response to several simultaneous stressors) instead of acclimation.

      We have revised it according to this suggestion (Line 481).

      Fig.4C Drought

      Done (Line 761).

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Ruan et al. studied the long-term impact of warming and altered precipitation on the composition and growth of the soil microbial community. The researchers adopted an experimental approach to assess the impact of climate change on microbial diversity and functionality. This study was carried out within a controlled environment, wherein two primary factors were assessed: temperature (at two levels) and humidity (at three levels). These factors were manipulated in a full factorial design, resulting in a total of six treatments. This experimental setup was maintained for ten years. To analyze the active microbial community, the researchers employed quantitative stable isotope probing, which tracks the incorporation of isotopically labeled (18O) water into biomolecules (particularly DNA). This allowed for the tracking of the active fraction of microbes, accomplished via isopycnic centrifugation, followed by Illumina sequencing of the denser fraction. This was followed by a series of statistical analyses to identify the impact of these two variables on the whole community and on specific taxonomic groups. The full factorial design enabled the researchers to discern both individual contributions and potential interactions among the variables.

      Strengths:

      This work presents a timely study that assesses in a controlled fashion the potential impact of global warming and altered precipitation on microbial populations. The experimental setup, experimental approach and data analysis seem to be overall solid. I consider the paper to be of high interest to the whole community, as it provides a baseline for the assessment of global warming effects on microbial diversity.

      Thanks for the encouragement and positive comments.

      Weaknesses:

      While taxonomic information is interesting, it would have been highly valuable to include transcriptomics data as well. This would allow us to understand which active pathways become enriched under warming and altered precipitation. Non-metabolic OTUs hold significance as well. The authors could have described these non-incorporators and derived hypotheses from the gathered information. The work would have benefited from using more biological replicates of each treatment.

      Thanks for the valuable suggestions.

      (1) Metatranscriptomics can assess the functional profiles of the community, but it has relatively low sensitivity to rare species, which makes it difficult to link functional pathways to the numerous active taxa identified by qSIP. Additionally, because of their low DNA concentrations, sequencing libraries could not be constructed from most fractionated samples, whereas amplicon-based sequencing was feasible. This study therefore focused on active microbial taxa and their growth in response to multifactorial climate change. We have added this perspective to the DISCUSSION: “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      (2) 18O-qSIP identifies growing microbial taxa (i.e., 18O incorporators) rather than all metabolically active taxa. The non-incorporators in our study may have been metabolically active (maintaining cellular activity without reproduction) or recently deceased (Blazewicz et al., 2013). Therefore, we cannot determine whether these non-incorporators were metabolically active.

      (3) Agreed. The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis. We have explained this in the MATERIALS AND METHODS: “Considering the cost of qSIP experiment (i.e., the use of isotopes and the sequencing of a large number of DNA samples), we randomly selected three out of the six plots, serving as three replicates for each treatment” (Line 154-157).

      Reference:

      Nuccio, E.E., Starr, E., Karaoz, U. et al. (2020) Niche differentiation is spatially and temporally regulated in the rhizosphere. ISME J 14, 999–1014.

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      The manuscript should be written in a clearer way. The language should be more direct, so the message is conveyed faster and clearer. Some sentences, for instance, could be shortened or re-organized. Below, you will find some examples.

      We have rewritten the sentences to make the manuscript clearer. Below are some, but not all, examples that have been revised:

      (1) Deleted “(reduced precipitation, hereafter ‘drought’, or enhanced precipitation, hereafter ‘wet’)” in INTRODUCTION;

      (2) Deleted “Controlled experiments simulating climate change have investigated changes in microbial community composition as measured by shifts in the relative abundances (Evans & Wallenstein, 2014; Barnard et al., 2015). However, changes in relative abundances may be poor indicators of population responses to environmental change in some cases (Blazewicz et al., 2020). Another challenge is the presence of a large number of inactive microbial cells in the soil, which hinders the direct, quantitative measure of the ecological drivers in population dynamics (Fierer, 2017; Lennon & Jones, 2011).” in DISCUSSION;

      (3) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive” in DISCUSSION;

      (4) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)” in DISCUSSION;

      (5) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” in DISCUSSION (Line 499-503);

      (6) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” in DISCUSSION (Line 464-466).

      I'm curious about why, even though there were six replicates of the experiment, only three samples were collected for analysis. Metagenomic analyses tend to display high variability.

      The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis.

      In Fig. 3A, the absolute growth rates (16S copies/d*g) are shown. How do you know that the efficiency of DNA extraction was similar across all treatments and therefore the absolute numbers are comparable?

      To avoid differences in extraction efficiency caused by experimental procedures, all DNA samples were extracted by the same person (the first author) within 2-3 hours using a uniform procedure for cell lysis and DNA extraction: cells were mechanically disrupted by bead beating with beads of multiple sizes at 6 m s⁻¹ for 40 s, and DNA was then extracted with the FastDNA™ SPIN Kit for Soil (MP Biomedicals, Cleveland, OH, USA).

      We have measured the concentration of extracted DNA and found no significant difference between treatments (Author response table 1).

      Author response table 1.

      Soil DNA concentration in climate change treatments after qSIP incubation (measured by Qubit® DNA HS Assay Kits).

      Values represent means and standard deviations. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. ANOVA indicated no significant difference in extracted DNA concentration between treatments (p > 0.05).

      We have added a caveat to the DISCUSSION: “Note that the experimental parameters such as DNA extraction and PCR amplification efficiencies also have significant effects on the accuracy of growth assessment. This alerts the need to standardize experimental practices to ensure more realistic and reliable results” (Line 544-547).

      Line 96-99 and 121-124: "Hypotheses are typically placed at the end of the final paragraph in the Introduction section. It is advisable to relocate them there and provide a clearer description of the paper's main goal."

      We have relocated the hypotheses to the end of the INTRODUCTION and stated the main goal of this study: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau, by using the 18O-quantitative stable isotope probing (18O-qSIP)” (Line 112-115).

      Line 399: Although you describe the classification among antagonistic interactions in the Methods section, I think you should describe this in further detail here. Can you clarify how you carried out this categorization and how these results were interpreted considering the phylogenetic classification.

      We have added a description of antagonistic interactions: “The interaction type of T × P on the growth of ~70% incorporators was antagonistic (i.e., the combined effect size is smaller than the additive expectation) (Fig. 4C)” (Line 388-390).

      The interaction types between factors can be classified into three categories: additive, synergistic and antagonistic. Additive interactions are those in which the combined effect size of the factors equals the sum of their single effects (i.e., the additive expectation, Fig. 1B). Synergistic interactions are those in which the combined effect size is larger than the additive expectation. Conversely, antagonistic interactions are those in which the combined effect size is smaller than the additive expectation. In this study, antagonistic interactions were further divided into three sub-categories: weak antagonistic interaction, strong antagonistic interaction, and neutralizing effect (Fig. 1B). In a weak antagonistic interaction, the combined effect size is smaller than the additive expectation but larger than any of the single-factor effects. In a strong antagonistic interaction, the combined effect size is smaller than any of the single-factor effects but larger than 0. In a neutralizing effect, the combined effect size equals 0, implying that the effects of the different factors cancel each other out.

      Methodologically, the single and combined effects of the two climate factors were calculated as the natural logarithm of the response ratio (lnRR), and their interaction effects as Hedges’ d (Yue et al., 2017).
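      The classification rules above can be read as a small decision procedure, sketched below. This is a simplified illustration under stated assumptions, not the authors' code: it uses point estimates of lnRR = ln(treatment / control) and ignores the confidence intervals that the Hedges' d test would use; it assumes both single-factor effects have the same sign and compares magnitudes; and the function names, tolerance and toy growth rates are hypothetical. Cases the definitions do not explicitly cover (a combined effect lying between the two single effects, or a sign reversal) are returned as separate labels rather than forced into a category.

```python
import math


def ln_rr(treatment_mean: float, control_mean: float) -> float:
    """Natural-log response ratio, lnRR = ln(treatment / control)."""
    return math.log(treatment_mean / control_mean)


def classify_interaction(effect_t: float, effect_p: float, combined: float,
                         tol: float = 1e-9) -> str:
    """Classify a T x P interaction from single and combined lnRR point estimates.

    Assumption: effect_t and effect_p share the same sign, so magnitudes are compared.
    """
    if effect_t * effect_p < 0:
        raise ValueError("sketch assumes single-factor effects of the same sign")
    additive = effect_t + effect_p            # additive expectation
    size = abs(combined)
    if size <= tol:
        return "neutralizing"                 # combined effect ~ 0
    if math.isclose(combined, additive, abs_tol=tol):
        return "additive"
    if combined * additive < 0:
        return "sign reversal (not covered)"
    if size > abs(additive):
        return "synergistic"
    # Combined effect is smaller than the additive expectation: antagonistic.
    singles = (abs(effect_t), abs(effect_p))
    if size > max(singles):
        return "weak antagonistic"
    if size < min(singles):
        return "strong antagonistic"
    return "antagonistic (between single effects)"


control = 10.0                       # hypothetical growth rate under ambient conditions
effect_t = ln_rr(6.0, control)       # warming alone
effect_p = ln_rr(8.0, control)       # altered precipitation alone
print(classify_interaction(effect_t, effect_p, ln_rr(5.5, control)))  # weak antagonistic
print(classify_interaction(effect_t, effect_p, ln_rr(9.0, control)))  # strong antagonistic
```

      Both single effects in the toy example are negative; the combined treatment reduces growth less than their sum, so the interaction is antagonistic, and the sub-category depends on how the combined effect compares with each single effect.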

      We have added an interpretation of the phylogenetic distribution patterns of incorporators: “The degree of phylogenetic relatedness can indicate the processes that influenced community assembly, like the extent a community is shaped by environmental filtering (clustered by phylogeny) or competitive interactions (life strategy is phylogenetically random distribution) (Evans & Wallenstein, 2014; Webb et al., 2002). The results showed that the incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic relatedness, indicating the occurrence of taxa more likely shaped by environment filtering (i.e., selection pressure caused by changes in temperature and moisture conditions). In contrast, the growing taxa affected by synergistic interactions of T × P showed random phylogenetic distributions (Table S1), which may be explained by competition between taxa with similar eco-physiological traits or changes in genotypes (possibly through horizontal gene transfer) (Evans & Wallenstein, 2014). We also found that the extent of phylogenetic relatedness to which taxa groups of T × P interaction types varied by climate scenarios, suggesting that different climate history processes influenced the ways bacteria survive temperature and moisture stress” (Line 515-529).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Yue, K., Fornara, D.A., Yang, W., Peng, Y., Peng, C., Liu, Z. et al. (2017). Influence of multiple global change drivers on terrestrial carbon storage: additive effects are common. Ecology Letters, 20, 663-672.

      Line 407-8: What do you mean by "...clustered at the phylogenetic branches" and, in Line 410, by "cluster near the tips of the phylogenetic tree"? Can you please clarify?

      We apologize for the unclear statement. We have added an explanation of NTI: “Nearest taxon index (NTI) was used to determine whether the species in a particular growth response are more phylogenetically related to one another than to other species (i.e., close or clustering on phylogenetic tree). NTI is an indicator of the extent of terminal clustering, or clustering near the tips of the tree (Evans & Wallenstein, 2014; Webb et al., 2002)” (Line 397-401).
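      For readers unfamiliar with NTI, the sketch below shows the standard form of the calculation (Webb et al., 2002): the mean nearest-taxon distance (MNTD) of a group of taxa is compared with the MNTD of randomly drawn groups of the same size, and NTI = -(MNTD_obs - mean(MNTD_null)) / sd(MNTD_null), so positive values indicate clustering near the tips of the tree. It is an illustration only: the toy distance matrix, the simple random-draw null model and the function names are assumptions, and real analyses typically use patristic distances from the phylogeny and established implementations such as `ses.mntd` in the R package picante (where NTI is the negated standardized effect size).

```python
import random
import statistics
from typing import Sequence


def mntd(dist: Sequence[Sequence[float]], members: Sequence[int]) -> float:
    """Mean distance from each member to its closest relative within the group."""
    return statistics.mean(
        min(dist[i][j] for j in members if j != i) for i in members
    )


def nti(dist: Sequence[Sequence[float]], members: Sequence[int],
        n_null: int = 999, seed: int = 0) -> float:
    """Nearest taxon index; positive values indicate phylogenetic clustering."""
    if len(members) < 2:
        raise ValueError("need at least two taxa")
    rng = random.Random(seed)
    pool = range(len(dist))                 # all taxa on the (toy) tree
    observed = mntd(dist, members)
    null = [mntd(dist, rng.sample(pool, len(members))) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical pairwise phylogenetic distances: taxa 0-2 and 3-5 form two clades.
dist = [[0.0 if i == j else (0.1 if i // 3 == j // 3 else 1.0) for j in range(6)]
        for i in range(6)]
print(round(nti(dist, [0, 1, 2]), 2))   # group within one clade: clustered
print(round(nti(dist, [0, 3]), 2))      # group spanning both clades: overdispersed
```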

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Could you provide some info about the biochemistry of the incorporation of heavy water into DNA molecules? What specific enzymes are typically involved?

      Due to the low DNA concentration in most fractionated samples (less than 10 ng/μL, measured by Qubit DNA HS Assay Kits), only amplicon based sequencing analyses were allowed. This study therefore focused only on active microbial taxa and their growth in response to multifactorial climate change.

      What might be the impact of soil desiccation on bacterial survival and subsequent water uptake?

      Slow dehydration and air drying of soil are very common in nature (Koch et al., 2018). During this process, microorganisms reduce their metabolism and shift toward a potentially active state (Blagodatskaya and Kuzyakov, 2013). A previous study suggested that a potentially active microbial population permanently exists in soil, in a physiological state between active and dormant. Even under long-term starvation, these potentially active microorganisms maintain ‘physiological alertness’ so that they are ready to respond to occasional substrate inputs (Blagodatskaya and Kuzyakov, 2013). These microorganisms are important participants in biogeochemical cycles and are a focus of this study.

      Replacing the environmental water in the soil with 18O-labelled water is a typical practice in qSIP studies (Hungate et al. 2015; Koch et al., 2018). This process may disturb the microbial community. In this study, the soil samples were placed in a thermostatic incubator (14℃ and 16℃) rather than air-dried at 25℃ (as in most studies). Because the incubation temperature is relatively low (compared with 25℃) and there is no strong air convection in the incubator, evaporation was slower, and no obvious discoloration from severe soil dehydration was observed after 48 h. Soil drying in this study therefore resembled the natural process of slow water loss from soil.

      We have added this description to the MATERIALS AND METHODS: “There is no violent air convection in the incubator and the incubation temperature is relatively low (compared to 25℃ used in previous studies), resulting slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h” (Line 171-174).

      Reference:

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The analysis of the 18O incorporators is interesting, as it defines which microbes are metabolically active and hence growing under the different conditions tested. Would it not be worth analyzing the non-incorporators? Is it possible to identify a pattern from this information to generate a hypothesis about why they are metabolically inactive? In the Methods section, the authors state that they identified a total of 6,938 OTUs, of which only 1,373 were found to be incorporators.

      Microbes exist in a range of metabolic states: growing, active (non-growing), dormant and recently deceased (Blazewicz et al., 2013), and clear thresholds for distinguishing these states are still lacking. 18O-DNA qSIP identifies growing microbial taxa (i.e., 18O incorporators) rather than all metabolically active taxa, because some cells are measurably metabolizing (catabolic and/or anabolic processes) without reproducing. Therefore, the non-incorporators in our study may be metabolically active, or not (e.g., recently deceased microorganisms). This study focuses on the growing microorganisms identified by 18O-qSIP.

      In this study, ~20% of microbial taxa (1,373/6,938) were identified as 18O incorporators. Microorganisms in soils frequently suffer from resource and energy constraints (Blagodatskaya and Kuzyakov, 2013). The energy requirements of cells in the growing state are much higher (~30-fold) than those in the non-growing state, so the percentage of growing bacterial taxa in soil tends to be low.

      Reference:

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Minor comments:

      Fig. 3A and 3B. Please show the results of the multiple comparisons.

      Done.

      Author response image 5.

      Bacterial growth responses to climate change and the interaction types between warming and altered precipitation. The growth rates (A), and responses (LnRR) of soil bacteria to warming and altered precipitation (B) at the whole community level. The growth rates (C) and responses (D) of the dominant bacterial phyla showed trends similar to those of the whole community. Values represent means; error bars represent standard deviations. Different letters indicate significant differences between climate treatments.

      Fig. 4. This figure should be self-explanatory. This diagram is challenging to understand.

      We have revised Fig. 4 to improve clarity.

      Author response image 6.

      The growth responses and phylogenetic relationships of incorporators subjected to different interaction types under two climate scenarios. (A) Phylogenetic tree of all incorporators observed in the grassland soils. The inner heatmap represents the single and combined effects of climate factors on species growth, relative to the growth rates in T0nP; the outer heatmap represents the interaction types between warming and altered precipitation under the two climate change scenarios. (B) Proportions of positive or negative growth responses to single and combined manipulation of climate factors, summarizing the data from the inner heatmap. (C) Proportions of species whose growth was influenced by different T × P interaction types, summarizing the data from the outer heatmap.

      Fig. 4. It says "Dorought" instead of "drought"

      Done (Line 760).

      Line 109: "relieves" instead of "relieved"

      Done (Line 102).

      Line 129: Should be: "We classified the interaction types as additive, synergistic, antagonistic, null and neutralizing."

      Done (Line 117).

      Line 233: How were the 16S rRNA sequences from each density fraction analyzed?

      (1) Raw sequencing data processing:

      The raw 16S rRNA gene sequences of each density fraction were quality-filtered using USEARCH v11.0 (Edgar, 2010). The paired-end sequences were merged and quality-filtered with the “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences shorter than 370 bp or with total expected errors > 0.5 were removed. Next, the “fastx_uniques” command was used to identify unique sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with the “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as its representative sequence. The taxonomic affiliation of each representative sequence was determined using the RDP classifier (Wang et al., 2007).
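      The length and expected-error thresholds can be illustrated with a short sketch. The expected errors of a read are the sum of the per-base error probabilities implied by its Phred quality scores, EE = Σ 10^(-Q/10), which is the quantity USEARCH uses for expected-error filtering. The sketch assumes Phred+33 quality encoding, and the function names are hypothetical; it illustrates the filtering rule only, not the USEARCH implementation.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10), for a Phred+33 string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def keep_read(seq: str, quality: str, min_len: int = 370, max_ee: float = 0.5) -> bool:
    """Keep reads of at least min_len bp whose total expected errors are <= max_ee."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return len(seq) >= min_len and expected_errors(quality) <= max_ee


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
print(keep_read("A" * 400, "I" * 400))   # EE = 0.04 -> kept
print(keep_read("A" * 400, "5" * 400))   # EE = 4.0  -> removed
print(keep_read("A" * 300, "I" * 300))   # too short -> removed
```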

      (2) qSIP calculation:

      Sequencing data reflect the relative abundances of taxa in the community. We multiplied each OTU’s relative abundance (obtained by sequencing) by the number of 16S rRNA gene copies in the fraction (obtained by qPCR) to obtain the number of gene copies of that OTU in each fraction. Then, for each OTU, the proportion of its gene copies in each fraction relative to its total gene copies across all fractions of the sample was calculated and used as a weight to calculate the weighted average buoyant density (the critical parameter for assessing microbial growth).
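      The weighting step can be sketched as follows: an OTU's copies in each fraction are its relative abundance times the fraction's total 16S rRNA gene copies, and its weighted average buoyant density is the sum of fraction densities weighted by the OTU's share of copies in each fraction. Comparing this value between 18O- and 16O-incubated samples gives the density shift on which qSIP growth estimates are based (Hungate et al., 2015). The function names and toy numbers are hypothetical, and the subsequent conversion of the density shift into isotope enrichment and growth rate is not shown.

```python
from typing import List, Sequence


def otu_copies(rel_abundance: Sequence[float], total_copies: Sequence[float]) -> List[float]:
    """Gene copies of one OTU per fraction: relative abundance x total copies (qPCR)."""
    if len(rel_abundance) != len(total_copies):
        raise ValueError("inputs must have one value per fraction")
    return [r * c for r, c in zip(rel_abundance, total_copies)]


def weighted_density(densities: Sequence[float], rel_abundance: Sequence[float],
                     total_copies: Sequence[float]) -> float:
    """Average buoyant density of one OTU, weighted by its share of copies per fraction."""
    copies = otu_copies(rel_abundance, total_copies)
    if len(densities) != len(copies):
        raise ValueError("one density is needed per fraction")
    total = sum(copies)
    if total == 0:
        raise ValueError("OTU not detected in any fraction")
    return sum(d * y / total for d, y in zip(densities, copies))


# Hypothetical three-fraction example (not data from this study).
densities = [1.70, 1.71, 1.72]           # g/mL
total_copies = [100.0, 100.0, 100.0]     # per-fraction 16S rRNA gene copies from qPCR
w_16o = weighted_density(densities, [0.10, 0.20, 0.10], total_copies)
w_18o = weighted_density(densities, [0.05, 0.20, 0.20], total_copies)
print(f"density shift: {w_18o - w_16o:.4f} g/mL")
```

      A positive shift, as for the three OTUs shown in Author response image 3, indicates that the OTU incorporated 18O into its DNA during the incubation.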

      Line 366: "Three single-factor ... between warming and altered precipitation" -> "The individual impact of warming, drought, and wet conditions resulted in the most substantial negative effects on bacterial growth compared with the effects of warming x drought and warming x wet. A result that illustrates the negative interactions between warming and modified precipitations patterns."

      Done (Line 365-368).

      Line 376: "Similar with the result of whole growth of bacteria community, the growth responses of the major bacterial phyla were also negatively influenced by single climate factors". This sentence is hard to read. Maybe something like this: "Growth of the major bacterial phyla was also negatively influenced by the individual climate factors".

      Done (Line 371-372).

      Line 383: "In particular, the effects of wet and warming neutralized each other, resulting the net effects became zero on the growth rates of the phyla Actinobacteria and Bacteroidetes" -> "In Actinobacteria and Bacteroidetes, the effects of wet and warming neutralized each other, as the combined effect of these two factors had no effect on growth".

      Done (Line 377-379).

      Line 390: "The individual warming treatment (T+nP) reduced the growth rates of 75% incorporators..." -> "Warming (T+nP) reduced the growth of 75% of the taxonomic groups, followed by drought and wet."

      Done (Line 384-385).

      Line 392: "The combined manipulations of warming and altered precipitation lowered the percentages of incorporators with negative responses compared with single factor manipulation, especially warming and enhanced precipitation manipulation" -> "Warming x drought and warming x wet had a smaller impact on the growth of incorporators, compared with single effects."

      Done (Line 385-387).

      Line 468. This sentence "To the best ..." is not necessary.

      We have deleted this sentence.

      Line 476. Is it really "synthesis" the word you want to use?

      We have deleted this sentence.

      Line 477. Maybe it should be written like this: "Consistent with our findings, a recent experimental study demonstrated that 15 years of warming reduced the growth rate of soil bacteria in a montane meadow in northern Arizona."

      Done (Line 459-461).

      Line 490 and 502. Consider using "however" only once in a paragraph.

      We have deleted the second “however” (Line 483).

      Lines 555-559. Based on genomic data, you cannot predict the functional role of microbes in the environment. These sentences are speculative. Please consider using less strong statements and focusing more on the pathways that are enriched in the incorporators.

      Agreed. We have deleted this part of content.

  2. Mar 2024
    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Chen et al. reported that the core binding factor beta (Cbfβ), a heterodimeric subunit of the RUNX family transcription factors (TFs), is crucial in maintaining cartilage homeostasis and counteracting traumatic OA pathology. Using mouse models in which Cbfβ is conditionally inactivated in the Col2a1+ and Acan+ cells, the authors claimed that Cbfβ ablation led to articular cartilage (AC) degeneration, which is associated with aberrant cartilage gene expression and chondrocyte signaling, particularly the elevated Wnt/Catenin and the decreased Hippo/YAP and TGFβ signaling. The authors further showed that Cbfβ transcripts are decreased in human OA cartilage, and sustaining Cbfβ expression in mouse knee joints mitigated the severity of surgery-evoked OA.

      On the whole, the work reported is interesting and exciting. Genetic and biochemical data support the key statements. Both in vivo and in vitro experiments were well designed with proper controls; semiquantitative data were digitized and processed for statistical significance. Furthermore, new findings were adequately discussed in the context of currently available knowledge. However, the conceptual novelty of this study is slightly compromised by recent publications showing that Cbfβ reduction is associated with OA (Che et al. 2023; Li et al. 2021). Also, the authors claimed that multiple signaling pathways were affected by Cbfβ ablation in cartilage cells; many of these, however, are indirect effects given the nature of Cbfβ as a TF. The authors also showed that pSMAD2/3 decreased and active β-catenin increased upon Cbfβ depletion in mouse AC. However, how the deficiency of Cbfβ, a widely expressed TF, affects the posttranslational modification of SMAD2/3 and β-catenin is unclear and needs further discussion. Overall, Cbfβ's role in cartilage and OA pathology is an emerging area of study; the authors provided a body of genetic evidence showing that Cbfβ is indispensable for cartilage homeostasis.

      We thank the reviewer for the positive appraisal of our manuscript and greatly appreciate the insightful comments and critiques. In accordance with the reviewer’s suggestions, we have thoroughly revised all parts of the manuscript. We are glad that the reviewers considered our work to be of interest, and we are grateful for this opportunity to resubmit our manuscript. Regarding the novelty of our study: Li et al.’s study only reported the relationship between abnormal Cbfβ expression in human cartilage and osteoarthritis. Che et al.’s study employed Cbfβf/fAggrecan-cre mice, whereas our study used a novel inducible Cbfβf/fCol2α1-CreERT mouse model. While the Aggrecan-creERT system provides valuable insights into the role of Cbfβ in differentiated cartilage cells and its implications in the advanced stages of osteoarthritis, our current study used Cbfβf/fCol2α1-CreERT mice to explore the gene's function from a broader perspective. A previous study pointed out that Col2α1 is expressed in both early and late stages of chondrogenesis, including in skeletal mesenchymal cells, the perichondrium, and presumptive joint cells, whereas aggrecan is expressed specifically in differentiated chondrocytes (1). However, studies show that not only differentiated chondrocytes but also chondrocyte progenitors are involved in OA pathogenesis (2). In our current study, the Col2α1-CreERT system allowed us to investigate Cbfβ's role not only in mature chondrocytes but also in early chondroprogenitor cells, offering a comprehensive view of Cbfβ’s involvement in cartilage in osteoarthritis. Therefore, the use of the Cbfβf/fCol2α1-CreERT mouse strain was instrumental in expanding our understanding of Cbfβ's multifaceted role in osteoarthritis, highlighting its importance not only in mature cartilage but also in the early stages of cartilage formation and differentiation.

      In addition to using a different Cre driver from that in our previous study, our current study used a gain-of-function approach in the ACLT-induced OA model to explore the potential therapeutic function of Cbfβ under OA pathological conditions. Combining our current findings with our previous research, we can now piece together a more complete picture of Cbfβ's role across the entire spectrum of cartilage development in osteoarthritis.

      We agree with the reviewer that how the deficiency of Cbfβ, a widely expressed TF, affects the posttranslational modification of SMAD2/3 and β-catenin is unclear and needs further exploration. So far there is no clear explanation for this, which is why we used RNA-seq and heatmap analysis to examine the expression of other genes that could help uncover the underlying mechanism. Interestingly, Che et al.’s results showed that TGF-β signaling (p-Smad3) increased in Cbfβf/fAggrecan-cre mice, whereas our data showed that TGF-β signaling (both p-Smad3 and Smad3) decreased in Cbfβf/fCol2α1-CreERT mice (Figure 8). These results were also confirmed by RNA-seq analysis, as shown in the heatmaps in Figure 5.

      These differences could result from the different mouse ages used in our study and in Che et al.’s study.

      1. Blaney Davidson EN, van de Loo FA, van den Berg WB, van der Kraan PM. How to build an inducible cartilage-specific transgenic mouse. Arthritis Res Ther. 2014;16(3):210.

      2. Tong L, Yu H, Huang X, Shen J, Xiao G, Chen L, et al. Current understanding of osteoarthritis pathogenesis and relevant new approaches. Bone Res. 2022;10(1):60.

      Reviewer #3 (Public Review):

      The authors comprehensively demonstrated that the Cbfβ gene, which is involved in articular cartilage homeostasis, can promote articular cartilage regeneration and repair in osteoarthritis (OA) by regulating Hippo/YAP signaling, TGF-β signaling, and canonical Wnt signaling. First, the authors demonstrated that deletion of Cbfβ can induce OA phenotypes, including decreased articular cartilage and osteoblasts and increased osteoclasts and subchondral bone hyperplasia, and can induce early onset of OA. Additionally, the authors showed that Cbfβ deficiency in cartilage can increase canonical Wnt signaling and decrease TGF-β and Hippo signaling. Finally, the authors demonstrated that overexpression of Cbfβ can inhibit Wnt signaling and enhance Hippo/YAP signaling in the knee joint articular cartilage of ACLT-induced OA mice and protect against ACLT-induced OA. The manuscript is overall well constructed, and the authors provided evidence to support their findings.

      In Fig. 7I, it would be better to show the statistical analysis between the normal and AAV-mediated Cbfβ ACLT mouse groups.

      We thank the reviewer for bringing this to our attention. In the revised figure 7I, we have included the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      In Fig. 9H-K, the quantification analysis shows that the OARSI score in the DMM+AAV-YFP group is significantly higher than in the sham group. However, the SO staining results appear to show no significant difference between the DMM+AAV-luc-YFP group (Fig. 9I) and the sham group (Fig. 9H).

      We thank the reviewer for bringing this to our attention. Although both the sham and DMM+AAV-luc-YFP groups stain positive for SO, the SO stain intensity of the DMM+AAV-luc-YFP group is noticeably lower. In addition, SO staining is not the only parameter included in the OARSI score; we also evaluated cartilage thickness, proteoglycan structure, and the cartilage surface fibrillation index. Our OARSI scoring relies on the qualities of the whole joint, not only the magnified portion. For convenience, we have also outlined the region of positive SO staining in the revised Figure 9I.

    1. Author Response

      eLife assessment

      This important study provides a new, apparently high-performance algorithm for B cell clonal family inference. The new algorithm is highly innovative and based on a rigorous probabilistic analysis of the relevant biological processes and their imprint on the resulting sequences. However, the strength of evidence regarding the algorithm's performance is incomplete, due to (1) a lack of clarity regarding how different data sets were used for different steps during algorithm development and validation, resulting in concerns of circularity, (2) a lack of detail regarding the settings for competitor programs during benchmarking, and (3) method development, data simulation for method validation, and empirical analyses all being based on the B cell repertoire of a single subject. With clarity around these issues and application to a more diverse set of real samples, this paper could be fundamental to immunologists and important to any researcher or clinician utilizing B cell receptor repertoires in their field (e.g., cancer immunology).

      We apologize for the long delay in implementing the suggested changes. Some of the co-authors had personal issues that made it hard to work efficiently on the revision.

      We have addressed all the essential points below, as well as all the detailed comments of each reviewer in the following pages.

      Due to the journal’s guidelines we have to upload an “all black” version of the manuscript as the main version. We have uploaded a revised manuscript with the changes marked in red as a “Related Manuscript file”, which appears at the very end of the Merged Manuscript File, after all the Figures, and at the end of the list of files on the webpage. We apologize for this inconvenience.

      In addition, we have added an extension of HILARy to deal with paired-chain repertoires, and have benchmarked the new method on a recently published synthetic dataset. This new analysis is now presented in new Fig. 5.

      Reviewer #1 (Public Review):

      Identifying individual BCR/Ab chain sequences that are members of the same clone is a longstanding problem in the analysis of BCR/Ab repertoire sequencing data. The authors propose a new method designed to be scalable for application to huge repertoire data sets without sacrificing accuracy. Their approach utilizes Hamming distance between CDR3 sequences followed by clustering for a fast, high-precision approach to classifying pairs of sequences as related or not, and then refines the classification using mutation information from germline-encoded regions. They compare their method with other state-of-the-art methods using synthetic data.

      The authors address an important problem in an interesting, innovative, and rigorous way, using probabilistic representations of CDR3 differences, frequencies of shared and not-shared mutations, and the relationships between the two under hypotheses of related pairs and unrelated pairs, and from these develop an approach for determining thresholds for classification and lineage assignment. Benchmarking shows that the proposed method, the complete method including both steps, outperforms other methods.

      Strengths of the method include its theoretical underpinnings which are consistent with an immunologist's intuition about how related and unrelated sequences would compare with each other in terms of the metrics to use and how those metrics are related to each other.

      I have two high-level concerns:

      (1) It isn't clear how the real and synthetic data are being used to estimate parameters for the classifier and evaluate the classifier to avoid circularity. It seems like the approach is used to assign lineages in the data from [1], and then properties of this set of lineages are used to estimate parameters that are then used to refine the approach and generate synthetic data that is used to evaluate the approach. This may not be a problem with the approach but rather with its presentation, but it isn't entirely clear what data is being used and where for what purpose. An understanding of this is necessary in order to truly evaluate the method and results.

      The reviewer is correct in their understanding of the pipeline. It should be stressed that the lineages used to guide the generation of the synthetic data were inferred on VJl classes for which clustering was easy and reliable, and should therefore be largely model independent.

      We have added an explanation in the main text of why the re-use of real data lineages inferred by HILARy doesn’t bias the procedure, since it’s done on a subset of lineages within VJl classes that are easy to infer (section “Test on synthetic dataset”).

      (2) Regarding the data used for benchmarking - given the intertwined fashion by which the classification approach and synthetic data generation approach appear to have been developed, it is not surprising that the proposed approach outperforms the other methods when evaluated on the synthetic data presented here. It would be better to include in the benchmark the data used by the other methods to benchmark themselves or also generate synthetic data using their data generation procedures.

      We agree with the reviewer that a test of the method on an independent synthetic dataset is important for its applicability and to compare to other methods.

      We have added a new synthetic dataset from the group that designed the partis method to our benchmark. Our method still performs competitively, on par with partis—which was developed and tested on that dataset—and better than other methods. The results are presented in revised Fig. 4 (panels E-G), and Figure 4–figure supplement 1 as a function of the mutation rate.

      In addition, we have used that dataset to benchmark a new version of HILARy that also uses the light chain. We present the results in new Figure 5 and in Figure 4–figure supplement 1.

      An improved method for BCR/Ab sequence lineage assignment would be a methodologic advancement that would enable more rigorous analyses of BCR/Ab repertoires across many fields, including infectious disease, cancer, autoimmune disease, etc., and in turn, enable advancement in our understanding of humoral immune responses. The methods would have utility to a broad community of researchers.

      Reviewer #2 (Public Review):

      This manuscript describes a new algorithm for clonal family inference based on V and J gene identity, sequence divergence in the CDR3 region, and shared mutations outside the CDR3. Specifically, the algorithm starts by grouping sequences that have the same V and J genes and the same CDR3 length. It then performs single-linkage clustering on these groups based on CDR3 Hamming distance, then further refines these groups based on shared mutations.
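
      The grouping and clustering steps summarized above can be sketched in Python. This is a naive, illustrative version under stated assumptions, not HILARy's implementation: it compares all pairs within a class (quadratic time, so it is not the efficient implementation the manuscript uses), it uses a fixed integer mismatch threshold rather than an inferred, length-dependent one, and all names are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def _find(parent, i):
    # Union-find root lookup with path halving
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage(cdr3s, threshold):
    """Cluster equal-length CDR3s: link any pair within `threshold` mismatches."""
    parent = list(range(len(cdr3s)))
    for i in range(len(cdr3s)):
        for j in range(i + 1, len(cdr3s)):
            if hamming(cdr3s[i], cdr3s[j]) <= threshold:
                parent[_find(parent, i)] = _find(parent, j)
    return [_find(parent, i) for i in range(len(cdr3s))]


def cluster_repertoire(records, threshold):
    """records: iterable of (seq_id, v_gene, j_gene, cdr3). Returns {seq_id: clone label}."""
    classes = defaultdict(list)
    for seq_id, v, j, cdr3 in records:
        # Same V gene, same J gene, same CDR3 length: one "VJl" class
        classes[(v, j, len(cdr3))].append((seq_id, cdr3))
    labels = {}
    for key, members in classes.items():
        roots = single_linkage([c for _, c in members], threshold)
        for (seq_id, _), root in zip(members, roots):
            labels[seq_id] = (key, root)
    return labels
```

      Because clustering happens only within a VJl class, sequences with different V or J genes, or different CDR3 lengths, can never be placed in the same clone by this step.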

      Although there are a number of algorithms that use a similar overall strategy, a couple of aspects make this work unique. First, a persistent challenge for algorithms such as this one is how to set a cutoff for single-linkage clustering: if it is too low, then one separates clusters that should be together, and if too high one joins together clusters that should be separate. Here the authors leverage a rich collection of probabilistic tools to make an optimal choice. Specifically, they model the probability distributions of within- and between-cluster CDR3 Hamming distances, with parameters depending on CDR3 length and the "prevalence" of clonal sequence pairs (i.e. family size distribution). This allows the algorithm to make optimal choices for separating clusters, given the particular chosen distance metric, and assuming the sample in question has been accurately modeled. Second, the algorithm uses a highly efficient means of doing single-linkage clustering on nucleotide sequences.
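
      To illustrate how a threshold could be chosen from the two distance distributions and the prevalence of related pairs, the Python sketch below makes a simplifying assumption that is not the authors' model: the CDR3 Hamming distance of a related pair follows Binomial(l, p_related) and that of an unrelated pair follows Binomial(l, p_unrelated). Given the prevalence π of related pairs, the expected precision of linking all pairs at distance ≤ t is π·F_rel(t) / (π·F_rel(t) + (1 - π)·F_unrel(t)), and the sketch returns the largest t that keeps this at or above a target. The function names and parameter values are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def choose_threshold(length, prevalence, p_related, p_unrelated, min_precision=0.99):
    """Largest distance threshold t whose expected pairwise precision is at least
    `min_precision`, returned as (t, precision, sensitivity); None if none qualifies."""
    best = None
    for t in range(length + 1):
        sensitivity = binom_cdf(t, length, p_related)
        tp = prevalence * sensitivity                                   # related pairs linked
        fp = (1 - prevalence) * binom_cdf(t, length, p_unrelated)       # unrelated pairs linked
        if tp + fp == 0:
            continue
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best = (t, precision, sensitivity)
    return best
```

      For example, `choose_threshold(45, 0.01, 0.05, 0.7)` scans thresholds for a hypothetical 45-nucleotide CDR3 class in which 1% of pairs are assumed to be related. A lower prevalence pushes the chosen threshold down, because each unrelated pair that is linked costs more precision.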

      This leads to a fast and highly performant algorithm on data meant to replicate the original sample used in algorithm design. The ideas are new and beautifully developed. The application to real data is interesting, especially the point about dN/dS.

      However, the paper leaves open the question of how this inference algorithm works on samples other than the one used for simulation and as a template for validation. If I understand the simulation procedure correctly - that one takes a collection of inferred trees from the real data, then re-draws the root sequence and the identity of the mutations on the branches - then the simulated data should be very close to the data used to develop the methods in the paper. This consideration seems especially important given that key methods in this paper use mutation counts and overall mutation counts are preserved.

      Repertoires come in all shapes and sizes: infants to adults, healthy to cancerous, and naive to memory to plasma-cell-just-after-vaccination. If this is being proposed as a general-purpose clonal inference algorithm rather than one just for this sample, then a more diverse set of validations are needed.

      We agree that testing the method on a differently generated dataset is a useful check. We should point out, however, that our synthetic dataset is not as biased as it may seem. In particular, it is based on trees from VJl classes that we predicted are very easy to cluster, which means that they are truly faithful to the data, and not dependent on the particular algorithm used to infer them. The big advantage of this synthetic dataset over others is that it recapitulates the power law statistics of clone size distribution, as well as the diversity of mutation rates. To us, it still represents a more useful benchmark than synthetic datasets generated by population genetics models, which miss most of this very broad variability.

      However, to check how the method generalizes to other datasets, we repeated our validation procedure on the dataset used to evaluate partis in Ralph et al. 2022. The new results are discussed in the main text and in new panels of Fig. 4 in the same form as the previous comparisons. We also added a comparison of performance as a function of mutation rate in the new Figure 4–figure supplement 1.

      It is unclear how to run the code. The software repo has a nice readme explaining the file layout, dependencies, and input file format, but the repo seems to be lacking an inference.ipynb mentioned there which runs an analysis. Perhaps this is a typo and refers to inference.py, which in addition to the documented cdr3 clustering, seems to have functions to run both clustering methods. However, it does not seem to have any documentation or help messages about how to run these functions.

      We have completely overhauled the GitHub repository to provide a detailed step-by-step explanation of how to run the code. The code is now easily installable using pip.

      The results are not currently reproducible, because the simulated data is not available. The data availability statement says that no data have been generated for this manuscript, however simulated data has been generated, and that is a key aspect of the analysis in the paper.

      We have uploaded the simulated data to zenodo, as well as provided scripts in the github to run the benchmarks.

      More detail is needed to understand the timing comparisons. The new software is clearly written to use many threads. Were the other software packages run using multiple threads? What type of machine was used for the benchmarks?

      All timing comparisons were made on a single VJl class using a computer with 14 dual-threaded CPU cores (28 threads). HILARy uses all 28 threads, and the other methods were run with default settings, with multi-threading allowed.

      We have clarified the specifications of the computer.

      Reviewer #3 (Public Review):

      B cell receptors are produced through a combination of random V(D)J recombination and somatic hypermutation. Identifying clonal lineages - cells that descend from a common V(D)J rearrangement - is an important part of B cell repertoire analysis. Here, the authors developed a new method to identify clonal lineages from BCR data. This method builds off of prior advances in the field and uses both an adaptive clonal distance threshold and shared somatic hypermutation information to group B cells into clonal lineages.

      The major strength of this paper is its thorough quantitative treatment of the subject and integration of multiple improvements into the clonal clustering process. By their simulation results, the method is both highly efficient and accurate.

      The only notable weakness we identified is that much of the impact of the method will depend on its superiority to existing approaches, and this is not convincingly demonstrated by Fig. 4. In particular, little detail is given on how the other clonal clustering programs were run, and this can significantly impact their performance. More specifically:

      We have added a new benchmark to address these concerns, presented in Fig. 4 and in new figure 4 – figure supplement 1 as a function of a controllable mutation rate.

      (1) Scoper supports multiple methods for clonal clustering, including both adaptive CDR3 distance thresholds (Nouri and Kleinstein, 2018) and shared V-gene mutations (Nouri and Kleinstein, 2020). It is not clear which method was used for benchmarking. The specific functions and settings used should have been detailed and justified. Spectral clustering with shared V gene mutations would be the most comparable to the authors' method. Similar detail is needed for partis.

      In the updated version, we use the 2020 method. The 2018 method is very similar to simple single-linkage clustering, so it will be removed from the benchmark.

      (2) It is not clear how the adaptive thresholds and shared mutation analysis in the authors' method differ from prior approaches such as scoper and partis.

      We have changed the paragraph in the discussion section about the benchmark to highlight the innovative aspects and differences with previous approaches.

      (3) The scripts for performing benchmarking analyses, as well as the version numbers of programs tested, are not available.

      We have added to the github all the scripts used for benchmarking. We have added details about the version numbers in the data and code availability section of the methods.

      (4) Similar to above, P. 10 describes single linkage hierarchical clustering with a fixed threshold as a "crude method" that "suffers from inaccuracy as it loses precision in the case of highly mutated sequences and junctions of short length." As far as we could tell, this statement is not backed up by either citations or analyses in the paper. It should not be difficult for the authors to test this using their simulations, though, as this method is also implemented in scoper.

      We have added this method to our benchmark to support that point. The results are presented in Figure 4 – figure supplement 2.

      References

      Nouri N, Kleinstein SH. 2020. Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Comput Biol 16:e1007977. doi:10.1371/journal.pcbi.1007977

      Nouri N, Kleinstein SH. 2018. A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics 34:i341-i349. doi:10.1093/bioinformatics/bty235

      We have changed citation [22] to refer to the 2018 paper. The 2020 paper is citation [18].

    1. Author Response

      We thank the editors and reviewers for their careful and thoughtful review of the preprint. Their comments and suggestions will be very useful in improving the revised version of the manuscript, which we plan to submit in the coming weeks.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewing Editor

      We thank you for clarifying several of the questions raised by the reviewers. Since the study has otherwise largely stayed unchanged, we will leave the eLife assessment as “before”:

      We respectfully disagree because we addressed all concerns raised by the two reviewers except one (below), which was not satisfactorily answered according to reviewer 1; it has now been addressed (new S3 Fig).

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed most of my previous comments. However, one important point was not satisfactorily addressed: "The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided." The authors responded that "It is not straightforward to quantify and describe the intensity of the bands of these numerous with different fate outcomes." In the revision, they mentioned that at least three repeats were performed. If so, it is not clear why they could not quantify the Western blot results. Including quantitative data will strengthen the rigor of the findings.

      Quantitative data from Fig. 4 and Fig. 5 are now provided as S3 Fig and described in the manuscript (lines 170-175; 184-188).

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) It is not entirely clear why a tumor-free model is chosen to study immune responses, as immune responses can differ significantly with or without tumor-bearing. A more detailed explanation is needed.

      We appreciate the question. As stated in the original submission, tumor-free mouse models are commonly used to assess off-target outcomes of anti-neoplastic therapies. We have expanded on this point and acknowledged this shortcoming in the revised manuscript (lines 264-265).

      (2) Immune responses in isolated macrophages, neutrophils, and bone marrow cells require priming with LPS, while such responses are not observed in vivo. There is no explanation for these differences.

      The reviewer raises an excellent point. The assembly of inflammasomes, such as those nucleated by NLRP3, requires priming signals that increase the levels of this sensor, which are kept low under homeostatic conditions to prevent spontaneous, unwanted inflammation. While LPS is commonly used in vitro to provide priming signals, these cues are triggered in vivo by various molecules, including pro-inflammatory cytokines. We have provided a rationale for the use of LPS in vitro in the revised manuscript (lines 144-145).

      (3) The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided. This additional information is recommended.

      Caspase-1, caspase-3, GSDMD, and GSDME, but not AIM2 and NLRP3, are activated upon proteolytic cleavage, so it is not straightforward to quantify and describe the band intensities of these numerous proteins with different fate outcomes. We regret not mentioning the number of repeats in the original submission. This information has now been provided in the figure legends where necessary.

      (4) Many abbreviations are used throughout the text, and some of the full names are not provided.

      Full names are required at the first introduction.

      We agree. We have provided full names at the first introduction (lines 21, 23, 86).

      (5) Fig. 5B needs a label on the X axis.

      We regret the confusion: the X-axis label applied to both Fig. 5B and 5C. We have made the change in the new Fig. 5.

      Reviewer #2:

      The following specific points could be addressed to further improve the quality of the manuscript:

      (1) Concerning data presented in Figure 1, 3D micro-CT reconstructions of the entire femurs could be shown instead of just the trabecular bone. Data on cortical bone loss are important. It would be important to show histological (sagittal) sections of the bones at baseline, treated with Doxorubicin or vehicle, and quantify osteoblasts in addition to osteoclasts. Is there increased bone marrow adiposity in Doxorubicin-treated mice? The data with vehicle should be shown in the main figures not just in the supplemental data.

      We thank the reviewer for the suggestion. We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bone (S1B Fig). Only the metaphyseal area is shown because we did not originally scan the entire femur.

      Quantification of osteoblast number is not a reliable measurement, which is why we carried out dynamic histomorphometry to assess the effect of doxorubicin on bone formation (original S1D Fig/new S1E Fig).

      Unfortunately, we did not determine the effects of doxorubicin on bone marrow adiposity. However, to address the reviewer’s comment, we have mentioned in the revised manuscript adipogenic effects of doxorubicin based on the literature (lines 264-265).

      (2) Concerning data presented in Figure 2, how long after Doxorubicin injection is leukopenia observed (beyond the 72-hour timepoint)? Does cell count return to baseline 4 weeks after treatment (when the bone phenotype is characterized)? Why use 12-week-old mice here and 10-week-old animals for the rest of the study?

      We appreciate the question. We did not measure leukopenic effects of doxorubicin beyond the 72-hour timepoint based on the following: i) bones are analyzed in mice injected only once with a single dose of doxorubicin; ii) leukopenia is a side effect of doxorubicin whose blood levels should be undetectable 4 weeks after its administration although we did not measure them experimentally. Our premise is that osteopenia observed in doxorubicin-exposed mice is the result of early events that occur after the administration of the drug.

      We apologize for the confusion. We assessed baseline bone mass by VivaCT using 10-week-old mice; doxorubicin was injected 2 weeks later, when the mice were 12 weeks old. We have clarified this point in the revised manuscript (line 301).

      (3) It would be important to evaluate local inflammation in bones collected from wild-type and mutant mice. Are ASC specks, Cit-H3, and MPO present in the bone marrow? The expression of some components of the inflammasomes or relevant pathways could be assessed in bone samples deprived of bone marrow and in the bone marrow.

      This is a good point. Although we were not able to reliably measure Cit-H3 and MPO in bone marrow fluid, the data shown in Figs. 3-6 and 7A-D were obtained from bone marrow cells.

      (4) Data presented in western blots should be quantified. The ratio of signal intensity obtained for beta-actin over the signal obtained for a given protein should be calculated for each experimental condition (especially in Figure 5, where beta-actin levels fluctuate a lot).

      Please see the response to question #1. Fluctuations in β-actin levels are likely related to the cytotoxic effects of doxorubicin, as mentioned in the original submission (lines 150, 194, 253). Despite this caveat, IL-1β levels are stimulated by this drug.

      (5) In Figure 7, BV/TV of WT and mutant mice at baseline should be quantified and shown. Sagittal histological sections of the femur should be shown. 3D micro-CT reconstructions of the entire femur could be shown instead of just the trabecular bone. Osteoblasts and bone resorption should be quantified. Data obtained with vehicle should be quantified and shown in the main figure. The control and LPS conditions should be better defined. Does it include vehicle?

      Please see the response to reviewer 1’s question #1.

      We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bone (S3A, B Fig).

      LPS was dissolved in PBS (vehicle), which was used as control. We have now replaced vehicle with PBS in Fig. 7.

      (6) For all figures, the number of biological replicates should be mentioned in the legends, as well as the statistical tests used for the analyses.

      We have now included this information in the legends where necessary.

      (7) Some of the scientific rationales are not totally clear and could be better explained in the text. For example, it is written on page 6 "studies mainly on male mice and revolved around innate immune responses" and "we focused on neutrophils because of their high turnover rate and short lifespan", but it is not clear why. The rationale (page 10) for assessing bone mass in "mice globally lacking AIM2 and/or NLRP3" is not totally clear either. The argument is that systemic inflammation leads to bone loss but the effects obtained with the total ablation of AIM2 and NLRP3 do not prove strictly speaking that systemic inflammation really matters (in this current study, although we know from many other studies that it clearly does matter). We could imagine, for example, that bone mass would be preserved in AIM2 KO mice only because the inflammasome is impaired in osteoblasts and/or osteoclasts, but not in any other cell types. Conversely one could imagine that bone would be preserved only because inflammation is preserved in the gut, for example. The use of global knockouts unfortunately does not tell us much about the importance of systemic versus local effects of the inflammasomes. It shows that reducing inflammation, either in specific organs or globally, limits bone loss in doxorubicin-treated mice. This result is important but it was fully expected since doxorubicin has been reported to induce systemic inflammation, and since many studies have shown that systemic inflammation leads to bone loss.

      We appreciate the comments. We have clarified the rationale for focusing on neutrophils (lines 129-130) and on the AIM2 and NLRP3 inflammasomes (lines 209-211). We have also downplayed the concept of inflammasome-mediated systemic inflammation in doxorubicin-induced bone loss.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      The model’s results are based on both numerical simulations and formal mathematical analysis.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      We thank the reviewers and editor for highlighting the importance of the work presented here.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context.

      We presented, in both the Results and the Discussion, how the mathematical trade-offs between fertility and survival time give rise to (x_b, x_d) configurations representative of existing ageing modes.

      In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret.

      We discussed the role of the transgenerational Lansing effect in terms of its function; no particular mechanism is required beyond that function of a transgenerational negative effect. We reinforce this point in the Discussion by adding the following sentence: “Regarding the nature of the transgenerational effect, our model is agnostic and the mere transmission of any negative effect would be sufficient to exert the function.”

      Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors.

      The long-time limit of the system with and without the Lansing effect is based on analytical results that were later confirmed by numerical simulations. The choice of parameters is explained in the Introduction: they are the minimal set needed to define a living organism. As for the parameter values, our numerical analysis gives a solution for any I_b, I_d, x_b and x_d in R+, which makes the choice of initial values arbitrary.

      Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

      We neither show nor claim that evolvability per se is selected for, only that the apparent advantage conferred by this transgenerational effect seems to be mediated by an increased genotypic/phenotypic variability in the lineage, which we interpreted as evolvability.

      Recommendations for the authors

      (1) The authors could use the lineage tracing results for the evolvability aspect. Specifically, within subpopulations featuring the Lansing effect, it would be valuable to explore whether individuals with parental age greater than the mortality onset (a > x_d) demonstrate higher fitness compared to individuals with a < x_d. Additionally, an examination of how this variation evolves over time could provide further insights into the dynamics of the proposed model.

      We thank the reviewer for this suggestion. This is ongoing work in our group, especially in the context of varying environmental conditions.

      (2) In all simulations, I_b = I_d = 1, resulting in total fertility (x_b * I_b) equating to x_b, while x_d is proportional to life expectancy. Considering an exploration of the implications of this parameter setting, the authors could frame x_d as a 'lifespan cost', potentially allowing for the model to be conceptualised in terms of energetic tradeoffs. This might offer additional perspectives on the dynamics of the model and its alignment with biological principles.

      We discuss how the apparent trade-offs produced by the model, depending on the I_b and I_d values, relate to the interpretation of such trade-offs that has been accepted for most of the past century. Our claim in the Discussion is that no energetic trade-off is needed for fertility/longevity trade-offs to appear. Such an energetic trade-off is not a “biological principle” but merely an accepted interpretation of a fertility/longevity trade-off, which is itself not a general mechanism.

      (3) Considering the necessity of variation in x_d for the observed patterns, an exploration could be undertaken by the authors to examine a model where x_d is simply variable without inheritance. This could involve centring x_d at some value d with some variance σ_d for all individuals. In such a scenario, it may be observed whether the same convergence of x_b - x_d occurs without requiring x_d to be selected. Furthermore, similar consequences of the Lansing effect could potentially be identified.

      This was done early in our work and did not show any major change in the model's behaviour beyond the time of convergence. We did not include it in the final manuscript because of its low added value to an already long and complex manuscript.

      (4) While it may not be necessary to alter the model itself, it is suggested that the authors consider acknowledging the potential consequences of certain modelling decisions that might be perceived as biologically unrealistic. Notable examples include assumptions such as fertility from birth and zero mortality prior to x_d. These assumptions, such as infertility from birth, could be viewed as distinctive features, and it might be worth mentioning that parental care of offspring could have co-evolved with such features. This is particularly relevant considering the energy tradeoff hypothesis that has been postulated.

      Although inspired by results obtained in Drosophila, mice, nematodes and zebrafish, the model is so far haploid and asexual, and thus involves individuals that are likely more similar to unicellular organisms. Under these conditions, infertility from birth did not seem relevant to us. However, the model and code are accessible online, and we hope that others will use them to address such questions. It is interesting, though, that ageing appears here without such a constraint.

      Additionally, the consideration that all organisms face a non-trivial mortality rate at every age, not solely from physiological causes, reflects the reality within which selection operates.

      We thought this was the best way to reflect an environment with a limited carrying capacity. A more complex model is under construction to take into account the possibility that older individuals are more sensitive to it than younger ones.

      (5) While acknowledging the technical rigour applied by the authors, it is suggested that further attention be given to conducting a comprehensive 'reality check' associated with the chosen parameters, particularly regarding the biological relevance of the results. For instance, the authors argue that offspring of old organisms do not, on average, live similarly to their parents. However, it is noted that studies in the haploid asexual organism yeast, akin to what the authors model (albeit not necessarily yeast), revealed that the average lifespan of yeast progeny born from young or old mothers is very similar.

      We do not claim that, on average, the progeny of old parents live shorter lives than the progeny of younger parents; we say that this happens in the progeny of physiologically old parents, who represent at most 10% of the population in our numerical simulations.

      The authors cite experimental evolution in Drosophila progeny conceived later in the life of the parent, indicating that the onset of mortality in these progeny occurs late, sometimes even after the end of the fertility period (Burke et al., 2016; Rose et al., 2002). While the authors report their own previous studies with divergent results, independent experiments have suggested an increase of x_d following an artificial increase of x_b (Luckinbill and Clare, 1985; Sgro et al., 2000). A more in-depth consideration of these contrasting observations and their potential implications for the current model could enhance the overall robustness of the study.

      The increase of x_d following an artificial increase of x_b is predicted by our model, as discussed. The divergence between the observations of these studies is, unfortunately, hard to assess.

      (6) To enhance readability and maintain consistency, it is suggested that the authors homogenise the description of key parameters, specifically x_b and x_d, throughout the text. This could contribute to improved clarity and rigour. One recommendation is to refer to x_b consistently as the 'fertility span' and x_d as the 'mortality onset' for the sake of uniformity in terminology.

      We have modified the text accordingly.

      (7) At various points in the text, the assertion is made that observations have indicated a tradeoff between fertility and longevity. It is recommended that the authors provide references or data to substantiate this claim. This addition would contribute to the empirical grounding of the mentioned tradeoff and strengthen the overall support for the assertions made in the study.

      We added the following references to the Discussion: Lemaitre et al., 2015; Kirkwood, 2005; and Rodrigues and Flatt, 2016.

      (8) The statement claiming that the model is 'able to describe all types of ageing observed in the wild' should be moderated. As the authors themselves acknowledge, the model is referred to as a 'toy model,' and it is made clear that it cannot capture, nor is intended to capture, the entire diversity observed in life. Adjusting this statement to reflect the limited scope and purpose of the model would enhance precision and accuracy in the presentation of its capabilities.

      Although it is a toy model, its possible configurations encompass all the configurations described so far across the diversity of ageing throughout the tree of life: from negligible senescence with no loss of fertility (x_b and x_d >> 0), through a fast increase in mortality after reproduction (x_b = x_d), to menopause-like configurations (x_b << x_d). Replacing our current square (step) functions with other functions would allow age-dependent decreases or increases in fertility and/or in mortality risk.

      (9) To bolster the biological relevance of the study, it is strongly recommended that the authors cross-check the results of their simulations with previously published experimental findings. This approach would serve to strengthen the alignment between the model outcomes and observed biological phenomena. Additionally, placing greater emphasis on the biological relevance aspects throughout the text would contribute to a more robust and comprehensive exploration of the study's implications.

      In the present manuscript, we have cited a number of results from artificial selection experiments on life history traits in order to strengthen the interpretation of our model's behaviour. There are numerous other studies, some pointing in the same direction and some not, but we do not think it would be relevant to add an exhaustive list of them. Nevertheless, we added Stearns et al., 2000, which studies the evolution of life history traits under high extrinsic mortality.

      (10) For enhanced clarity, it is suggested that the x-axis in Figure 1 be labelled as 'age.' Considering this adjustment could contribute to clearer visual communication of the data.

      We agree with the reviewer and modified the figure accordingly.

      (11) The addition of graphical legends is recommended for Figures 3-5, as well as the supplementary figures. Including these legends would provide essential context and improve the interpretability of the figures for readers.

      We agree with the reviewer and modified the figures accordingly.

      (12) For improved distinction of the ranges indicated by quantiles in Figure 3, it is suggested that the authors consider enhancing visual clarity. One approach could involve making the middle quantile thicker or using a different line type. Additionally, it is recommended to explore the calculation of the highest density 90% intervals rather than the 1-9 deciles. This adjustment could contribute to a clearer representation of the data distribution in the figure.

      We named the different deciles directly on the figure to improve readability.

      (13) It is observed that the mathematical proofs in Annex 1 are not displaying properly in the PDF. Additionally, there seem to be missing and broken references for the Annex. This issue may be related to LaTeX formatting. The authors could consider revisiting the formatting of Annex 1 to ensure the correct display of mathematical proofs and address the referencing concerns, possibly by checking and rectifying any LaTeX-related issues.

      The LaTeX file of the supplementary material was not compiled correctly. This has now been corrected.

      (14) There is inconsistency in the text regarding the reference to the Annex, with both 'Annex' and 'Annexe' being used interchangeably. To maintain uniformity, it is suggested that the authors consistently use either 'Annex' or 'Annexe' throughout the text. This adjustment would contribute to a more polished presentation of the supplementary material.

      We corrected them accordingly.

      (15) There appears to be a typographical error in the name of Supplementary Figure 3.

      We corrected it accordingly.

    1. Author Response

      eLife assessment

      The authors present evidence that small extracellular vesicles can be secreted from cells inside larger vesicles that they call amphiectosomes, which then tear to release their small vesicle contents. There are questions and concerns relating to the quality of the data and the in vivo significance of the observations. The findings are potentially important but the data are incomplete and the claims are only partially supported.

      We agree that the in vivo significance and the molecular background of amphiectosome release remain to be studied further. However, as Reviewer 2 indicated, the data in this Short Report may have a substantial impact on our understanding of EV biogenesis. We therefore considered it important to publish our data as soon as possible, because they may significantly impact other EV biogenesis studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      (1) Although FBS indeed contains bovine EVs, very large multivesicular EVs (amphiectosomes), which are the focus of our manuscript, have never been observed or reported in FBS. For reported size distributions of EVs in FBS, please see the relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs larger than 350-500 nm is negligible in FBS. The average diameter of the MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometres.

      (1) For TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O), as described previously (Németh et al 2021, PMID: 34665280).

      (2) Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs, which excludes the possibility that these structures originated from FBS.

      (3) In addition, in our confocal analysis we studied Palm-GFP positive, cell line-derived MV-lEVs. Importantly, FBS-derived EVs are non-fluorescent in these experiments, so GFP positive MV-lEVs could be clearly distinguished from FBS-derived EVs.

      (4) Furthermore, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Because our study focused on autophagy-related amphiectosome secretion, we intentionally used FBS-supplemented medium.

      (5) Although we are not familiar with the technical details of how FBS is processed before commercialization, it is reasonable to assume that it is subjected to sterile filtration (through a 0.22 micron filter), after which MV-lEVs cannot be present in commercial FBS samples.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-IEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see the UniProt subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, in line with the reviewer's feedback and for clarity, we will delete the sentence in question from the text.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-IEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-IEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding, because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-IEVs”. Here we studied the co-localization of the above proteins in intraluminal vesicles (ILVs). In Fig. 2, we did not show any analysis of co-localization in the limiting membrane.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      (5) In Figures 2P-R, the morphology of the MV-IEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      The EM images in Figure 2P-R show sEVs separated from serum-free conditioned medium, as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig. 1). Therefore, the two EV populations necessarily differ in size and structure. Furthermore, Fig. 1 shows images of ultrathin sections, whereas in Figure 2P-R we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Even the latest MISEV 2023 guidelines do not recommend a proper loading control for separated EVs in Western blots (MISEV 2023, DOI: 10.1002/jev2.12404 PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which, in our opinion, is the most reliable approach for sEV Western blotting. For whole cell lysates, we used actin as the loading control (Fig3_S2B).

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that for Figure 2-S4B we did run a single gel, and the blot was cut into 4 pieces, which were probed with anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, where lanes “1”, “2” and “3” correspond to lanes “i”, “ii” and “iii” in Fig.2_S4, respectively.

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) with other MV-IFV markers? How about LC3B? Does LC3B co-localize with other MV-IFV markers?

      In Supplementary Figure 2-S4, we showed the successful generation of the HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, not the released MV-lEVs. LC3A co-localized with the RFP signal, as expected.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Materials and Methods, Fig 2_S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO4 (2 h), embedded in Epon resin and post-contrasted with 3.75% uranyl acetate (10 min) and lead citrate (12 min). Samples processed this way show very high structural preservation and better image quality; however, they are not suitable for immunodetection. In contrast, Fig.2._S5 K, L, M, N show immunogold labelling of in situ fixed samples. In this case, we used a milder fixation (4% PFA and 0.1% glutaraldehyde, post-fixed with 0.5% OsO4 for 30 min) and embedding in LR White hydrophilic resin. This special resin enables immunogold TEM analysis. The sections were exposed to H2O2 and NaBH4 to render the epitopes in the resin accessible. Because different techniques were applied, the structures are not equally well preserved. In the case of Fig. 2_S5 J, O, separated sEVs were visualised by negative-positive contrasting and immunogold labelling, as described previously (PMID: 37103858).

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, possible interference by non-fluorescent FBS-derived EVs can be excluded.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B and CD63 positive MV-lEVs. Preliminary evidence indicates MV-lEV secretion by additional cell types.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Wang et al investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appeared to have originated from the MER91C DNA transposons. Human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis analyzed the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice did not have apparent effect on sperm production, motility, and morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size, such an effect would have been quite significant in an evolutionary time scale. The number of mouse mutants and sequencing analysis represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

      The conclusions of this preprint are mostly supported by the data, except as noted below. Some descriptions need to be revised for accuracy.

      L219-L285: The conclusion that X-linked miR-506 family miRNAs are expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes. One can easily find flanking LINE1/SINE repeats. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which are only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of the miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1-related figure panels and description (L219-L285) need to be deleted. In the discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) needs to deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.

      Reply: Agreed, the corresponding sentences were deleted.

      Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?

      Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNAs is used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) would be proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).

      Reviewer #2 (Public Review):

      In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional consequences of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members are outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.

      The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.

      (1) In the abstract and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the common ancestor of mice and humans.

      Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice have not lost any targets compared to humans; rather, each miR-506 family miRNA tends to target more genes in humans than in mice.

      We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”

      We also changed “have gained” to “have” throughout the text to avoid confusion.

      (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.

      Reply: Agreed. We deleted the data mentioned by this reviewer.

      (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.

      Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).

      (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?

      Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).

      (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.

      Reply: Agreed. The MER91C DNA transposon is denoted as non-autonomous (7); however, whether it was active during the divergence of the mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and compared them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 of the 28 human genes and none of the 3 mouse genes were DETs (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).

      Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.

      Reply: Agreed. We appreciate your insightful comments.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the origin, evolution, expression, and function of the X-linked miR-506 family miRNAs in mice. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs fine-tune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that these miRNAs could enhance sperm competition is interesting and could explain their roles in fine-tuning germ cell gene expression to regulate reproductive fitness.

      Weaknesses:

      The authors specifically addressed the function of 5 clusters of the X-linked miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506-like miRNAs may compensate for the loss of the X-linked miR-506 family. These possibilities should be discussed.

      Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all the miR-506 family miRNAs are located on the X chromosome; although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).

      Author response image 1. sRNA-seq of WT and miR-506 family KO testis samples.

      The direct molecular link to the sperm competitiveness defect remains unclear but is difficult to address.

      Reply: In this study, we identified a target of the miR-506 family, i.e., Crisp1. Knockout of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, resulting in largely normal sperm motility but a compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.

      Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonia stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).

      Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.

      Reply: Agreed. Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."

      Reply: Agreed. Revised as suggested.

      (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.

      (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.

      Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.

      (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.

      Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost during evolution or become too divergent to be recognized; it was therefore not included in Figure 2A or the upper panel in Figure S2C.

      (2c) Figure S7H. The panel could be easier to read.

      Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.

      (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice." Can the authors explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?

      Reply: Sorry for the confusion. For example, gene 1 has 1 target site in humans and 1 target site in mice, whereas gene 2 has 1 target site in humans and multiple sites in mice; therefore, the value 1 is connected to more than one value in mice.

      Reviewer #3 (Recommendations For The Authors):

      CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.

      Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 seems mainly expressed in the principal piece and head of sperm.

      The detailed description of the generation of various mutant lines should be included in the Methods.

      Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).

      References:

      (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).

      (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).

      (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).

      (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).

      (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).

      (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).

      (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).

      (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).

      (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present study by Berger et al. analyzes to what extent memory formation depends on available energy reserves. This has been studied extensively for aversive memory formation, but only sparsely for appetitive memory formation. It has long been known that flies form appetitive memory only when starved. However, the authors additionally show that the duration of starvation not only plays a role but also determines which form of memory (short- or long-term memory) is formed. The authors demonstrated that internal glycogen stores play a role in this process and that this is achieved through insulin-like signaling in octopaminergic reward neurons, which integrates internal energy stores into memory formation. Here, the authors suggest that octopamine acts as a negative regulator of different forms of memory.

      The study sheds light on an old question, namely to what extent the octopaminergic neuronal system plays a role in the formation of appetitive memory, since in recent years the focus has been solely on the dopaminergic system. Furthermore, the data are an interesting contribution to the ongoing debate on whether insulin receptors play a role in neurons themselves or in glial cells. The experiments are very well designed, and the authors used a variety of behavioural experiments, genetic tools to manipulate neuronal activity and state-of-the-art imaging techniques. In addition, they not only clearly demonstrated that octopamine is a negative regulator of appetitive memory formation, but also proposed a mechanism by which the insulin receptor in octopaminergic neurons senses the internal energy status and then controls the activity of those neurons. The conclusions are mostly supported by the data, but some aspects related to the experimental design, some explanations and some literature references need clarification and revision.

      (1) Usually, long-term memory (LTM) is tested 24 hours after training. Here, the authors usually refer to LTM as a memory that is tested 6 hours after training. Adding a control experiment showing that the LTM observed here lasts longer would greatly strengthen this study.

      We thank the reviewer for this comment, as it helped greatly to clarify the matter.

      We measured the memory of control and mutant flies 24 h after training and included the data in the manuscript (Figure 1B, summarized in a model in Figure 2C). We show that control flies develop an intermediate type of memory that, depending on the length of starvation, is either anesthesia-sensitive or anesthesia-resistant. Mutants lacking octopamine develop either anesthesia-sensitive or anesthesia-resistant long-term memory.

      (2) Here, the authors define another consolidated memory component as ARM, measured by applying a cold shock 2 hours after training. However, some publications showed that LTM is formed after only one training cycle (Krashes et al 2008, Tempel et al 1983). This makes it difficult to determine whether appetitive ARM can be formed. Furthermore, one study showed that appetitive ARM is absent after massed training (Colomb et al 2009). An alternative conclusion could therefore be that different starvation protocols lead to different stabilities of LTM. Additional experiments could help to distinguish between these explanations. From these results, it can then be concluded either that different stable forms of LTM are formed depending on the starvation state, or that two differently consolidated memory phases (LTM, ARM) are formed, as has already been shown for aversive memory. This is also important for other statements in the manuscript, and therefore the authors should address this, for example, the findings about the insulin receptor (do they reflect two opposing memories or different stabilities of LTM?).

      The flies indeed develop different types of memory depending on the length of starvation and the internal energy supply.

      Reviewer #2 (Public Review):

      How an organism's physiological state modulates the establishment and perdurance of memories is a timely question that the authors aimed to address by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine octopamine in food-related conditioning, showing that it is not per se involved in the formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight into the interconnections between energy levels, feeding and the formation of short-term and long-term food-related memories. In the absence of population-specific manipulation of octopamine signaling, however, it does not reach a circuit-level understanding of how these different processes are integrated.

      Strengths

      • Previous studies have documented the impact of octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels on the establishment of appetitive memory and the role of octopamine in this process.

      • The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

      Weaknesses

      (1) In the tbh mutant flies, tyramine-to-octopamine conversion is inhibited, resulting not only in a lack of octopamine but also in elevated levels of tyramine. Whether and how elevated levels of tyramine contribute to the described phenotypes is unclear. In the current version of the manuscript, only one set of experiments (Figure 2) has been performed using an octopamine agonist. This is particularly important in light of recently published data showing that starvation modifies tyramine levels.

      (2) Octopamine (and its precursor tyramine) have been implicated in numerous processes, complicating the analysis of the phenotypes resulting from a general inhibition of tbh.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be explained solely by the loss of octopamine. We included models in the manuscript to illustrate this (Figure 2C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake, rather the increase in octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (3) The manuscript explores various aspects of the impact of energy levels on food-related behaviors and the underlying sensing and effector mechanism, both in wild-type and tbh mutants, making it difficult to follow the flow of the results.

      We included models illustrating the results to clarify the content of the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influences learning and memory. Since in Drosophila melanogaster octopamine (OA) is involved in the regulation of energy homeostasis, they focus on the roles of OA. To do so, they use the tyramine-β-hydroxylase (Tbh) mutant that lacks the neurotransmitter OA and study short-term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered and that high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory.

      Nevertheless, I do have some comments that I think require attention:

      (1) The authors use RNAi to reduce the level of glycogen synthase or glycogen phosphorylase. These manipulations are expected to affect the level of glycogen. Using specific drivers, the authors attempt to manipulate glycogen levels in the muscles and fat bodies and examine how this affects learning and memory. The conclusions of the authors arise solely from the intended manipulation (i.e., the genetics). However, the authors also directly measured glycogen levels in these organs, and those do not follow the intended manipulation, i.e., the RNAi had a very limited effect on glycogen levels. Nevertheless, these results are ignored.

      We agree with the reviewer and repeated the experiments. While we could not detect differences in whole animals, we detected differences in tissues enriched for muscle or fat, e.g., thorax or abdomen. We added the data.

      (2) The authors claim in the summary that OA is not required for STM. However, according to one experiment, OA is required for STM, as Tbh mutants cannot form STM. In another experiment, OA suppresses STM, as wild-type flies fed with OA cannot form STM. Therefore, it is very difficult to appreciate the actual role of OA in STM.

      During mild starvation, the internal energy supply is greater in Tbh mutants than in control flies. This information is integrated into the reward system via insulin receptor signaling. Therefore, the association between the odorant and sucrose is not meaningful to the mutants, and no STM is formed. At the same time, there is no release of octopamine and therefore no repression of LTM. In starved animals, octopamine suppresses food intake (we added the data). This is consistent with a function of octopamine as a signal for the presence of food. Depending on when the signal arrives, it might suppress the formation of STM or LTM.

      (3) The authors use t-tests and ANOVA for most of the statistics; however, they did not perform normality tests. While I am quite sure that most datasets will pass a normality test, this is nevertheless required.

      Thanks for pointing this out. We have included a description in the “Materials and Methods” section that explains how we tested the data for normal distribution. We corrected the figure legends accordingly.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-plot analysis to determine whether the data were normally distributed.”

      (4) While it is logical to assume that OA neurons are upstream of R15A04 DA neurons, I am not sure this really follows from the experiment presented here. It is well established that without activity in R15A04 DA neurons there is no LTM. Since OA acts to decrease LTM, can one really conclude anything about the location of the OA effect when there is no learning?

      Control flies normally did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant.

      (5) It is unclear how expression of a dominant-negative form of the insulin receptor (InR) in OA neurons can rescue the lack of OA due to the Tbh mutation. If OA neurons cannot release anything to the presumably downstream DA neurons, how can changing their internal signaling have any effect?

      Expression of the dominant-negative form of the insulin receptor signals the absence of food or low energy levels, whereas activation of the insulin receptor signals that there is enough food. The reward is a source of food, but its energy content is not high enough to fill the energy stores. Insulin receptor activation can trigger at least three different signaling cascades, one of which might regulate octopamine release.

      Although I have raised some points that need to be addressed, the overall take-home message of the manuscript is supported, and the authors do show that the metabolic state of the animal affects learning and memory. I do think, though, that more caution is required for some of the conclusions.

      We have added data to address the points raised.

      Recommendations for the authors:

      We addressed all points raised by the reviewers by clarifying the content or adding data.

      Reviewer #1 (Recommendations For The Authors):

      (1) Throughout the manuscript, the full stop of a sentence is always placed before the references.

      We fixed this.

      (2) I find the English in the manuscript not yet sufficient for publication. I suggest that the authors carefully revise the manuscript. I think that if the sentences are structured a little more clearly, this paper has enormous potential to be read by a broad community.

      We agree and revised the manuscript. We hope the manuscript is now clearer.

      (3) Sentences l114 to l117 are misleading. The authors imply that they tested the same flies for changes in odor perception or sucrose sensitivity. I assume that the authors meant that they analyzed different groups of animals.

      We clarified the sentence as follows:

      “To ensure that the observed differences in learning and memory were not due to changes in odorant perception, odorant evaluation or sucrose sensitivity, different fly populations of the same genotypes were tested for their odorant acuity, odorant preference and their sucrose responsiveness (Table S1).”

      (4) In the title as well as in the abstract, the influence of octopamine on appetitive memory formation is described in more detail; this is also the main focus of this study. However, in the introduction, the influence of the insulin receptor on memory formation is discussed first. Personally, I would describe this later in the manuscript, ideally in the results section. At this point in the manuscript, it interrupts the flow of reading.

      Thanks for the suggestion. We changed the order in the introduction.

      (5) The authors could consider, since they only used Drosophila melanogaster, changing "Drosophila melanogaster" to "Drosophila" throughout the manuscript.

      We modified the text accordingly.

      (6) All evaluations and statistical tests are state of the art. However, I have one comment. For each statistical test, a correction should be made depending on the number of tests. However, I could not determine whether this was also done for the parametric or non-parametric one-sample t-test. From the results and the methods section, I would guess not. Here I would recommend a Bonferroni correction or even better a Sidak-Holm correction. Furthermore, the authors could also go into more detail about which non-parametric one-sample t-test they used.

      We now describe the statistics in more detail in the Materials and Methods section.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-plot analysis to determine whether the data were normally distributed. For normally distributed data, we used Student’s t test to compare differences between two groups and one-way ANOVA with Tukey’s HSD post hoc test for differences between more than two groups. For non-normally distributed data, we used the Mann-Whitney U test for differences between two groups and, for more than two groups, the Kruskal-Wallis test with Dunn’s post hoc analysis and Bonferroni correction. The nonparametric one-sample sign test was used to analyze whether behavior was not based on random choice and differed from zero (P < 0.05). The statistical data analysis was performed using statskingdom (https://www.statskingdom.com).”
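      For readers less familiar with the two procedures discussed in this exchange, the following is a minimal Python sketch (standard library only) of an exact two-sided one-sample sign test against zero and of the Holm-Sidak step-down correction that Reviewer #1 recommended. The function names and the example performance indices are hypothetical; this is not the authors' analysis code, which they report running in statskingdom.

```python
import math


def sign_test_vs_zero(values):
    """Exact two-sided one-sample sign test of H0: median == 0.

    Zeros carry no sign information and are dropped, as is conventional.
    """
    positives = sum(1 for v in values if v > 0)
    negatives = sum(1 for v in values if v < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # P(split at least this lopsided in one direction) under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_sidak(p_values):
    """Holm-Sidak step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is corrected for m tests, the next for m - 1, etc.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, adj)  # adjusted values must not decrease
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # Hypothetical performance indices for one group; all eight are above zero.
    print(sign_test_vs_zero([0.2, 0.3, 0.1, 0.4, 0.25, 0.15, 0.05, 0.3]))  # 0.0078125
    print(holm_sidak([0.01, 0.04, 0.03]))
```

      Holm-Sidak never rejects fewer hypotheses than a plain Bonferroni correction, but its Sidak step assumes independent (or positively dependent) tests.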

      (7) In nearly all figure legends the sentence "The letter "a" marks a significant difference from random choice as determined by a one-sample sign test (*P < 0.05; **P < 0.01)" occurs. This is correctly indexed in the figures. However, I do not understand what *P < 0.05; **P < 0.01 means here. The significance level should be described here. I would strongly recommend that the authors make the definition clearer.

      We corrected this in the figure legends (see also above).

      (8) In Fig. 1B the labelling is a bit confusing. I interpret the two right groups as the mutants for octopamine, but there is still w[1118] in front.

      We modified Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions

      (1) Assessing the role of tyramine (for example, by reducing the levels of tyramine or of its specific receptor) would help clarify its contribution to the observed phenotypes.

      See comments above.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be explained solely by the loss of octopamine. We included models in the manuscript to illustrate this (Figure 2C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not change food intake; rather, increased octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (2) Cell-specific inhibition of octopamine receptors should thus be performed to precisely interpret the observed phenotypes and dissect how interconnected the different phenotypes are, which is the object of this publication.

      We observed that the time point and duration of octopamine application change the behavioral output. The behavior analyzed depends on pulses of octopamine and on differences in internal energy status. Cell-specific inhibition of octopamine receptors via RNAi knockdown might therefore not clarify the issue.

      (3) Defining a streamlined line of argument and progressively integrating the different observations into a unifying model would improve the clarity and flow of the manuscript.

      We included models explaining the observed results (Figure 2C and Figure 7E).

      Minor comments

      Line 129: Figure 1B should be mentioned, not 2B.

      Figure 1 legend: E should be replaced by C (after A,B).

      Figure S5: what are the arrows pointing to? Why are the Inr foci visible in A not seen in B? It should be mCD8-GFP and not mCD on top of the images.

      We fixed this.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) Can one really conclude from Figure 2A that OA acts on R15A04 DA neurons? It is well established that without activity in these DA neurons there is no LTM. Since OA acts to decrease learning, how can one conclude anything about the location of the OA effect when there is no learning? With STM the situation was the opposite: OA supported learning, and this was abolished when DA neurons were silenced. I think some supporting experiments are required, e.g. showing how OA affects DA neuron activity, or, alternatively, the writing should be toned down a bit.

      Under our training conditions, control flies normally did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant. Inhibition of dopaminergic neurons blocks the memory of Tbh mutants. Taken together, the duration of the memory, the cold-shock experiments and the inhibition of dopaminergic neurons show that Tbh mutants develop LTM after training. This training does not evoke memory in controls.

      The loss of STM in mildly starved Tbh mutants depends on the integration of high internal energy levels via InR signaling. Further reducing internal energy levels by extending starvation results in STM, supporting the conclusion that OA is not directly involved in the formation of STM.

      (2) Figure 4 requires some clarifications. In Supplementary Figure S2 the authors show that they could not manipulate glycogen levels in muscles. However, in Figure 4B they show that "Increasing glycogen levels in the muscles did not change short-term memory in 16 h starved flies, but the reduction in glycogen significantly improved memory strength (Figure 4B)" (lines 231-233). How can this be reconciled?

      While we could not detect differences in whole animals, we detected differences in glycogen content in body parts enriched in muscle or fat, e.g. the thorax or abdomen, when using UAS-GlyP-RNAi or UAS-GlyS-RNAi under the control of the respective Gal4 drivers.

      We added the data.

      Likewise, the authors write that "Increasing or decreasing glycogen levels in the fat bodies had no effect on memory performance (Figure 4C)" Line (233-234). However, in Figure S2 they show that they can only increase glycogen levels but not decrease them.

      As explained above, the conclusion of Figure 4, "Thus, low levels of glycogen in the muscles upon starvation positively influence appetitive short-term memory, while high levels of glycogen in the muscles and fat body reduce short-term memory" (lines 245-246), is not supported by the direct measurements of glycogen presented in Figure S2.

      We added the data showing that the reduction or increase can be measured when analyzing the specific body parts enriched in muscle tissue or fat tissue.

      (3) In cases where mutant flies do not display learning, a control should be done to see if they ate the sugar (with dye). Especially since the genetic manipulation affects metabolism.

      We analyzed how much sucrose the animals consumed in the behavioral test. Tbh and controls fed and there was no difference in feeding behavior between the mutants and the controls.

      “We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral setup. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed on sucrose within 2 min, and there was no difference between w1118 and TβhnM18 flies.”

      (4) The use of a t-test requires the data to be normally distributed. If I am not mistaken, this was not demonstrated for any of the datasets used. I did a quick check on one of the datasets provided in the Excel sheet, and it is normally distributed. Therefore, please add a normality test for all data sets. If some do not pass, please use a suitable non-parametric test.

      We added normality tests for all data sets and used non-parametric tests for non-normally distributed data. We clarified this in the Materials and Methods section and in the figure legends.

      (5) The authors show that OA also suppresses STM. This result contradicts previously published results. This by itself is not a problem. However, this result also seems to me to contradict the authors' own results. According to Figure 1B, OA is required for STM, as its absence in the Tbh mutant results in loss of STM. According to Figure 2C, OA reduces STM, as wt flies fed with OA just prior to learning do not form STM. This appears in other places in the manuscript as well.

      In addition, in the text lines 178-180, the authors write "A short pulse of octopamine before the training inhibits the STM. Thus, octopamine is a negative regulator of appetitive dopaminergic neuron-dependent long-term memory and can block STM." But in the summary they write "Octopamine is not required for short-term memory, since octopamine deficient mutants form appetitive short-term memory to sucrose and to other nutrients depending on the internal energy status." So, the take-home message regarding OA and STM is unclear.

      The authors need to better clarify this point.

      We clarified these points (see comments above). The loss of memory in Tbh mutants is not due to the loss of octopamine, but to increased energy levels that change the reward properties of sucrose.

      (6) The manuscript is very difficult to follow. The authors constantly switch between 16 and 40 hours of starvation, and between short-term memory, 3-hour memory and 6-hour memory. I think it would have been better to have a more focused manuscript. However, if this is not possible, I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other. Also, perhaps add to each figure a panel describing exactly the experimental conditions. I also think that simplifying the text and adding more conclusions throughout the results section will help readers to follow. Finally, I think that it would help readers understand the conclusions if the authors added a diagram of the flow that they think occurs. For example, the authors show that glycogen suppresses learning, as its reduction increases learning. They also show that InR activity suppresses learning, as its knockdown also increases learning. If I am not mistaken, the link between the two is not straightforward (but I may be wrong here). A diagram of the flow would be very helpful.

      We prepared diagrams summarizing and explaining the results.

      Minor

      (1) I may not have understood correctly as I am not sure that I found Table S1.

      Also, there was no legend for Table S1.

      Nevertheless, if I understood correctly, the authors write that "Before the experiments, flies were tested to determine whether they perceived the odorants, preferred one odorant over the other and responded to the reward similarly to ensure that the observed differences in behavior were not due to changes in odorant perception or sucrose sensitivity (Table S1)." However, according to the table that I found, it seems that following 40 h starvation wt flies show a preference for OCT, whereas this does not occur for the mutant. Also, it seems that at 16 h the mutant has a much higher preference for the odors than after 40 h. This is a bit odd. I am also not sure what the balance value refers to. Finally, the mutant shows a really low 2 M sucrose preference after 40 h. In general, this set of experiments requires a bit more explanation.

      I think it is better to show these experiments using graphs and add this to the supplementary figures.

      We clarified the experiments in the results section as follows and added an explanation to the Materials and Methods section. We tested the odorant acuity and sucrose preference for all genotypes used in the manuscript and added the data to Table S1.

      “The flies of the different genotypes sensed the odorants and evaluated them as similarly salient. This is important to avoid a bias in the situation where flies have to choose between the two odorants after training. They also sensed sucrose. We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral setup. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed on sucrose within 2 min, and there was no difference between w1118 and TβhnM18 flies.”

      (2) Line 129 should be Figure 1B

      Corrected.

      (3) Line 133, Figure 1C, how can one explain the negative reinforcement? I can understand no reinforcement, but negative?

      The effect of the sugar might be dose-dependent. 0.15 M sucrose is much closer to the concentrations found in fruits than 2 M sucrose and might therefore elicit aversion. When animals are starved enough, they might find any food source attractive, even when the sucrose concentration is unrealistic.

      (4) Figure 1, why are the graphs different between panel B and C?

      Corrected.

      (5) In Figure S1, do the TβhnM18 groups differ significantly from zero? I think they do, so it would be better to state this somewhere. If not, the claims in lines 134-135 are not supported by the data.

      We added the significance and added the data to Figure 1.

      Figure S1 legend: there is no A panel. Also "below box blots" should be box plots.

      Thanks for pointing that out. We corrected it.

      (6) It is not clear what duration of starvation was used in Figure 2A. I assume that 16 h starvation and 2 M sucrose were used, but I would state this explicitly.

      We added the information to the figure legends.

      (7) Figure 2A is missing a control of flies with both the driver and UAS shibirets at the permissive temperature.

      We added the controls to the supplement (Figure S1).

      (8) It seems to me that Figure 3B, in which the author state that "Only after 40 h of starvation did TβhnM18 mutants show a similar preference to control sucrose consumption" (line 198) is somewhat in contradiction to Table S1 in which I see Sucrose preference for wt 0.36 and for tbh 0.17. I think this comment arise because I did not understand Table S1 correctly, so please better explain.

      We rewrote this section.

      (9) In Figure 3C, consider not using std as this stands for standard deviation and may be confusing.

      We now use the term “food” instead of “std” and explained in the legend that food means standard fly food.

      We fixed this.

      (10) Please check the Supplementary Figures. I think Figures S2 and S3 are switched.

      We fixed this.

      (11) There is a mistake in Figure S3A. The right column should have another "+" sign.

      Thanks, we fixed this.

      (12) I am somewhat puzzled by Figures 4 and 5. If I understand correctly, Figure 4B w1118 mef2-G4 is exactly the same experiment as Figure 5A w1118 mef2-G4, and yet in Figure 4B the performance index is 0.2 and in Figure 5A about 0.4. Judging from other comparisons, it seems to me that these would be significantly different, and yet it is the same experiment.

      They are two independent experiments done at different times. The controls were independently repeated.

      (13) Line 273 should be Figure 5C.

      Corrected.

      (14) I don't think this is a correct sentence "Virgin females remembered sucrose significantly better than mated females." Line 274.

      Reads now:

      “Virgin females remembered the odorant paired with sucrose significantly better than mated females.”

      (15) Line 340 there is no Figure 1E

      Fixed (now Figure 1C).

      (16) The data excel file is difficult to follow. In Figure 2 there are references to Figure 5. The graphs are pointing to other files. Text is not always in English. It is not clear what W stands for. I recommend making it more accessible.

      We corrected the data excel files.

      (17) The manuscript is difficult to follow. I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other.

      We improved the data presentation by

      a) adding a model showing the kinetics of memory formation in controls and mutants (Figure 2C)

      b) adding a model explaining how the internal state is integrated into memory formation (Figure 7D).

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study reports important evidence that infants' internal factors guide children's attention and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants. However, the analysis is incomplete, as several methodological choices need more adequate justification.

      Reviewer #1

      Public Review:

      The authors bring together multiple study methods (brain recordings with EEG and behavioral coding of infant and caregiver looking, and caregiver vocal changes) to understand social processes involved in infant attention. They test different hypotheses on whether caregivers scaffold attention by structuring a child's behavior, versus whether the child's attention is guided by internal factors and caregivers then respond to infants' attentional shifts. They conclude that internal processes (as measured by brain activation preceding looking) control infants' attention, and that caregivers rapidly modify their behaviors in response to changes in infant attention.

      The study is meticulously documented, with cutting-edge analytic approaches to testing alternative models; this type of work provides a careful and well-documented guide for how to conduct studies and process and analyze data for researchers in the relatively new area of neural response in infants in social contexts.

      We are very pleased that R1 considers our work an important contribution to this developing field, and we hope that we have now addressed their concerns below.

      Some concerns arise around the use of terms (for example, an infant may "look" at an object, but that does not mean the infant is actually "attending"); the collapsing of different types of looks (to people and objects); and the averaging of data across infants, which may mask some of the individual patterns.

      We thank the reviewer for this feedback and their related comments below, and we feel that our manuscript is much stronger as a result of the changes we have made. Please see below for a detailed description of our rationale for defining and analysing the attention data, as well as the textual changes made in response to the reviewer’s comments.

      Recommendations For The Authors

      This paper is rigorous in method, theoretically grounded, and makes an important contribution to understanding processes of infant attention, brain activity, and the reciprocal temporal features of caregiver-infant interactions. The alternative hypothesis approach sets up the questions well (although authors should temper any wording that suggests attention processes are one or the other. That is, certain bouts of infant attention can be guided by exogenous factors such as social input, and others be endogenous; so averaging across all bouts can actually mask the variation in these patterns). I appreciated the focus on multiple types of behavior (e.g., gaze, vocal fluctuations in maternal speech); the emphasis on contingent responding; and the very clear summaries of takeaways after each section. Furthermore, methods and analyses are well described, details on data processing and so on are very thorough, and visualizations aptly facilitate data interpretation. However, I am not an expert on infant neural responses in EEG and assume that a reviewer with such expertise will weigh in on the treatment and quality of the data; therefore, my comments should be interpreted in light of this lack of knowledge.

      We thank R1 for these very positive and insightful comments on our analyses which are the result of a number of years of methodological and technical developmental work.

      We do agree with R1 that we should more carefully word parts of our argument in the Introduction to make clear the fact that shifts in infant attention could be driven by a combination of interactive and endogenous influences. As a result of this comment, we have made direct changes to parts of the Introduction, removing any wording that suggests that these processes are ‘alternative’ or ‘separate’, and our overall aim now reads: ‘Here, recording EEG from infants during naturalistic interactions with their caregiver, we examined the (inter)-dependent influences of infants’ endogenous oscillatory neural activity, and inter-dyadic behavioural contingencies in organising infant attention’.

      Examining variability between infant attention episodes in the factors that influence the length and timing of each episode is an important area for future investigation. We now include a discussion of this on page 38 of the Discussion section, with suggestions for how this could be examined. Investigating different subtypes of infant attention is methodologically challenging, given the number of infant behaviours that would need to inform such an analysis, all of which are time-consuming to code. Developing automated methods for performing these kinds of analyses is an important avenue for future work.

      Here, I review various issues that require revision or elaboration based on my reading of what I consider to otherwise be a solid and important research paper.

      Problem in the use of the term attention scaffolding. Although there may be literature precedent in the use of this term, it is problematic to narrowly define scaffolding as mother-initiated guidance of attention. A mother who responds to infant behaviors, but expands on the topic or supports continued attention, and so on, is scaffolding learning to a higher level. I would think about a different term because it currently implies a caregiver as either scaffolding OR responding contingently. It is not an either-or situation in conceptual meaning. In fact, research on social contingency (or contingent responsiveness), often views the follow-in responding as a way to scaffold learning in an infant.

      Yes, we agree with R1 that the term ‘attention scaffolding’ could be confusing given the use of this term in previous work conducted with children and their caregivers in problem-solving tasks, which emphasises modulations in caregiver behaviour as a function of infant behaviour. As a result of this suggestion, we have made direct edits to the text throughout, replacing the term ‘attention scaffolding’ with terms such as ‘organise’ and ‘structure’ in relation to the caregiver-leading or ‘didactic’ perspective, and terms such as ‘contingent responding’ and ‘dynamic modulation’ in relation to the caregiver-following perspective. We feel that this has much improved the clarity of the argument in the Introduction and Discussion sections.

      Do individual data support the group average trends? My concern with processes that are unobservable (by definition) is that EEG data averages may mask what's going on in individual brain responses. Effects appear to be small as well, which occurs in such conditions of averaging across perhaps very variable response patterns. In the interest of full transparency and open science, how many infants show the type of pattern revealed by the average graph (e.g., do neural markers of infant engagement forward predict attention for all babies? The majority?). Non-parametric tests on how many babies show a claimed pattern would offer the litmus test of significance on whether the phenomenon is robust across infants or pulled by a few infants with certain patterns of data. Ditto for all data. This would bolster my confidence in the summaries of what is going on in the infant brain. (The same applies, as I suggest, to attention bouts. To what extent does the forward-predict or backward-predict pattern work for all bouts, only some bouts, etc.?) I recognize that to obtain power, summaries are needed across infants and bouts, but I want to know if what's being observed is systematic.

      We thank R1 for this comment and understand their concern that the overall pattern of findings reported in relation to the infants’ EEG data might obscure inter-individual variability in the associations between attention and theta power. Averaging across individual participant EEG responses is, however, the gold standard way to perform both event-locked (Jones et al., 2020) and continuous (Attaheri et al., 2020) EEG analyses of the kind reported in the current manuscript. EEG data, and in particular naturalistic EEG data, is inherently noisy, and averaging across participants increases the signal-to-noise ratio (i.e. inconsistent, and therefore non-task-related, activity is averaged out of the response (Cohen, 2014; Noreika et al., 2020)). Examining individual EEG responses is unlikely to tell us anything meaningful: if a response is not found for a particular participant, it could be that the response is not present for that participant, or that it is present but the EEG recording for that participant is too noisy to show the effect. Computing group-level effects, as is most common in neuroimaging analyses, is therefore the most appropriate approach for examining our main research questions.

      The findings reported in this analysis also replicate previous work conducted by our lab, which showed that infant attention to objects significantly forward-predicted increases in infant theta activity during joint table-top play with their caregiver involving one toy object (compared with three in the current paradigm; Wass et al., 2018). More recent work conducted by our lab has also shown continuous and time-locked associations between infant look durations and infant theta activity when infants play with objects on their own (Perapoch Amadó et al., 2023). To reassure readers of the replicability of the current findings, we now reference the Wass et al. (2018) study at the beginning of the Discussion section.
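      In cross-correlation terms, "attention forward-predicts theta" means that the correlation between the two time series is stronger at lags where attention leads theta than where theta leads attention. The minimal Python sketch below illustrates only that lag convention on synthetic series in which theta is, by construction, attention delayed by two samples; the function names and data are hypothetical, and this is not the lab's analysis pipeline, which would also require significance testing (for example against surrogate data) and averaging across infants.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag].

    A positive lag asks whether x predicts y `lag` samples later (x leads);
    a negative lag asks the reverse (y leads x).
    """
    n = len(x)
    if lag >= 0:
        return pearson(x[: n - lag], y[lag:])
    return pearson(x[-lag:], y[: n + lag])


def cross_correlation(x, y, max_lag):
    """Correlation at every lag from -max_lag to +max_lag."""
    return {lag: lagged_correlation(x, y, lag) for lag in range(-max_lag, max_lag + 1)}


if __name__ == "__main__":
    # Hypothetical per-sample look durations; theta is the same series delayed by 2.
    attention = [0, 1, 0, 2, 5, 1, 3, 0, 4, 2, 1, 6, 0, 3]
    theta = [0, 0] + attention[:-2]
    cc = cross_correlation(attention, theta, 3)
    best = max(cc, key=cc.get)
    print(best)  # positive lag: attention leads theta
```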

      Could activity artifacts lead to certain reported trends? Babies typically look at an object before they touch or manipulate the object, and so longer bouts of attention likely involve a look and then a touch for lengthier time frames. If active involvement with an object (touching for example) amplifies theta activity, that may explain why attention duration forward predicts theta power. That is, baby looks, then touches, then theta activates, and coding would show visual gaze preceding the theta activation. Careful alignment of infants' touches and other such behaviors with the theta peak might help address this question, again to lend confidence to the robustness of the interpretation.

      Yes, again this is a very important point, and the removal of movement-related artifact is something we have given careful attention to in the analysis of our naturalistic EEG data (Georgieva et al., 2020; Marriott Haresign et al., 2021). As a result of this comment we have made direct changes to the Results section on page 18 to more clearly signal the reader to our EEG pre-processing section before presenting the results of the cross-correlation analyses.

      As we describe in the Methods section of the main text, movement-related artifacts are removed from the data with ICA decomposition, utilising an automatic-rejection algorithm, specially designed for work with our naturalistic EEG data (Marriott Haresign et al., 2021). Given that ICA rejection does not remove all artifact introduced to the EEG signal, additional analysis steps were taken to reduce the possibility that movement artifacts influenced the results of the reported analyses. As explained in the Methods section, rather than absolute theta power, relative theta was used in all EEG analyses, computed by dividing the power at each theta frequency by the summed power across all frequencies. Eye and head movement-related artifacts most often associate with broadband increases in power in the EEG signal (Cohen, 2014): computing relative theta activity therefore further reduces the potential influence of artifact on the EEG signal.
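      The relative-power step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the 3-6 Hz theta band is an assumption (the band limits are not stated in this response). It shows the property the argument relies on: an artifact that scales power by the same factor at every frequency cancels exactly, whereas artifacts with other spectral shapes are only partly addressed, which is why ICA cleaning is applied first.

```python
def relative_band_power(freqs, power, band=(3.0, 6.0)):
    """Power at each in-band frequency divided by summed power over all frequencies.

    `band` defaults to an assumed 3-6 Hz theta range (inclusive).
    """
    total = sum(power)
    return [p / total for f, p in zip(freqs, power) if band[0] <= f <= band[1]]


if __name__ == "__main__":
    freqs = list(range(1, 11))  # hypothetical 1-10 Hz spectrum
    power = [10.0, 8.0, 6.0, 5.0, 4.0, 3.0, 2.0, 2.0, 1.0, 1.0]
    clean = relative_band_power(freqs, power)
    # A broadband gain (here x3 at every frequency) leaves relative theta unchanged.
    inflated = relative_band_power(freqs, [3.0 * p for p in power])
    print(clean)
    print(inflated)
```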

      It is also important to highlight that previous work examining movement artifacts in controlled paradigms with infants has shown that limb movements actually associate with a decrease in power at theta frequencies, compared to rest (Georgieva et al., 2020). It is therefore unlikely that limb movement artifacts explain the pattern of association observed between theta power and infant attention in the current study.

      That said, examining the association between body movements and fluctuations in EEG activity during naturalistic interactions is an important next step, and something our lab is currently working on. Given that touching an object is most often the end state of a larger body movement, aligning the EEG signal to the onset of infant touch would not be especially informative for understanding how body movements are associated with increases and decreases in EEG power. Our lab is currently developing new methods using motion-tracking software and arousal composites to understand how data-derived behavioural subtypes are associated with differential patterns of EEG activity.

      The term attention may be misleading. The behavior being examined is infant gaze or looks, with the assumption that gaze is a marker of "attention". The authors are aware that gaze can be a blank stare that doesn't reflect underlying true "attention". I recommend substituting a conservative, more precise term that captures the variable being measured (gaze); it would then be fine to state that, in their interpretation, gaze is taken as a marker of attention, or something like that. At minimum, using the term "visual attention" could be a solution if the authors do not want to use the precise term gaze. As an example, the sentence "An attention episode was defined as a discrete period of attention towards one of the play objects on the table, or to the partner" should be modified to define an episode as looking at a play object or the partner.

      We thank the reviewer for this comment, and we understand their concern with the use of the term ‘attention’ where we are referring to shifts in infant eye gaze. However, this term is widely used in the literature to describe patterns of infant gaze, irrespective of whether infants are ‘actually attending’ or not, in both interactive (e.g. Yu et al., 2021) and screen-based experiments examining infant attention (Richards, 2010). We therefore feel that its use in our current manuscript is acceptable and consistent with the reporting of similar interaction findings. On page 39 of the Discussion, we now also discuss how future research might further investigate differential subtypes of infant looks to distinguish between moments where infants are attending and moments where they are just looking.

      Why collapse across gaze to object vs. other? Conceptually, it's unclear why the same hypotheses and research questions on neural-attention (i.e., gaze in actuality) links would apply to looks to a mom's face or to an object. Some rationale would be useful to the reader as to why these two distinct behaviors are taken as following the same principles in ordering of brain and behavior. Perhaps I missed something, however, because later in the Discussion the authors state that "fluctuations in neural markers of infants' engagement or interest forward-predict their attentiveness towards objects", which suggests there was an object-focused variable only? Please clarify. (Again, sorry if I missed something).

      This is a really important point, and we agree with R1 that it could have been more clearly expressed in our original submission, for which we apologise. In the cross-correlation analyses conducted in parts 2 and 3, which examine forwards-predictive associations between infant attention durations and infant endogenous oscillatory activity (part two) and caregiver behaviour (part three), we include, as R1 describes, all infant looks towards objects and their partner. Including all infant look types is necessary to produce a continuous variable to cross-correlate with the other continuous variables (e.g. theta activity, caregiver vocal behaviours); these analyses therefore do not concentrate only on infant attention episodes towards objects.
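      To illustrate why all look types are needed, the minimal sketch below turns coded looks into a sample-by-sample series in which each sample holds the duration of the look in progress. The coding format (onset, offset, target), the 5 Hz sampling rate and the function name are assumptions for illustration, not the study's coding scheme. Dropping the partner looks leaves gaps (NaN) in the series, which is why restricting the variable to object looks would not yield a continuous variable.

```python
import numpy as np


def attention_duration_series(looks, total_s, fs=5.0):
    """Continuous series: each sample holds the duration (s) of the look it falls in.

    looks: iterable of (onset_s, offset_s, target) tuples (hypothetical format).
    Samples not covered by any look are left as NaN.
    """
    n = int(round(total_s * fs))
    times = np.arange(n) / fs
    series = np.full(n, np.nan)
    for onset, offset, _target in looks:
        series[(times >= onset) & (times < offset)] = offset - onset
    return series


looks = [(0.0, 2.0, "toy A"), (2.0, 2.6, "partner"), (2.6, 4.0, "toy B")]
all_looks = attention_duration_series(looks, total_s=4.0)
object_only = attention_duration_series(
    [look for look in looks if look[2] != "partner"], total_s=4.0)
print(np.isnan(all_looks).sum(), np.isnan(object_only).sum())  # 0 3
```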

      We take the reviewer’s point that different attentional and neural mechanisms may be associated with looks towards objects vs. the partner, which we now acknowledge directly on page 10 of the Introduction. However, our focus here is on the endogenous and interactive mechanisms that drive fluctuations in infant engagement with the ongoing, free-flowing interaction. Indeed, previous work has shown increases in theta activity during sustained episodes of infant attention to a range of different stimuli, including cartoon videos (Xie et al., 2018), real-life screen-based interactions (Jones et al., 2020), and objects (Begus et al., 2016). In the second half of part 2, we go on to address the endogenous processes that support infant attention episodes specifically towards objects.

      As a result of this comment, we have made direct changes to the Introduction on page 10 to more clearly explain the looking behaviours included in the cross-correlation analyses and the rationale for conducting them in this way, in contrast to the reactive analyses conducted in the second half of parts two and three, which examine infant object looks only. Direct edits have also been made throughout the Results and Methods sections to more clearly specify the types of looks included in each analysis. Where we discuss the cross-correlation analyses, we now refer only to infant ‘attention durations’ or infant ‘attention’, whilst ‘object-directed attention’ and ‘looks towards objects’ are clearly specified in sections discussing the reactive analyses conducted in parts 2 and 3. We have also amended the Discussion on page 31 so that the cross-correlation analyses are interpreted relative to infants’ overall attention, rather than their attention towards objects only.

      Why are mothers' gazes shorter than infants' gazes? This was the flip of what I'd expect, so some interpretation would be useful to understanding the data.

      This is a really interesting observation. Our findings on the looking behaviour of caregivers and infants in our joint play interactions are in fact consistent with much previous micro-dynamic analysis of caregiver and infant looking behaviour during early table-top interactions (Abney et al., 2017; Perapoch Amadó et al., 2023; Yu & Smith, 2013, 2016). Caregivers’ looks are shorter because caregivers alternate their gaze between their infant and the objects (i.e. they spend a lot of the interaction time monitoring their infants’ behaviours). This can be seen in Figure 2 (see main text), which shows that caregiver looks are divided between looks to their infants and looks towards objects. In comparison, infants spend most of their time focusing on objects (see Figure 2, main text), with relatively infrequent looks to their caregiver. As a result, infant looks are, overall, longer than their caregivers’.

      Minor points

      Use the term association or relation (relationships is for interpersonal relationships, not in statistics).

      This has now been amended throughout.

      I'm unsure I'd call the interactions "naturalistic" when they occur at a table, with select toys, EEG caps on partners, and so on. The term seems more appropriate for studies with fewer constraints that occur (for example) in a home environment, etc.

      We understand R1’s concern with our use of the term ‘naturalistic’ to refer to the joint play interactions that we analyse in the current study. However, we feel the term is appropriate, given that the interactions are unstructured: the only instruction given to caregivers at the beginning of the interaction is to play with their infants as they might do at home. The interactions therefore capture free-flowing caregiver and infant behaviours, in which modulations in each individual’s behaviour result from the intra- and inter-individual dynamics of the social exchange. This contrasts with previous work on early infant attention development, which has used more structured designs in which modulations in infant behaviour occur as a result of the parameters of the experimental task.

      Reviewer #2

      Public Review

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, so that caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction. It is difficult to determine whether the authors prove their point, as neither the results nor the motivation for the chosen methods are clearly explained.

      Strengths

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses largely seem to be appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and caregiver behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other, more discretized analyses (e.g., regression).

      We are pleased that R2 finds our work to be an interesting contribution to the field, which utilises appropriate analysis techniques.

      Weaknesses

      The major weakness of this paper is that the reader is assumed to understand why these results lead to their claimed findings. The authors need to describe more carefully their reasoning and justification for their analyses and what they hope to show. While a handful of experts would understand why autocorrelations and cross-correlations should be used, they are by no means basic analyses. It would also be helpful to use simulated data or even a simple figure to help the reader more easily understand what a significant result looks like versus an insignificant result.

      We thank the reviewer for this comment, and we agree that much more detail should be added to the Introduction section. As a result of this comment, we have made direct changes to the Introduction on pages 9-11 to more clearly detail these analysis methods, our rationale for using them, and how we expect the results to further our understanding of the drivers of infant attention in naturalistic social interactions.
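      As a minimal illustration of what these analyses detect, the sketch below (simulated data, not study data; the function name is hypothetical) computes the Pearson correlation between one series and a lagged copy of another at each lag. A peak at a positive lag indicates that the first series forward-predicts the second; the autocorrelation is the same computation with a series correlated against itself.

```python
import numpy as np


def lagged_xcorr(x, y, max_lag):
    """Pearson r between x(t) and y(t + lag) for lag = -max_lag..max_lag.

    A peak at a positive lag means changes in x precede changes in y.
    Calling lagged_xcorr(x, x, ...) gives the autocorrelation function.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]     # pair x(t) with y(t + lag)
        else:
            a, b = x[-lag:], y[: n + lag]    # pair x(t - lag) with y(t)
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags, r


# Simulated example: y copies x five samples later, plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 5) + 0.5 * rng.standard_normal(2000)
lags, r = lagged_xcorr(x, y, max_lag=10)
print(lags[np.argmax(r)])  # 5: x leads y by five samples
```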

      We also provide a figure in the SM (Fig. S6) depicting significant vs. non-significant patterns of results against their permutation distribution, to help the reader more clearly understand the permutation method used in our statistical analyses (described in the Methods on page 51).

      While the overall question is interesting, the introduction does not properly set up the rest of the paper. The authors spend a lot of time talking about oscillatory patterns in general but devote very little discussion to the fact that they are using EEG to measure these patterns. The justification for using EEG is also not very well developed. Why did the authors single out fronto-temporal channels instead of using whole-brain techniques, which are more standard in the field? This is idiosyncratic and not common.

      We very much agree with R2 that the rationale and justification for using EEG to understand the processes that influence infants’ attention patterns are under-developed in the current manuscript. As a result of this comment, we have made direct edits to the Introduction section of the main text on pages 7-8 to more clearly describe the rationale for examining the relationship between infant EEG activity and their attention during the play interactions with their caregivers.

      As we describe in the Introduction section, previous behavioural work conducted with infants has suggested that endogenous cognitive processes (i.e. fluctuations in top-down cognitive control) might be important in explaining how infants allocate their attention during free-flowing, naturalistic interactions towards the end of the first year. Oscillatory neural activity occurring at theta frequencies (3-6Hz), which can be measured with EEG, has previously been associated with top-down intrinsically guided attentional processes in both adulthood and infancy (Jones et al., 2020; Orekhova, 1999; Xie et al., 2018). Measuring fluctuations in infant theta activity therefore provides a method to examine how endogenous cognitive processes structure infant attention in naturalistic social interactions which might be otherwise unobservable behaviourally.

      It is important to note that the Introduction distinguishes between two different oscillatory mechanisms that could possibly explain the organisation of infant attention over the course of the interaction. The first refers to oscillatory patterns of attention, that is, consistent attention durations produced by infants that likely reflect automatic, regulatory functions related to fluctuations in infant arousal. The second mechanism is oscillatory neural activity occurring at theta frequencies, recorded with EEG, which, as mentioned above, is thought to reflect fluctuations in intrinsically guided attention in early infancy. We have amended the Introduction to make the distinction between the two clearer.

      A worrisome weakness is that the figures are not consistently formatted. The y-axes are not consistent within figures making the data difficult to compare and interpret. Labels are also not consistent and very often the text size is way too small making reading the axes difficult. This is a noticeable lack of attention to detail.

      This has now been adjusted throughout, where appropriate.

      No data is provided to reproduce the figures. This does not need to include the original videos but rather the processed and de-identified data used to generate the figures. Providing the data to support reproducibility is increasingly common in the field of developmental science and the authors are greatly encouraged to do so.

      This will be provided with the final manuscript.

      Minor Weaknesses

      Figure 4, how is the pattern in a not significant while in b a very similar pattern with the same magnitude of change is? This seems like a spurious result.

      The statistical analysis conducted for all cross-correlation analyses reported follows a rigorous and stringent permutation-based temporal clustering method which controls for family-wise error rate using a non-parametric Monte Carlo method (see Methods in the main text for more detail). Permutations are created by shuffling data sets between participants and, therefore, patterns of significance identified by the cluster-based permutation analysis will depend on the mean and standard deviation of the cross-correlations in the permutation distribution. Fig. S6 now depicts the cross-correlations against their permutation distributions which should help readers to understand the patterns of significance reported in the main text.
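      A simplified sketch of this kind of test is given below, under stated assumptions: the per-lag statistic is the across-participant mean cross-correlation; null pairings are formed by matching each participant's first series with another participant's second series; each lag is z-scored against the mean and standard deviation of the permutation distribution; and clusters of adjacent lags beyond |z| > 1.96 are compared with the permutation distribution of the maximum cluster mass. The threshold, the cluster-mass statistic and all function names are our assumptions, not the exact settings reported in the Methods.

```python
import numpy as np


def xcorr_at_lags(x, y, max_lag):
    """Pearson r between x(t) and y(t + lag), lag = -max_lag..max_lag."""
    n, out = len(x), []
    for lag in range(-max_lag, max_lag + 1):
        a, b = (x[: n - lag], y[lag:]) if lag >= 0 else (x[-lag:], y[: n + lag])
        out.append(np.corrcoef(a, b)[0, 1])
    return np.array(out)


def find_clusters(z, thresh):
    """Runs of adjacent lags with |z| > thresh and the same sign: (start, stop, mass)."""
    found, start = [], None
    for i in range(len(z) + 1):
        inside = i < len(z) and abs(z[i]) > thresh
        continues = inside and start is not None and np.sign(z[i]) == np.sign(z[start])
        if start is not None and not continues:   # close the current cluster
            found.append((start, i, float(np.abs(z[start:i]).sum())))
            start = None
        if inside and start is None:              # open a new cluster
            start = i
    return found


def cluster_permutation_test(xs, ys, max_lag, n_perm=200, thresh=1.96, seed=0):
    """xs[i], ys[i]: equal-length series for participant i (needs >= 2 participants)."""
    rng = np.random.default_rng(seed)
    n_sub = len(xs)
    observed = np.mean([xcorr_at_lags(x, y, max_lag) for x, y in zip(xs, ys)], axis=0)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(n_sub)
        while np.any(perm == np.arange(n_sub)):   # no participant keeps their own pairing
            perm = rng.permutation(n_sub)
        null.append(np.mean([xcorr_at_lags(xs[i], ys[j], max_lag)
                             for i, j in enumerate(perm)], axis=0))
    null = np.array(null)
    mu, sd = null.mean(axis=0), null.std(axis=0, ddof=1)
    # Largest cluster mass in each permutation; comparing against its
    # distribution controls the family-wise error rate across lags.
    max_mass = np.array([max((c[2] for c in find_clusters((row - mu) / sd, thresh)),
                             default=0.0) for row in null])
    lags = np.arange(-max_lag, max_lag + 1)
    results = []
    for start, stop, mass in find_clusters((observed - mu) / sd, thresh):
        p = (1 + np.sum(max_mass >= mass)) / (1 + n_perm)
        results.append((int(lags[start]), int(lags[stop - 1]), mass, float(p)))
    return lags, observed, results


# Simulated example: for each of 10 participants, y follows x by 3 samples.
rng = np.random.default_rng(1)
xs = [rng.standard_normal(300) for _ in range(10)]
ys = [np.roll(x, 3) + rng.standard_normal(300) for x in xs]
lags, observed, clusters = cluster_permutation_test(xs, ys, max_lag=8)
print(clusters)  # (first lag, last lag, cluster mass, p) for each cluster
```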

      The correlations appear very weak in Figures 3b, 5a, 7e. Despite a linear mixed effects model showing a relationship, it is difficult to believe looking at the data. Both the Spearman and Pearson correlations for these plots should be clearly included in the text, figure, or figure legend.

      We thank the reviewer for this comment, and agree that reporting the correlations for these plots would strengthen the findings of the linear mixed effects models reported in the text. As a result, we have added both Spearman and Pearson correlations to the legends of Figures 3b, 5a and 7e, corresponding to the statistically significant relationships examined in the linear mixed effects models. The strength of the relationships is entirely consistent with that documented in previous research using similar methods (e.g. Piazza et al., 2018). How strong a relationship looks to the observer depends entirely on the graphical representation chosen to present it. We have chosen to present the data in this way because we feel that it is the most honest representation of the statistically significant, and very carefully analysed, effects that we have observed in our data.
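      For completeness, a minimal sketch of how both coefficients can be computed in plain NumPy, with tied values given their average rank; the function names are ours. Spearman's coefficient is Pearson's r computed on ranks, so it captures monotonic relations that need not be linear, which is why the two values can differ for the same plot.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(values, dtype=float)
    ranks = np.empty(v.size)
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, v.size + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def pearson_and_spearman(x, y):
    """Pearson r on the raw values and Spearman rho (Pearson r on ranks)."""
    pearson = np.corrcoef(x, y)[0, 1]
    spearman = np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]
    return pearson, spearman


x = np.arange(1, 11)
# Spearman is 1 (up to rounding) for this perfectly monotonic relation;
# Pearson is lower because the relation is not linear.
print(pearson_and_spearman(x, x ** 3))
```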

      Linear mixed effects models need more detail. Why were they built the way they were built? I would have appreciated seeing multiple models in the supplementary methods and a reasoning to have landed on one. There are multiple ways I can see this model being built (especially with the addition of a random intercept). Also, there are methods to test significance between models and aid in selection. That being said, although participant identity is a very common random effect, its use should be clearly stated in the main text.

      We very much agree with R2 that the reporting of the linear mixed effects models needs more detail, and this has now been added to the Methods section (page 54). Whilst it is true that there are multiple ways in which this model could be built, given the specificity of our research questions regarding the reactive changes in infant theta activity and caregiver behaviours that occur after infant look onsets towards objects (see pages 9-11 of the Introduction), we took a hypothesis-driven approach to building the linear mixed effects models. As a result, random intercepts are specified for participants, as well as uncorrelated by-participant random slopes (Brown, 2021; Gelman & Hill, 2006; Suarez-Rivera et al., 2019). In this way, infant look durations are predicted from caregiver behaviours (or infant theta activity), controlling for between-participant variability both in look durations and in the strength of the effect of caregiver behaviours (or infant theta activity) on infant look durations.
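      To make the model structure concrete, the sketch below simulates data under the assumed model rather than fitting it. Each participant i receives its own intercept deviation u0_i and slope deviation u1_i, drawn independently (the "uncorrelated" part), so that look duration y = (b0 + u0_i) + (b1 + u1_i) x + noise; in R's lme4 notation this corresponds to a term of the form (1 + x || participant). All parameter values and names are hypothetical. The per-participant least-squares fits at the end are only an illustration of the between-participant slope variability that the random slopes absorb; they are not the mixed model itself.

```python
import numpy as np


def simulate_looks(n_participants=20, n_looks=30, b0=2.0, b1=0.5,
                   sd_u0=0.5, sd_u1=0.3, sd_resid=1.0, seed=0):
    """Simulate look durations with by-participant random intercepts and slopes.

    u0 and u1 are drawn independently, i.e. the random intercepts and slopes
    are uncorrelated. x stands in for a predictor such as a caregiver
    behaviour or a theta measure (hypothetical values throughout).
    """
    rng = np.random.default_rng(seed)
    u0 = rng.normal(0.0, sd_u0, n_participants)   # random intercepts
    u1 = rng.normal(0.0, sd_u1, n_participants)   # random slopes, independent of u0
    pid = np.repeat(np.arange(n_participants), n_looks)
    x = rng.standard_normal(pid.size)
    y = (b0 + u0[pid]) + (b1 + u1[pid]) * x + rng.normal(0.0, sd_resid, pid.size)
    return pid, x, y


pid, x, y = simulate_looks()
# Separate least-squares slope per participant (illustration only).
slopes = np.array([np.polyfit(x[pid == i], y[pid == i], 1)[0] for i in np.unique(pid)])
print(slopes.mean(), slopes.std())  # slopes scatter around the fixed effect b1
```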

      Some parentheses aren't closed, a more careful re-reading focusing on these minor textual issues is warranted.

      This has now been corrected.

      Analysis of F0 seems unnecessarily complex. Is there a reason for this?

      Computation of the continuous caregiver F0 variable may seem complex, but we feel that all analysis steps are necessary to accurately and reliably compute this variable in our naturalistic, noisy and free-flowing interaction data. For example, we retain F0 values only within segments of the interaction identified as the mum speaking, so that background noises and infant vocalisations are not included in the continuous variable. We then interpolate through unvoiced segments (similar to Räsänen et al., 2018) and compute the derivative in 1000 ms intervals as a measure of the rate of change. The steps taken to compute this variable have been carefully selected, given the many ways in which this continuous rate-of-change variable could be computed (cf. Piazza et al., 2018; Räsänen et al., 2018).
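      A minimal sketch of these three steps follows. It assumes an F0 track that has already been extracted (NaN in unvoiced frames), a boolean mask of frames coded as caregiver speech, and a fixed frame rate; the names are hypothetical. We read "the derivative in 1000 ms intervals" as the change in interpolated F0 between successive 1000 ms time points, which is our interpretation rather than the authors' exact definition. Note that `np.interp` holds the first and last valid values constant beyond the edges of the recording.

```python
import numpy as np


def f0_rate_of_change(f0, caregiver_speaking, frame_rate, interval_s=1.0):
    """Continuous caregiver F0 rate-of-change variable (hypothetical sketch).

    f0: F0 track in Hz, NaN where no F0 was detected.
    caregiver_speaking: boolean mask of frames coded as caregiver speech.
    Returns the F0 change (Hz) across successive intervals of interval_s.
    """
    f0 = np.where(caregiver_speaking, f0, np.nan)        # 1. drop noise / infant frames
    frames = np.arange(f0.size)
    voiced = ~np.isnan(f0)
    f0 = np.interp(frames, frames[voiced], f0[voiced])   # 2. interpolate through gaps
    step = int(round(interval_s * frame_rate))
    return np.diff(f0[::step])                           # 3. change per interval


frame_rate = 100                                   # frames per second (assumed)
f0 = 200.0 + 10.0 * np.arange(500) / frame_rate    # toy contour rising 10 Hz per second
f0[120:180] = np.nan                               # an unvoiced pause
speaking = np.ones(500, dtype=bool)
speaking[300:350] = False                          # e.g. an infant vocalisation
f0[310] = 900.0                                    # spurious value outside caregiver speech
out = f0_rate_of_change(f0, speaking, frame_rate)  # each value is ~10 Hz
```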

      The choice of a 20 Hz filter seems odd when an example of toy clacks is given. Toy clacks are much higher than 20 Hz, and a 20 Hz filter probably wouldn't do anything against toy clacks, given that the authors already set floor and ceiling parameters of 75-600 Hz in their F0 extraction.

      We thank the reviewer for this comment, and we can see that this part of the description of the F0 computation is confusing. A 20 Hz low-pass filter is applied to the F0 data stream after the F0 has been extracted with floor and ceiling parameters set at 75 and 600 Hz. The 20 Hz filter therefore removes modulations in the caregivers’ F0 that occur at modulation frequencies above 20 Hz; it does not refer to spectral filtering of the speech signal itself. The description of this variable has been rephrased on page 48 of the main text.
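      The distinction can be illustrated with a sketch that low-pass filters the F0 contour itself, a time series of pitch values sampled at an assumed 100 frames per second, so that the 20 Hz cutoff applies to how quickly pitch changes rather than to the acoustic spectrum. A brick-wall FFT filter is used purely for illustration; it is not necessarily the filter type used in the study, and the function name is hypothetical.

```python
import numpy as np


def lowpass_f0_contour(contour, frame_rate, cutoff_hz=20.0):
    """Remove F0 modulations faster than cutoff_hz from a gap-free F0 contour.

    contour: F0 values in Hz, one per analysis frame.
    """
    spectrum = np.fft.rfft(contour)
    mod_freqs = np.fft.rfftfreq(contour.size, d=1.0 / frame_rate)  # modulation frequencies
    spectrum[mod_freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=contour.size)


frame_rate = 100
t = np.arange(200) / frame_rate                    # 2 s of frames
slow = 200 + 20 * np.sin(2 * np.pi * 3 * t)        # 3 Hz pitch movement: kept
fast = 5 * np.sin(2 * np.pi * 35 * t)              # 35 Hz pitch jitter: removed
smoothed = lowpass_f0_contour(slow + fast, frame_rate)
```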

      Linear interpolation is a choice I would not have made. Where there is no data, there is no data. It feels inappropriate to assume that the data in between is simply a linear interpolation of surrounding points.

      The choice to interpolate where there was no data was something we considered in a lot of detail, given the many options for dealing with missing data points in this analysis and the difficulties involved in extracting a continuous F0 variable from our naturalistic data sets. As R2 points out, one option would be to set data points to NaN values where no F0 is detected and/or the mum is not vocalising. A second option would be to set the continuous variable to 0 where no F0 is detected and/or the mum is not vocalising: when the mum is not producing sound there is no F0, so setting the variable to 0, rather than to missing values, makes the most objective sense.

      Either of these options (setting parts where no F0 is detected to NaN or 0) makes it difficult to then meaningfully compute the rate of change in F0: inserting NaN values reduces the number of data points in each time window, whereas inserting 0s creates large and unreal changes in F0. Inserting NaN values into the continuous variable also reduces the number of data points included in the cross-correlation and event-locked analyses. It is important to note that, in our naturalistic interactions, caregivers’ vocal patterns are characterised by many short vocalisations interspersed with short pauses (Phillips et al., in prep), similar to previous findings in naturalistic settings (Gratier et al., 2015). Interpolation therefore mostly bridges these short pauses in the caregiver’s vocalisations.
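      A toy comparison makes the trade-off concrete. The sketch below uses hypothetical data (a flat 200 Hz contour with two short pauses) and a hypothetical function name to contrast the three options for the pauses. NaN-filling discards every frame-to-frame change that touches a pause, zero-filling creates 200 Hz jumps that never occurred, and linear interpolation keeps all changes without inventing any.

```python
import numpy as np


def compare_gap_strategies(f0):
    """Max |frame-to-frame F0 change| and number of usable changes per strategy."""
    gaps = np.isnan(f0)
    frames = np.arange(f0.size)
    filled = {
        "nan": f0,                                              # leave gaps missing
        "zero": np.where(gaps, 0.0, f0),                        # no sound -> F0 of 0
        "interp": np.interp(frames, frames[~gaps], f0[~gaps]),  # bridge the pauses
    }
    summary = {}
    for name, series in filled.items():
        changes = np.diff(series)
        usable = ~np.isnan(changes)
        summary[name] = (float(np.abs(changes[usable]).max()), int(usable.sum()))
    return summary


f0 = np.full(100, 200.0)
f0[20:25] = np.nan                                              # two short pauses
f0[60:63] = np.nan
print(compare_gap_strategies(f0))
# {'nan': (0.0, 89), 'zero': (200.0, 99), 'interp': (0.0, 99)}
```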

      The only limitation listed related to the demographics of the sample, namely that it consisted of middle-class mothers in East London. Given that the demographics of London, even East London, are quite varied, it's disappointing that the sample does not reflect the community it was drawn from.

      Yes, we very much agree with R2 that the lack of inclusion of caregivers from wider demographic backgrounds is disappointing, and this is often a problem in developmental research. Our lab is currently working to collect similar data from infants with a family history of ADHD, as part of an ongoing longitudinal project involving families from across the UK from much more varied demographic backgrounds. We hope that the findings reported here will feed directly into the work conducted as part of this new project.

      That said, a demographic table of the subjects included in this study should be added.

      This is now included in the SM, and referenced in the main text.

      References

      Abney, D. H., Warlaumont, A. S., Oller, D. K., Wallot, S., & Kello, C. T. (2017). Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy, 22(4), 514–539. https://doi.org/10.1111/infa.12165

      Attaheri, A., Choisdealbha, Á. N., Di Liberto, G. M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., Grey, C., Flanagan, S., & Goswami, U. (2020). Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants [Preprint]. Neuroscience. https://doi.org/10.1101/2020.10.12.329326

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants’ preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397–12402. https://doi.org/10.1073/pnas.1603261113

      Brown, V. A. (2021). An Introduction to Linear Mixed-Effects Modeling in R.

      Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press.

      Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

      Georgieva, S., Lester, S., Noreika, V., Yilmaz, M. N., Wass, S., & Leong, V. (2020). Toward the Understanding of Topographical and Spectral Signatures of Infant Movement Artifacts in Naturalistic EEG. Frontiers in Neuroscience, 14, 352. https://doi.org/10.3389/fnins.2020.00352

      Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., & Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01167

      Jones, E. J. H., Goodwin, A., Orekhova, E., Charman, T., Dawson, G., Webb, S. J., & Johnson, M. H. (2020). Infant EEG theta modulation predicts childhood intelligence. Scientific Reports, 10(1), 11232. https://doi.org/10.1038/s41598-020-67687-y

      Marriott Haresign, I., Phillips, E., Whitehorn, M., Noreika, V., Jones, E. J. H., Leong, V., & Wass, S. V. (2021). Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience, 52, 101024. https://doi.org/10.1016/j.dcn.2021.101024

      Noreika, V., Georgieva, S., Wass, S., & Leong, V. (2020). 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behavior and Development, 58, 101393. https://doi.org/10.1016/j.infbeh.2019.101393

      Orekhova, E. (1999). Theta synchronization during sustained anticipatory attention in infants over the second half of the first year of life. International Journal of Psychophysiology, 32(2), 151–172. https://doi.org/10.1016/S0167-8760(99)00011-2

      Perapoch Amadó, M., Greenwood, E., James, Labendzki, P., Haresign, I. M., Northrop, T., Phillips, E., Viswanathan, N., Whitehorn, M., Jones, E. J. H., & Wass, S. (2023). Naturalistic attention transitions from subcortical to cortical control during infancy. [Preprint]. Open Science Framework. https://doi.org/10.31219/osf.io/6z27a

      Piazza, E. A., Hasenfratz, L., Hasson, U., & Lew-Williams, C. (2018). Infant and adult brains are coupled to the dynamics of natural communication [Preprint]. Neuroscience. https://doi.org/10.1101/359810

      Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015

      Richards, J. E. (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219. https://doi.org/10.1016/j.dr.2010.03.005

      Suarez-Rivera, C., Smith, L. B., & Yu, C. (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. https://doi.org/10.1037/dev0000628

      Wass, S. V., Noreika, V., Georgieva, S., Clackson, K., Brightman, L., Nutbrown, R., Covarrubias, L. S., & Leong, V. (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLOS Biology, 16(12), e2006328. https://doi.org/10.1371/journal.pbio.2006328

      Xie, W., Mallin, B. M., & Richards, J. E. (2018). Development of infant sustained attention and its relation to EEG oscillations: An EEG and cortical source analysis study. Developmental Science, 21(3), e12562. https://doi.org/10.1111/desc.12562

      Yu, C., & Smith, L. B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659. https://doi.org/10.1371/journal.pone.0079659

      Yu, C., & Smith, L. B. (2016). The Social Origins of Sustained Attention in One-Year-Old Human Infants. Current Biology, 26(9), 1235–1240. https://doi.org/10.1016/j.cub.2016.03.026

      Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences, 118(52), e2107019118. https://doi.org/10.1073/pnas.2107019118

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their appreciation of our study and their thoughtful comments. In response to the main concern raised by all reviewers regarding the potential influence of external noise factors on intuitive inference, such as external disturbances or imperfect observations, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by placing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgments of stability. The findings from these experiments consistently support our proposal of a stochastic world model of gravity embedded in the human mind. In addition, we have addressed the remaining comments from the reviewers one by one.

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim: (1) Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.

      Author response image 1.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall on one side. Left: the overall experimental scene; right: the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by their implicit assumptions about external forces, we replicated the original experimental setup with the addition of a wall on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution for all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference about objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.
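      One way to quantify the skew this prediction refers to is the sample skewness of the judged angles. The sketch below is an illustrative assumption, not the analysis reported above: it computes the Fisher-Pearson coefficient g1 = m3 / m2^(3/2) and a rough z-score using the large-sample standard error sqrt(6 / n) that applies to normally distributed data. A wind-driven bias from one side would be expected to push g1 away from 0. The function name is hypothetical.

```python
import math


def skewness_check(angles):
    """Return (g1, approximate z) for a list of judged angles."""
    n = len(angles)
    mean = sum(angles) / n
    m2 = sum((a - mean) ** 2 for a in angles) / n   # second central moment
    m3 = sum((a - mean) ** 3 for a in angles) / n   # third central moment
    g1 = m3 / m2 ** 1.5                             # Fisher-Pearson skewness
    return g1, g1 / math.sqrt(6.0 / n)              # large-sample z under normality


print(skewness_check([-2, -1, 0, 1, 2]))  # symmetric angles: g1 = 0
print(skewness_check([0, 0, 0, 1, 10]))   # long right tail: g1 > 0
```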

      This new experiment has been added to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.

      Author response image 2.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experimental design. In this experiment, participants used a mouse to indicate the landing point of a parabolic trajectory (marked by the green dot) that was obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity; no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Prediction errors from three participants. The prediction errors of all three participants conform to Gaussian distributions with non-negligible variances. These results are consistent with an inherent stochastic property of gravity as represented in the mind.

      R2: We thank the reviewer for suggesting this experiment. However, when predicting where an object dropped straight down will reappear, participants may rely on the learned knowledge that an unimpeded falling object continues in a straight line, rather than on their intuitive physics. To avoid this potential confound, we designed a similar experiment in which participants predicted the landing point of a parabolic trajectory obscured by an occluder (Author response image 2A). In each of 100 trials, participants clicked the left mouse button to indicate the predicted landing point. This design limits the impact of direct visual cues and actively engages the mental simulation of intuitive physics. None of the three participants (1 female; ages: 24-30) predicted the landing points accurately, and their prediction errors conformed to Gaussian distributions with different variances (Author response image 2B). This new experiment therefore supports the stochastic nature of intuitive physics.

      (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.

      R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.

      (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.

      R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics. The genesis of such abilities is multifaceted and unlikely to be fully replicated by a simple simulation such as RL. The purpose of incorporating the RL framework in our study is therefore to demonstrate that external perturbations are not necessary for a stochastic representation of gravity to develop. In fact, introducing additional external noise into the RL framework would likely increase the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) Some comments on the writing:

      The word 'normality' is used to refer to people's judgments about whether a tower collapse looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.

      R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.

      (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.

      R6: We are sorry for the inaccurate expression. We have revised our statements in the manuscript Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”

      (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.

      R7: We thank the reviewer for pointing out this error. We recalculated the number of possible configurations using formula (3) in the appendix; the number of configurations with 10 blocks is:

      Thus,

      This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.

      Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10^50 configurations for stacks consisting of 10 blocks.”

      Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10^50 (Supplementary Figure 1), …”

      Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10^50.”

      (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.

      R8: We apologize for the inappropriate wording. The sentence has been revised, and the motivation is now explained more comprehensively in the revised text:

      Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”

      (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?

      R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.

      (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.

      R10: Again, we are sorry about the confusion. Please see the revised text in R8.

      (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.

      R11: We are sorry for the confusion. We intended to state that participants’ judgments were biased, showing a tendency to predict a collapse regardless of a stack’s actual stability. We have revised this sentence accordingly (Line 202–204): “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”

      (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.

      R12: We agree, and we have toned down the statement in the revision (Line 213–214): “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”

      Reviewer #2 (Recommendations For The Authors):

      I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions.

      (1) In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?

      Author response image 3.

      RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.

      R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction: higher learning rates produced a more rapid decline in the learning curves but introduced larger variances, and increased sampling density required more iterations for convergence.

      This simulation also explains the slower learning in the experiment described in the main text, where the force sphere was divided into 61x61 angle pairs and the learning rate was set to 0.15. These parameters ensured convergence within a reasonably brief timeframe while maintaining a high-resolution assessment of force directions.

      In addition, the width of the Gaussian distribution is mainly determined by the complexity of the stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex stacks) produced a wider distribution, whereas stacks with more blocks produced a narrower one. In the study, we used stacks of 2 to 15 blocks to approximate the range of stacks humans typically encounter in daily life.

      In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:

      Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”

      Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”
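      The trade-off described above (a higher learning rate converges faster but fluctuates more) is a generic property of the incremental update used in tabular Q-learning, in which each stored value moves a fraction alpha of the way toward a new target. The sketch below illustrates it for a single value. It is a toy under stated assumptions, not the authors’ RL model: the target of 1.0, the Gaussian feedback noise, and the 5% tolerance are hypothetical. One plausible account of the density effect, assuming each training configuration updates only the angle bins it informs, is that a 61x61 grid gives each bin fewer updates per configuration than a 5x5 grid, so more iterations are needed before every bin converges.

```python
import random
import statistics


def incremental_updates(alpha, targets, q0=0.0):
    """Apply Q <- Q + alpha * (target - Q) for each target in turn; return the Q trajectory."""
    q = q0
    trajectory = []
    for target in targets:
        q += alpha * (target - q)  # move a fraction alpha toward the new target
        trajectory.append(q)
    return trajectory


def steps_to_converge(alpha, target=1.0, tol=0.05, max_steps=10_000):
    """Noise-free updates needed, starting from Q = 0, until |target - Q| < tol * |target|."""
    q = 0.0
    for step in range(1, max_steps + 1):
        q += alpha * (target - q)
        if abs(target - q) < tol * abs(target):
            return step
    return None  # did not converge within max_steps


def tail_spread(alpha, n=5000, noise_sd=1.0, seed=0):
    """Std of Q over the second half of training when each target is 1.0 plus Gaussian noise."""
    rng = random.Random(seed)
    targets = [1.0 + rng.gauss(0.0, noise_sd) for _ in range(n)]
    trajectory = incremental_updates(alpha, targets)
    return statistics.pstdev(trajectory[n // 2:])


# Compare a low, an intermediate (0.15, the value reported above), and a high learning rate.
for alpha in (0.05, 0.15, 0.5):
    print(alpha, steps_to_converge(alpha), round(tail_spread(alpha), 3))
```

      Larger alpha reaches the target in fewer steps but keeps reacting strongly to each noisy target, which is the speed-versus-variance pattern the learning curves show.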

      (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.

      Author response image 4.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall on one side. Left: the overall experimental scene; right: the scene shown to participants. (b) Human behavior. Three participants completed this experiment, and their responses consistently showed normal distributions without apparent skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments of stability were not affected by potential concerns regarding external forces.

      R2: We thank the reviewer for this suggestion. Following the reviewer’s advice, we repeated the experiment with a wall added on one side (Supplementary Figure 4A). Before the experiment began, we explicitly informed participants that the wall was designed to block wind, so that no wind force from the direction of the wall could influence the collapse trajectories of the configurations. Participants judged whether each trajectory looked normal. If participants’ judgments were influenced by assumed external forces, we would expect a skewed angle distribution. However, the results still showed a normal distribution for all participants tested, consistent with the experiment without the wall (Supplementary Figure 4B). This experiment suggests that the stochastic nature of intuitive inference on object stability is embedded in the mind, rather than shaped by external forces or explicit instructions.

      We have added this new experiment to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).

      R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.

      Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
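      A single rigid block gives the simplest worked case of this explanation. In a 2-D side view, a block with base half-width b and centre-of-mass height h begins to tip once the sampled gravity direction deviates from vertical by more than arctan(b/h), so the taller the block, the smaller the deviation needed. If that deviation is drawn from a zero-mean Gaussian with standard deviation σ, the probability of a predicted collapse is erfc(arctan(b/h) / (σ√2)). The sketch below assumes friction large enough to prevent sliding, ignores the azimuthal angle, and uses an illustrative σ of 20 degrees; it is not the stack simulation used in the manuscript.

```python
import math


def critical_tilt_deg(half_width: float, com_height: float) -> float:
    """Tilt of gravity (degrees from vertical) at which a rigid block starts to tip.

    The block tips once the gravity line through its centre of mass falls
    outside the edge of its base, i.e. when tan(tilt) > half_width / com_height.
    """
    if half_width <= 0 or com_height <= 0:
        raise ValueError("half_width and com_height must be positive")
    return math.degrees(math.atan2(half_width, com_height))


def tip_probability(half_width: float, com_height: float, sigma_deg: float) -> float:
    """P(|tilt| > critical tilt) when tilt ~ Normal(0, sigma_deg), in a 2-D side view.

    Sliding is ignored (friction assumed large enough).
    """
    crit = critical_tilt_deg(half_width, com_height)
    if sigma_deg == 0:
        return 0.0  # deterministic, straight-down gravity never tips a resting block
    # Two-sided Gaussian tail: 2 * (1 - Phi(crit / sigma)) = erfc(crit / (sigma * sqrt(2))).
    return math.erfc(crit / (sigma_deg * math.sqrt(2.0)))


# Hypothetical example: same base (half-width 0.5), increasing centre-of-mass height,
# tilt spread sigma = 20 degrees (illustrative value only).
for height in (0.25, 1.0, 2.0):
    print(height, round(critical_tilt_deg(0.5, height), 1), round(tip_probability(0.5, height, 20.0), 3))
```

      Because one tilt is sampled for the whole object, height enters only through the critical angle arctan(b/h); no per-block accumulation is needed to make taller objects look less stable.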

      (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?

      R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.

      Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”

      Reviewer #3 (Public Review):

      (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability of the block to tip over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block the smaller the deviation from a vertical gravitational field is needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
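      The reviewer’s argument combines two standard rigid-body conditions. For a uniform block whose support surface is tilted by an angle θ relative to gravity (equivalently, gravity tilted by θ on a level surface), the block tips once tan θ exceeds the ratio of its base half-width to its centre-of-mass height, and it slides once tan θ exceeds the static friction coefficient μ. The quasi-static sketch below encodes these two thresholds. The dimensions and friction values are hypothetical, and the rule that the lower threshold wins when both are exceeded is a simplification that ignores combined sliding-and-tipping motion.

```python
import math


def block_outcome(tilt_deg: float, half_width: float, com_height: float, mu_static: float) -> str:
    """Classify a uniform rigid block when gravity is tilted by tilt_deg from the surface normal.

    Quasi-static simplification: the block 'tips' if the gravity line through its
    centre of mass passes outside the base edge, 'slides' if static friction
    cannot hold the tangential component, and otherwise 'rests'.
    """
    if half_width <= 0 or com_height <= 0 or mu_static < 0:
        raise ValueError("dimensions must be positive and friction non-negative")
    t = math.tan(math.radians(abs(tilt_deg)))
    tip_threshold = half_width / com_height  # tan of the critical tipping angle
    tips = t > tip_threshold
    slides = t > mu_static
    if tips and slides:
        # As the tilt grows from zero, whichever threshold is lower is crossed first.
        return "tips" if tip_threshold < mu_static else "slides"
    if tips:
        return "tips"
    if slides:
        return "slides"
    return "rests"


print(block_outcome(0.0, 0.1, 1.0, 0.6))   # rests: tall block, gravity straight down
print(block_outcome(10.0, 0.1, 1.0, 0.6))  # tips: same block, gravity tilted 10 degrees
print(block_outcome(5.0, 1.0, 0.1, 0.02))  # slides: low, wide "puck" with ice-like friction
```

      With zero tilt every block rests, whereas any sampled non-zero tilt makes a sufficiently tall or low-friction object move; this is the prediction the reviewer contrasts with everyday observation.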

      Author response image 5.

      Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      R1: We agree with the reviewer that objects remain stable until disturbed by external forces. However, in many cases there is a clear discrepancy between objective reality and subjective experience. For example, violet and red light lie at opposite ends of the visible spectrum, yet violet and red appear close to each other in perceptual color space. Similarly, as shown in Author response image 5, all shapes possess equal stability in the absence of external forces. Yet humans typically perceive the shape on the left as the most stable because of its larger base area. In this study, we explored the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a single value pointing straight down.

      In revision, we have clarified the rationale of this study:

      Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”

      (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.

      R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:

      (a) Misinterpretation of the 3D scene and motion. In Author response image 5 shown above, there is no 3D structure, yet participants’ judgments of stability still deviated from objective reality. In addition, 3D motion was introduced only to aid in understanding the stacks’ 3D structure, and previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, whether objects are presented in 2D or 3D, static or in motion, humans’ judgments of object stability appear consistent.

      (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.

      (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change, the Gaussian distribution was still observed. That is, viewpoint does not appear to be a key factor in how gravity’s direction is represented as a Gaussian distribution in our mental world.

      In summary, both our study and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of object stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, however, we suggest that this noise is intrinsic rather than stemming from external forces or imperfect observations.

      (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.

      Experiment 1: We duplicated the original experimental setup, adding a wall on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, so that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect a skewed distribution of angles. Contrary to this prediction, all three participants (ages 25-30, two female) showed normal distributions, similar to those obtained in the experiment without the wall (Supplementary Figure 4B).

      Author response image 6.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall on one side. Left: the overall experimental scene; right: the scene shown to participants. (b) Human behavior. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that human judgments of stability were not affected by potential external forces.
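      The reasoning behind the wall control, that a one-sided external perturbation should skew the distribution of judged angles whereas a symmetric stochastic prior should not, can be illustrated with a short Python sketch. The noise models are assumptions made for illustration only: symmetric Gaussian scatter stands in for a stochastic representation of gravity, and an added exponentially distributed push stands in for wind from one direction. The function names and all parameter values are hypothetical and are not taken from the authors' analysis.

```python
import random
import statistics


def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (G1); about 0 for symmetric data."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def simulate_judged_angles(n, sigma_deg, wind_mean_deg=0.0, seed=0):
    """Hypothetical judged tilt angles (degrees) under two noise models.

    sigma_deg: SD of symmetric Gaussian scatter (stochastic-prior model).
    wind_mean_deg: mean of an extra one-sided, exponentially distributed
    push (external-force model); 0 disables it.
    """
    rng = random.Random(seed)
    angles = []
    for _ in range(n):
        angle = rng.gauss(0.0, sigma_deg)
        if wind_mean_deg > 0:
            angle += rng.expovariate(1.0 / wind_mean_deg)
        angles.append(angle)
    return angles


symmetric = simulate_judged_angles(2000, sigma_deg=3.0)
windy = simulate_judged_angles(2000, sigma_deg=3.0, wind_mean_deg=4.0)
print(f"skewness, symmetric scatter only: {sample_skewness(symmetric):+.2f}")
print(f"skewness, with one-sided push:    {sample_skewness(windy):+.2f}")
```

      With these settings the one-sided push produces a clearly positive skewness (the theoretical value for this Gaussian-plus-exponential mixture is about 1.0), while symmetric scatter stays near zero. In practice, a formal normality or skewness test would be applied to each participant's responses.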

      Experiment 2: The second experiment adopted another paradigm to test the hypothesis of stochastic mental simulation. Consider a participant inferring the landing point of a parabolic trajectory that is obscured by an occluder (Author response image 2A): stochastic mental simulation predicts that the participant's responses follow a Gaussian distribution, whereas if judgments were influenced by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials; in each trial, participants predicted the landing point of the trajectory by clicking the left mouse button. None of the three participants (1 female; ages 24-30) predicted the landing points accurately, and their predictive errors conformed to Gaussian distributions, with variances that differed across participants (Author response image 2B). This new experiment therefore confirms the stochastic nature of intuitive physics.

      Author response image 7.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants used a mouse to indicate the landing point of a parabolic trajectory (marked by the green dot) that was obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest that the mental representation of gravity is inherently stochastic.
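      Why a stochastic representation of gravity should produce approximately Gaussian landing errors can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' task code: the internal gravity vector is assumed to have a fixed magnitude and a direction tilted from vertical by an angle drawn from a zero-mean Gaussian, and the launch parameters (vx, vy, h) and the angular SD are hypothetical values rather than those used in the experiment. For small tilts the landing position is approximately linear in the tilt angle, so Gaussian tilts map onto approximately Gaussian landing errors.

```python
import math
import random
import statistics


def landing_x(theta_rad, vx, vy, h, g=9.81):
    """Horizontal landing position (y = 0) of a projectile launched from (0, h).

    The assumed gravity vector has magnitude g and is tilted by theta_rad
    from vertical: acceleration = (g*sin(theta), -g*cos(theta)).
    """
    gx = g * math.sin(theta_rad)
    gy = g * math.cos(theta_rad)
    # Positive root of h + vy*t - 0.5*gy*t**2 = 0 (time of flight).
    t = (vy + math.sqrt(vy ** 2 + 2.0 * gy * h)) / gy
    return vx * t + 0.5 * gx * t ** 2


def simulate_landing_errors(n, sigma_deg, vx=2.0, vy=3.0, h=0.0, seed=1):
    """Errors of simulated predictions relative to the true (vertical-gravity) landing point."""
    rng = random.Random(seed)
    true_x = landing_x(0.0, vx, vy, h)
    errors = []
    for _ in range(n):
        theta = math.radians(rng.gauss(0.0, sigma_deg))  # one stochastic "mental simulation"
        errors.append(landing_x(theta, vx, vy, h) - true_x)
    return errors


errors = simulate_landing_errors(5000, sigma_deg=2.0)
print(f"mean error {statistics.fmean(errors):+.4f}, SD {statistics.stdev(errors):.4f}")
```

      Under these hypothetical settings the errors are centred near zero with an SD of roughly 0.06 (in metres, given g = 9.81); a histogram or normality test of `errors` would show the approximately Gaussian shape.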

      (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?

      R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model of gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated by a simple simulation such as RL. The purpose of incorporating the RL framework in our study is therefore to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.

      In the revision, we have clarified the role of the RL framework:

      Lines 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Author response image 8.

      Perceptual uncertainty in 3D stack structures. (a) Experimental design. Two stacks with similar block placements were presented sequentially, and participants judged whether the stacks were identical and rated their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavioral performance. Three participants (2 male; ages 24-30) took part in the experiment. Confidence that a pair of stacks was unchanged decreased rapidly even when each block was displaced by a very small amount, suggesting that humans keenly perceive small changes in configuration. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The y-axis denotes confidence in reporting no change. The red curve shows the average confidence across 4 runs, and the yellow curves show the confidence in each run.

      R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. A more critical question concerns how accurately our perception represents the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 5a) in which participants judged whether a pair of stacks was identical, with the position of each block randomly shifted horizontally. All participants accurately detected even minor positional variations in the 3D structure of the stacks (Author response image 5b). This level of perceptual precision is adequate for detecting discrepancies between the predictions of mental simulations and actual observations of the external world.

      (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

      R6: Yes. The NGS model achieved 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. This information is now included in the revision:

      Lines 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”
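      The reviewer's point, that models should be compared at a common accuracy threshold as well as at their own plateaus, can be made concrete with a small Python helper. The learning curves below are hypothetical exponential curves chosen only to illustrate the two measures; they are not the MGS or NGS simulation results, and "plateau" is operationalized here, as an assumption, as reaching 98% of a curve's final value.

```python
import math


def first_crossing(curve, threshold):
    """Index of the first training step at which accuracy >= threshold, else None."""
    for step, accuracy in enumerate(curve):
        if accuracy >= threshold:
            return step
    return None


def saturating_curve(n_steps, start, plateau, tau):
    """Hypothetical learning curve: exponential approach from start to plateau."""
    return [plateau - (plateau - start) * math.exp(-step / tau) for step in range(n_steps)]


def time_to_plateau(curve, fraction=0.98):
    """First step within `fraction` of the curve's final (asymptotic) value."""
    return first_crossing(curve, fraction * curve[-1])


# Hypothetical curves: model A saturates low but quickly, model B higher but slowly.
model_a = saturating_curve(1000, start=0.5, plateau=0.82, tau=40)
model_b = saturating_curve(1000, start=0.5, plateau=0.97, tau=120)

for name, curve in (("A", model_a), ("B", model_b)):
    print(name, "steps to 80%:", first_crossing(curve, 0.80),
          "steps to plateau:", time_to_plateau(curve))
```

      With these illustrative parameters, both curves cross 80% within a few steps of each other, while the slower, higher curve takes roughly three times as many steps to reach its own plateau, which is the distinction drawn in the response above.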

      We greatly appreciate the thorough and insightful reviews provided by all three reviewers, which have considerably improved our manuscript, especially in terms of the clarity of the presentation of the approach and further validation of the robustness and implications of our results.

      References:

      Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.

      Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.

      Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.

      Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current Opinion in Neurobiology 55:167–179.

      MacKay DM. 1956. The epistemological problem for automata. In: Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.

      Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.

      Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. NeuroImage 56:400–410.

      Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study combines a comparative approach in different synapses with experiments that show how synaptic vesicle endocytosis in nerve terminals regulates short-term plasticity. The data presented support the conclusions and make a convincing case for fast endocytosis as necessary for rapid vesicle recruitment to active zones. Some aspects of the description of the data and analysis are however incomplete and would benefit from a more rigorous approach. With more discussion of methods and analysis, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions at two types of central synapses, the calyx of Held and hippocampal CA1 synapses. After acute pharmacological block of endocytosis, deeper synaptic depression or weaker facilitation was observed at both types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of site clearance in counteracting synaptic depression.

      Strengths:

      The study combines acute pharmacological block of the molecular targets with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals, conditions that had not previously been examined.

      Weaknesses:

      Pharmacological agents may have some off-target effects, although the advantage of acute manipulation should be appreciated. Although this is a hard question and difficult to address experimentally, the reagents may directly affect synaptic vesicle mobilization to the release sites in addition to blocking endocytosis.

      To acutely block vesicle endocytosis, we utilized two different pharmacological tools, Dynasore and Pitstop-2, after testing their blocking spectra and potencies at the calyx presynaptic terminals, and we analyzed the effects common to both on their target functions. Since recovery from STD was faster at calyx synapses in the presence of both endocytic blockers in physiological 1.3 mM [Ca2+] (Figure 2B), but not in 2.0 mM [Ca2+] (Figure S4), these blockers might facilitate vesicle mobilization under physiological conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both at the calyx of Held synapse and at hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice, comprehensive and comparative study, which reveals some interesting differences between a fast synapse (calyx of Held) tuned to reliably transmit at several hundred Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      The concepts of FRP and SRP derive from voltage-clamp step-depolarization experiments at calyces of Held of pre-hearing rodents at room temperature, and they cannot be directly dissected from action-potential-evoked EPSCs recorded at post-hearing calyces under physiological conditions. Nevertheless, we have addressed this issue as far as possible in new paragraphs in the Results section (p9-10) that refer to the related literature, focusing in particular on how the effects of latrunculin depend on experimental conditions, and we have added a new supplementary figure (now S5). Regarding the role of F-actin in vesicle replenishment at cerebellar synapses, we added sentences to the Discussion section (p14, last paragraph).

      Reviewer #3 (Public Review):

      General comments:

      (1) While Dynasore and Pitstop-2 may impede release site clearance due to an arrest of membrane retrieval, neither Latrunculin-B nor ML-141 specifically acts on AZ scaffold proteins. Interference with actin polymerization may have a number of consequences many of which may be unrelated to release site clearance. Therefore, neither Latrunculin-B nor ML-141 can be considered suitable tools for specifically identifying the role of AZ scaffold proteins (i.e. ELKS family proteins, Piccolo, Bassoon, α-liprin, Unc13, RIM, RBP, etc) in release site clearance which was defined as one of the principal aims of this study.

      In this study, we focused our analysis on the downstream activity of the scaffold protein intersectin by comparing the common inhibitory effects of blocking CDC42 (with ML141) and actin polymerization (with Latrunculin B) on vesicle endocytosis and synaptic depression/facilitation, without addressing the diverse individual effects of each drug. To avoid confusion, we removed “AZ” before “scaffold protein”.

      (2) Initial EPSC amplitudes more than doubled in the presence of Dynasore at hippocampal SC->CA1 synapses (Figure S2). This unexpected result raises doubts about the specificity of Dynasore as a tool to selectively block SV endocytosis.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) In this study, the application of Dynasore and Pitstop-2 strongly decreases 100 Hz steady-state release at calyx synapses while - quite unexpectedly - strongly accelerates recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The effect of latrunculin on STD can vary with the conditions of application and with external [Ca2+], as shown in the new Supplementary Figure S5. The effect of latrunculin on recovery from STD also varies with temperature, [Ca2+], and animal age, which affect the Ca2+-dependent fast component of recovery from depression. We added paragraphs on this issue to the Results section (p9-10).

      (4) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We added methodological explanations and reworded sentences in the text to make clear that the pharmacological data were derived from separate, non-sequential experiments rather than from sequential recordings in the same synapses.

      (5) The authors compare results obtained in calyx with those obtained in SC->CA1 synapses which they considered examples for 'fast' and 'slow' synapses, respectively. There is little information given to help readers understand why these two synapse types were chosen, what the attributes 'fast' and 'slow' refer to, and how that may matter for the questions studied here. I assume the authors refer to the maximum frequency these two synapse types are able to transmit rather than to EPSC kinetics?

      Yes, “fast” and “slow” refer to the maximum frequency at which these synapses can transmit. We have replaced “fast and slow” with “fast-signaling and slow-plastic” and added an explanation in the text.

      (6) Strong presynaptic stimuli such as those illustrated in Figures 1B and C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents a fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Since the data shown in Figs. 1 and 3 are central to the argumentation, illustration of the corresponding conductance traces is mandatory. Merely mentioning that the first 450 ms after stimulation were skipped during analysis is insufficient.

      A conductance trace, together with the capacitance change induced by a square pulse, is shown in our previous paper (Yamashita et al., 2005, Science).
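      The vesicle-count estimate in comment (6) is simple arithmetic that can be checked with a short Python sketch. It uses the comment's own assumptions (80 aF per vesicle, ~16 pF resting terminal capacitance) and additionally assumes that the whole capacitance jump reflects exocytosis with no concurrent endocytosis and a constant specific membrane capacitance. A 2.0 pF jump corresponds to 25,000 vesicles and a 12.5% area increase; 2.5 pF gives 31,250 vesicles and about 15.6%, so the upper vesicle count quoted in the comment is rounded.

```python
SV_CAPACITANCE_F = 80e-18        # assumed single-vesicle capacitance (80 aF)
TERMINAL_CAPACITANCE_F = 16e-12  # assumed resting calyx terminal capacitance (~16 pF)


def vesicles_fused(delta_cm_f, sv_cap_f=SV_CAPACITANCE_F):
    """Number of vesicles implied by a capacitance jump, if each adds sv_cap_f."""
    return delta_cm_f / sv_cap_f


def surface_increase(delta_cm_f, terminal_cap_f=TERMINAL_CAPACITANCE_F):
    """Fractional increase in membrane area, assuming constant specific capacitance."""
    return delta_cm_f / terminal_cap_f


for delta_pf in (2.0, 2.5):
    delta_f = delta_pf * 1e-12
    print(f"{delta_pf} pF -> {vesicles_fused(delta_f):,.0f} SVs, "
          f"+{100 * surface_increase(delta_f):.1f}% membrane area")
```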

      (7) It is essential for this study to preclude a contamination of the results with postsynaptic effects (AMPAR saturation and desensitization). AMPAR saturation limits the amplitudes of initial responses in EPSC trains and hastens the recovery from depression due to a 'ceiling effect'. AMPAR desensitization occludes paired-pulse facilitation and reduces steady-state responses during EPSC trains while accelerating the initial recovery from depression. The use of, for example, 1 mM kynurenic acid in the bath is a well-established strategy to attenuate postsynaptic effects at calyx synapses. All calyx EPSC recordings should have been performed under such conditions. Otherwise, recovery time courses and STP parameters are likely contaminated by postsynaptic effects. Since the effects of AMPAR saturation on EPSC_1 and desensitization on EPSC_ss may partially cancel each other, an unchanged relative STD in the presence of kynurenic acid is not necessarily a reliable indicator for the absence of postsynaptic effects. The use of kynurenic acid in the bath would have had the beneficial side effect of massively improving voltage-clamp conditions. For the typical values given in this MS (10 nA EPSC, 3 MOhm Rs) the expected voltage escape is ~30 mV corresponding to a change in driving force of 30 mV/80 mV=38%, i.e. initial EPSCs in trains are likely underestimated by 38%. Such large voltage escape usually results in unclamped INa(V) which was suppressed in this study by routinely including 2 mM QX-314 in the pipette solution. That approach does, however, not reduce the voltage escape.

      Glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003) although it does in pre-hearing calyces (Yamashita et al, 2009). In fact, as shown in Figure S3, our results are essentially the same with or without kynurenate.
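      The voltage-escape estimate in comment (7) follows from Ohm's law across the uncompensated series resistance. The minimal Python sketch below reproduces that first-order estimate: it treats the peak EPSC as a steady current and ignores the dynamic reduction of the current by the escape itself, and the `rs_compensation` parameter is an illustrative addition that the comment does not mention. With 10 nA and 3 MOhm it gives 30 mV, i.e. a 37.5% reduction of an 80 mV driving force, which the comment rounds to 38%.

```python
def voltage_escape_mv(epsc_na, rs_mohm, rs_compensation=0.0):
    """Voltage error across the series resistance: nA x MOhm = mV.

    rs_compensation (hypothetical parameter, 0-1) is the fraction of Rs
    removed by amplifier compensation; 0 reproduces the estimate in comment (7).
    """
    return epsc_na * rs_mohm * (1.0 - rs_compensation)


def driving_force_loss(escape_mv, driving_force_mv=80.0):
    """Fractional reduction in driving force caused by the voltage escape."""
    return escape_mv / driving_force_mv


escape = voltage_escape_mv(10.0, 3.0)  # 10 nA EPSC, 3 MOhm Rs
print(f"escape {escape:.0f} mV, driving force reduced by {100 * driving_force_loss(escape):.1f}%")
```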

      (8) In the Results section (pages 7 and 8), the authors analyze the time course into STD during 100 Hz trains in the absence and presence of drugs. In the presence of drugs, an additional fast component is observed which is absent from control recordings. Based on this observation, the authors conclude that '... the mechanisms operate predominantly at the beginning of synaptic depression'. However, the consequences of blocking or slowing site clearing are expected to be strongly release-dependent. Assuming a probability of <20% that a fusion event occurs at a given release site, >80% of the sites cannot be affected at the arrival of the second AP even by a total arrest of site clearance simply because no fusion has yet occurred. That number decreases during a train according to (1-0.2)^n, where n is the number of the AP, such that after 10 APs, ~90% of the sites have been used and may potentially be unavailable for new rounds of release after slowing site clearance. Perhaps, the faster time course into STD in the presence of the drugs isn't related to site clearance?

      Enhanced depression at the beginning of stimulation indicates a block of the rapid SV replenishment mechanisms, which include endocytosis-dependent site clearance and scaffold-dependent vesicle translocation to release sites.
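      The release-dependence argument in comment (8) can be checked numerically. In this minimal Python sketch, the probability that a given site has been used at least once after n APs is 1 - (1 - p)^n; the simplifying assumptions (independent sites, a constant p = 0.2 per AP, no refilling) are those implicit in the comment rather than a model fitted to the data. With p = 0.2, 20% of sites have been used before the second AP arrives and about 89% after ten APs.

```python
def fraction_sites_used(p_release, n_aps):
    """Probability that a given release site has hosted >= 1 fusion after n_aps APs.

    Assumes independent sites and a constant per-AP release probability,
    ignoring depletion, facilitation and site refilling.
    """
    return 1.0 - (1.0 - p_release) ** n_aps


for n in (1, 2, 5, 10):
    print(f"after AP {n:2d}: {100 * fraction_sites_used(0.2, n):.0f}% of sites used")
```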

      (9) In the Discussion (page 10), the authors present a calculation that is supposed to explain the reduced size of the second calyx EPSC in a 100 Hz train in the presence of Dynasore or Pitstop-2. Does this calculation assume that all endocytosed SVs are immediately available for release within 10 ms? Please elaborate.

      We do not assume that endocytosed vesicles are reused within 10 ms, because glutamate refilling takes much longer (7 s at physiological temperature; Hori & Takahashi, 2012). Instead, already-filled reserve vesicles can rapidly replenish release sites, provided that the sites have been cleared and the scaffold functions properly. The results shown in Figure S6 also indicate that blocking vesicular transmitter refilling has no immediate effect on synaptic responses.

      (10) It is not clear, why the bafilomycin/folimycin data is presented in Fig. S5. The data is also not mentioned in the Discussion. Either explain the purpose of these experiments or remove the data.

      These v-ATPase blockers, which block vesicular transmitter refilling, have been reported to enhance EPSC depression at hippocampal synapses at room temperature and 2 mM [Ca2+], presumably because of a lack of filled vesicles available for rapid recycling (e.g., kiss-and-run). We thought it important to determine whether these findings are physiologically relevant, since such a mechanism might also regulate synaptic strength during repetitive transmission. However, our results did not support their physiological relevance. Since these results fall outside our main questions, the negative results are shown in Supplementary Figure S6 and explained in the last paragraph of the Results section (p11), but are not discussed further in the Discussion section.

      (11) The scheme in Figure 7 is not very helpful.

      We updated the scheme to summarize our conclusion that endocytosis-dependent site clearance and a scaffold-dependent mechanism independently cooperate in vesicle replenishment to strengthen synaptic efficacy during repetitive transmission at fast-signaling calyx synapses, whereas endocytic site clearance alone is required to support facilitation at slow-plastic hippocampal SC-CA1 synapses.

      Recommendations for the authors:

      First, my deep apologies for the long delay in reviewing your paper. All reviewers are now in agreement that the paper has valuable new information, but some methods are not described well and some results appear to be incompatible with previous results in the literature. The discussion of previous literature is also incomplete and not well-balanced. With more discussion of methods and literature strengthened this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms. We ask that you address the comments and revise your paper before we can fully recommend the paper as being an important contribution with compelling evidence and a strong data set that supports the conclusions.

      We have described the methods more explicitly. The apparent incompatibilities with previous results are now explained and discussed with new supplementary data.

      Major:

      (1) In this study, the application of Dynasore and Pitstop-2 strongly decreased 100 Hz steady-state release at calyx synapses while - quite unexpectedly - it strongly accelerated recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The lack of change in recovery from depression in dynamin-1 knockout mice (Mahapatra et al., 2016) is consistent with our results in 2 mM [Ca2+] (Figure S4), whereas the accelerated recovery with Dynasore (Figure 2B2) is observed in 1.3 mM [Ca2+], suggesting that this acceleration is masked in 2 mM [Ca2+] but revealed in physiological [Ca2+] (p7, top paragraph). In both cases, however, recovery from STD is not prolonged, unlike in Hosoi et al. (2009).

      The latrunculin issues are discussed in the Results section, with the newly added Supplementary Figure S5 (p9-10).

      (2) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We have made these points clearer in the Methods and Results sections.

      (3) Please cite and discuss briefly previous papers that have shown fast endocytosis in the calyx of Held with membrane capacitance measurements like Renden and von Gersdorff, J Neurophysiology, 98:3349, 2007 and Taschenberger et al., Neuron, 2002. These papers first showed exocytosis and endocytosis kinetics in more mature (hearing) mice calyx of Held and at higher physiological temperatures.

      One of these papers, relevant to the present study, is now cited on p4.

      (4) The findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We added a discussion of the latrunculin results to the Results section, citing the previous literature (p9-10). Since there is no direct evidence (e.g., from vesicle imaging) for the existence of FRP and SRP, these definitions, derived from voltage-clamp step-depolarization studies, are difficult to incorporate into the dissection of synaptic depression under physiological conditions.

      Reviewer #1 (Recommendations For The Authors):

      I have no major comments, but the following issues may be addressed.

      (1) The term "fast and slow" synapses may be relative and a bit confusing. I do not think hippocampal synapses are slow synapses.

      We have replaced “fast and slow” by “fast-signaling and slow-plastic” to represent their functions and added explanation in the text.

      (2) Off-target effects of pharmacological effects may be discussed. In this respect, bafilomycin experiments can be used to argue against the slow effects of vesicle cycling such as endocytosis, and vesicle mobilization. However, the effects on rapid vesicle mobilization cannot be excluded entirely. Because I cannot exclude the absence of off-target effects either (can be addressed by looking at single vesicle imaging at nano-scale, which is hard to do or looking at EM level quantitatively?), I feel this is a matter of discussion.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) Fig. 2A2, B2 and Fig. 4A2, B2: It is easier to plot the recovery normalized only to the initial value. Subtracting the steady state is somewhat confusing because recovery looks faster after deeper depression, but this may be only apparent.

      We have given values for both types of plot in Table 2; they show no essential difference in the recovery parameters.

      Reviewer #2 (Recommendations For The Authors):

      Line 51: Rajappa et al. (2016) investigated clearance deficits in synaptophysin KO mice (not synaptobrevin).

      Corrected.

      Line 54: intersectin is introduced as an AZ scaffold protein, although most of the literature refers to it as an endocytic scaffold protein (including the cited work, e.g., Sakaba et al. 2013). At least, this should be discussed.

      Since blocking the activity of proteins downstream of intersectin has no effect on vesicle endocytosis (Figure 3 and Sakaba et al., 2013), we call it a (presynaptic) scaffold protein rather than an endocytic scaffold protein.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Page 1, Title: I don't think the presented data address the role of the presynaptic scaffold in SV replenishment. In addition, 'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be implied here.

      In this study, we focused on the downstream activity of the scaffold protein intersectin. Since blocking the activities of its downstream effectors, CDC42 and actin, does not obstruct endocytosis (Fig. 3 and Sakaba et al., 2013), we adopted the term “presynaptic scaffold protein” instead of “endocytic scaffold protein”.

      We have corrected it in the text.

      Page 2, Abstract: Clarify 'physiologically optimized condition' here and elsewhere in the manuscript.

      Abstract: in physiologically optimized condition → in physiological temperature and Ca2+.

      Page 3, line 62: I don't think 'the site-clearance hypothesis is widely accepted'. There are very few models that implement such a mechanism. Examples would be Pan & Zucker (2009) Neuron and Lin, Taschenberger & Neher 2022 (PNAS) which could be cited.

      62: the site-clearance hypothesis is “widely accepted”→ “well supported”

      Page 3, line 77: Please clarify 'fast synapses'.

      77: fast synapses→fast-signaling synapses, added clarification in the text.

      Page 4, line 100: Please clarify 'in the maximal rate'.

      100: in the maximal rate→reached during 1-Hz stimulation.

      Page 6, line 136: Please clarify 'to reduce the gap'.

      136: To reduce the gap between these different results→To explore the reason for these different results

      Page 7, line 157: I don't consider ML141 and Latrunculin-B 'scaffold protein inhibitors'.

      157: scaffold protein inhibitors had no effect on→ reworded as “none of these inhibitors affected fast or slow endocytosis”.  

      Page 7, line 162: P-value missing.

      162: p < 0.001 added.

      Page 8, line 184: "Since both endocytic blockers and scaffold inhibitors enhanced synaptic depression with a similar time course" consider rephrasing. Sounds like you refer to the time course by which these drugs exert their effect after being applied.

      184: Since both endocytic blockers and scaffold inhibitors enhance synaptic depression with a similar time course→Since the enhancement of synaptic depression by endocytic blockers or scaffold inhibitor occurred mostly at the early phase of synaptic depression.

      Same on page 11, line 250: "At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker" Please consider rephrasing.

      250: At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker → At the calyx of Held, scaffold protein inhibitors significantly enhanced the early phase of synaptic depression, like endocytic blockers.

      Page 13, line 318: Please clearly state which experiments were performed at 1.3 mM and which at 2 mM external Ca if two different concentrations were used during recordings.

      320: Added text “Unless otherwise noted, EPSCs were recorded in 1.3 mM [Ca2+] aCSF at 37 °C” in the Methods.

      Page 15: line 346: Reference in the wrong format.

      346: (25) → (Yamashita et al., 2005)

      Page 15: line 351: Do you mean to say every 10 s and every 20 s? Please clarify.

      No; averaged at 10 ms and 20 ms, respectively, as written.

      Page 16, line 369: 1 mM kyn was present in only very few experiments shown in the supplemental figures. Please clarify.

      368: In some experiments, we tested whether the presence of 1 mM kyn made any difference to the STD enhancement following endocytic block. As shown in Figure S3, our results are essentially the same with or without kynurenate, suggesting that glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al., 2002; Yamashita et al., 2003), unlike at pre-hearing calyces (Yamashita et al., 2009).

      Page 16, line 387: You cannot simply use multiple t-tests to compare a single control to multiple test conditions which seems to be the scenario here. Please correct or clarify.

      We have clarified the experimental protocols in the Methods: “Experiments were designed as a population study using different cells from separate brain slices under control and drug treatment, rather than on the same cell before and after the drug exposure.”

      Table S1: 'Endo decay rate'. It's either the 'Endo rate' or the 'Decay rate of delta Cm'. Please correct.

      Corrected as Endocytosis rate (Endo rate).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major change:

      All three of our reviewers raised the possibility that changes in movement during the time spent at the center ports could have contributed to changes in SWR rates. Analyses to address this possibility, based on the examination of trials with high and low speeds, were originally included in the supplement but we did not sufficiently highlight and explain these results. To rectify this, we have moved these results into a new main Figure 3 and now include a paragraph describing our interpretation of these results (page 9). We also include a more detailed description of the subjects’ behavior during port times – namely, that all subjects must remain quite stationary while at the reward ports in order to keep their nose in a specific position which keeps the port triggered. As a result, all subjects maintain head speeds well below our typical speed threshold for immobility while at the ports. This leads us to predict that any feedback based on periods of immobility alone (as requested by Reviewer 3) would show results very similar to our Control cohort and would not alter SWR rates seen during neurofeedback trials.

      Minor changes:

      (1) Reviewer 1 observed that our reported statistics appeared to be missing an interaction term showing that neurofeedback differentially affected the SWR rate/count pre- and post-reward. We apologize for a lack of clarity here: we fit pre- and post-reward times with separate linear mixed effects models, so this interaction term is neither expected nor defined in our model. We have added a sentence clarifying this aspect of our LME approach in the Methods section: “Each model is designed to compare samples from all trials of the control group to samples from neurofeedback and delay trials from the neurofeedback cohort for a specific time period (for instance, pre-reward-delivery at the center ports).” Combining both times in the same model would require adding an additional hierarchical level in order to preserve the pairing of the pre- and post-reward time period for each trial, which we are concerned would complicate the formulation and interpretation of the model. However, the reviewer raises a good point that the comparison between these two time periods reveals an additional difference between the trial types: SWR rate remains relatively consistent between the pre- and post-reward periods during neurofeedback trials, while delay and control trials show a clear increase in SWR rate between the two time periods. To visualize and quantify this effect, we calculated the difference in SWR rates between the two time periods and now include this plot as Supplementary Figure 2F, which is referenced on page 8 of the main text.

      (2) Reviewer 2 found our original title, “Neurofeedback training can modulate task-relevant memory replay in rats”, to be misleading and suggestive of a manipulation of memory content. We are in complete agreement with the Reviewer that our manipulation does not alter replay content, so to be more specific and accurate, we have changed our title to the Reviewer's suggestion, “Neurofeedback training can modulate task-relevant memory replay rate in rats”.

      (3) Reviewer 2 also requested that we include analyses quantifying baseline SWR rates for each of our experimental subjects. Although we initially considered reporting our results in measures of change relative to each individual animal’s baseline, we decided against this approach for several reasons.

      First, it is important to clarify that we extensively train the animals on the task prior to implant, so we do not have access to a truly naïve, pre-behavior baseline SWR rate for any of our subjects. However, because the pre-implant training is conducted consistently between our neurofeedback and our control cohort, we have no reason to believe that the behavioral training prior to implant would introduce differences in SWR rate between the cohorts. Indeed, we find no difference in post-reward SWR rate (or SWR rate at the home well) when we quantify the first 250 trials of post-implant behavior for each subject (see panel A below). Note that we cannot compare the pre-reward SWR rate at this point, because it is influenced by the task structure which guarantees at least one SWR in each neurofeedback trial pre-reward.

      Further, we do find that SWR rate is quite consistent over many days of task performance in the control cohort (shown for the post-reward period in panel B below). This suggests that comparing the post-neurofeedback training SWR rates for the neurofeedback cohort to SWR rates throughout the training for the control cohort is not likely to be confounded by differing amounts of training experience. This is supported by our analyses in Figure 2, which show no differences in SWR rate between the two cohorts when considering pre- and post-reward times combined.

      Author response image 1.

      (A) SWR rate calculated during the post-reward period at the center port for the first 250 trials of post-implant behavior for each animal. Trials of all types are included (i.e., both neurofeedback trials and delay trials for the manipulation cohort). Groupwise comparison p=0.192. (B) Mean SWR rate during the post-reward period at the center port for each behavioral training epoch shows no systematic change over time across subjects within the control cohort.

      Finally, within each cohort, we found the overall SWR rates to be quite consistent across animals. If each subject in the neurofeedback cohort had shown dramatically different SWR rates at the beginning of neurofeedback training, we would have needed to express the effect of neurofeedback training relative to baseline for each animal. However, since SWR rates were highly comparable across animals, we felt that expressing our results as simple SWR rates, rather than as measures of relative change, was more accessible and made it easier to place our results within the context of the literature. Within the neurofeedback cohort, the comparison of neurofeedback to delay trials is inherently matched for baseline SWR rate, since these comparisons are made within the same animal.

      (4) Finally, Reviewer 2 raises the possibility that older animals or those with cognitive deficits might respond to neurofeedback differently. We entirely agree with this possibility, and note this in our Discussion section: “Since the neurofeedback paradigm depends on the occurrence of at least a low endogenous rate of SWR occurrence, it would be important to implement neurofeedback training as a relatively early interventional strategy prior to extensive neurodegeneration, and training may take longer in aged or impaired subjects.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We have also rephrased our research questions in the introduction section on Page 5 with tracked changes:

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now better highlight the contributions of Antúnez et al. paper as they have provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows: “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled word-level factors in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words has been strategically embedded into two sentences, allowing for the creation of both congruent and incongruent sentence pairs through the interchange of these words. We have now explicitly specified this design for all sentences, as reflected in the edited manuscript on Page 38. For example, consider the exemplar pair “brother/jacket”:

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.

      (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16 and are copied as follows:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was provided only for the tagging response, not for the congruency effect, because we posit that this conditional contrast would emanate from the same brain regions that exhibit the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We discuss the necessity of using MEG in the revised manuscript (tracked changes on Page 20); the relevant passage is copied as follows:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100 ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that the correlation analysis between the RIFT effect and reading speed at the group level cannot establish causality, which makes it ill-suited to address this question. We have added this as one of the limitations of the current study in the revised manuscript (Page 16, tracked changes); the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We reformulated the last sentence:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We thank the reviewer for this valuable suggestion. EOG was not recorded in the current study because of restrictions on MEG data collection at the University of Birmingham during the COVID-19 pandemic. In future studies, we will follow the reviewer's suggestion and include EOG recordings. This will allow optimal ICA-based rejection of eye-movement artifacts, as recommended by Dimigen in his methodological paper:

      Dimigen O. 2020. Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. Neuroimage 207:116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements and saccade planning, aligned with the SWIFT model. Please refer to Page 15 with tracked changes; the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, this detection may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke flicker patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more closely linked to access or to integration have been resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs such as access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We thank the reviewer for highlighting the confusion regarding the sensor selection procedure. In response, we have clarified this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We thank the reviewer for pointing out the confusion; we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the reviewer that, given the prior application of a 0.5 Hz high-pass filter, detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have clarified this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."

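As a minimal sketch of what per-channel linear detrending of continuous data does, assuming the recording is held as a NumPy array of shape (channels, samples) and using SciPy's `scipy.signal.detrend` (the authors' own analysis software is not stated in this response), each channel gets its own least-squares line fitted and subtracted. The helper name `detrend_channels` and the toy drift values are hypothetical.

```python
import numpy as np
from scipy.signal import detrend


def detrend_channels(data: np.ndarray) -> np.ndarray:
    """Remove a least-squares linear trend from each channel (row) independently.

    data: array of shape (n_channels, n_samples) holding continuous recordings.
    """
    # axis=-1 fits and subtracts a separate straight line along time for every channel
    return detrend(data, axis=-1, type="linear")


# Toy example with hypothetical values: three channels, each a pure linear drift
t = np.linspace(0.0, 10.0, 1000)
slopes = np.array([[0.5], [-1.2], [3.0]])
offsets = np.array([[2.0], [0.0], [-4.0]])
drifting = slopes * t + offsets

cleaned = detrend_channels(drifting)
# A pure linear drift is removed entirely, so the residual should be ~0
print(np.abs(cleaned).max())
```

Because a 0.5 Hz high-pass filter already suppresses very slow drifts, the line fitted to filtered data is typically close to flat, which is consistent with the point above that detrending changes little here.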
      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Zhu et al. present a genome-wide histone modification analysis comparing patients with schizophrenia (on or off antipsychotics) with non-psychiatric controls. The authors performed analyses across the dorsolateral prefrontal cortex and tested for enrichment of nearby genes and pathways. They also performed an analysis measuring the effect of age on the epigenomic landscape. While this paper provides a unique resource on SCZ and its epigenetic correlates, as well as some potentially intriguing findings in the antipsychotic response dataset, there were some missed opportunities related to the integration of outside datasets and genotypes that could have strengthened the results and novelty of the paper.

      Major Comments

      (1) Is there genotype data available for this cohort of donors, or can it be generated? This would open several novel avenues of investigation for the authors. First, the authors could test for enrichment of heritability for SCZ or even highly comorbid disorders such as bipolar disorder. Second, it would allow the authors to directly measure the genetic regulation of histone marks by calculating QTLs (in this case, histone QTLs or hQTLs). The authors assert that, although interesting, the ATAC-seq approach does not provide the same chromatin state information as histone modifications mapped by ChIP. Why do the authors not test this? There are several ATAC-seq datasets available for SCZ [https://pubmed.ncbi.nlm.nih.gov/30087329/], and an additional genomic overlap could help tease apart genetic regulation of the changes observed.

      As detailed in our Methods section, the brain samples have prior medical diagnoses, treatment records, and toxicological screening. Unfortunately, no genotype information is available for our brain sample collection. However, we examined the overlap of differential enhancer and promoter peaks with genetic variants using linkage disequilibrium score regression (Fig. S10). Additionally, to assess agreement with the literature, we compared the DEGs identified in our study with those from a previous snRNA-seq study of postmortem prefrontal cortex from schizophrenia subjects and controls (Table S7).

      Repressive histone marks tend to provide different information than ATAC-seq data. However, we examined only activating marks in this study. Thus, the sentence in the Introduction mentioning that “ATAC-seq approach does not provide the same chromatin state information as histone modifications mapped by chromatin immunoprecipitation sequencing (ChIP-seq) assays do” has been removed.

      (2) Can the authors theorize why their analysis found significant effects for H3K27Ac for antipsychotic use when a recent epigenomic study of SCZ using a larger cohort of samples and including the same histone modifications did not [https://pubmed.ncbi.nlm.nih.gov/30038276/]? Given the lower n and lower number of cells in this group, it would be helpful if the authors could speculate on why they see this. Do the authors know if there is any overlap with the Girdhar study donors or if there are other phenotypic differences that could account for this?

      As mentioned in the Methods section, three strengths of this brain bank are: i) inclusion of samples from schizophrenia subjects with an antemortem diagnosis (i.e., based on clinical histories) rather than a postmortem diagnosis (i.e., based on interviews with relatives and friends, a diagnostic approach used by many brain banks worldwide but with important limitations; see PMID: 15607306); ii) inclusion of control subjects individually matched by sex, age and PMD; and iii) our ability to test the presence or absence of antipsychotic medications in blood samples as an independent experimental variable. This allowed us to obtain novel and statistically valid conclusions related to cell-type epigenetic alterations in the frontal cortex of schizophrenia subjects, and the impact of age and antipsychotic treatment on chromatin organization.

      There is no overlap with Girdhar study donors.

      (3) The reviewer is concerned about the low concordance between bulk nuclei RNA-seq and single-cell RNA-seq for SCZ (236 of 802 DEGs in NeuN+ and 63 of 1043 in NeuN-). While it is not surprising for different cohorts to have different sets of DEGs, these seem to be vastly different. Was there a particular cell type (or types) that enriched for the authors' DEGs in the single-cell dataset? Do the authors know if any donors overlapped between these cohorts?

      This overlap is acceptable, considering that these datasets originated from entirely distinct cohorts of postmortem human brain samples.

      (4) Functional enrichment analyses: details are not provided by the authors and should be added. The authors need to consider a) providing a gene universe, i.e., only considering the sets of genes with nearby H3K4me3/H3K27ac levels, to such pathway tools, and b) taking into account the fact that some genes have many more peaks with data. There are known biases in simply using the best p-value per gene in other epigenetic analyses (i.e., DNA methylation data), and software is available to run correct analyses: https://pubmed.ncbi.nlm.nih.gov/23732277.

      GREAT was used to map differential peak loci to target genes using the whole genome as the background set and default basal extension as per Nord et al. http://dx.doi.org/10.1016/j.cell.2013.11.033. We argue that it is more biologically relevant than comparing against an artificially selected background. These gene sets were then passed to Panther for Gene Ontology enrichment analysis as per Liu et al. 10.1186/s12940-015-0052-5.

      Additional details are provided in Materials and Methods section:

      ChIP-seq annotation and functional enrichment

      GREAT analysis (http://great.stanford.edu) was performed on differential peaks using the whole genome as background and the default basal extension from 5 kb upstream to 1 kb downstream of the TSS.

      Significantly enriched Gene Ontology biological processes were identified with the Panther classification tools using a hypergeometric test.
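The hypergeometric test named above asks how surprising it is to see at least k genes annotated to a GO term in a test set of N genes, when n of the M background genes carry that annotation. The sketch below computes only that upper-tail probability with the standard library. It does not reproduce Panther's own implementation or its multiple-testing correction, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, M: int, n: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    M: number of genes in the background (e.g., the whole annotated genome)
    n: number of background genes annotated to the GO term
    N: number of genes in the test set (e.g., genes mapped from differential peaks)
    k: number of test-set genes annotated to the GO term
    """
    total = comb(M, N)
    upper = min(n, N)
    # Sum the probabilities of drawing k, k+1, ..., min(n, N) annotated genes
    hits = sum(comb(n, x) * comb(M - n, N - x) for x in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 12 of 400 test genes hit a term covering 150 of 20,000 genes
p = hypergeom_enrichment_p(12, 20000, 150, 400)
print(f"{p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the binomial coefficients; the expected number of hits in this toy case is 400 × 150 / 20000 = 3, so 12 hits yields a small p-value.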

      Reviewer #2 (Public Review):

      The manuscript by Zhu has generated ChIP-seq and RNA-seq data from sizeable cohorts of SCZ patient samples and controls. The samples include 15 AF-SCZ samples and 15 controls, as well as 14 AT-SCZ samples and 14 controls. The genomics data was generated using techniques optimized for low-input samples: MOWChIP-seq and SMART-seq2 for histone profiles and transcriptome, respectively. The study has generated a significant data resource for the investigation of epigenomic alterations in SCZ. I am not convinced that the hierarchical pairwise design - first comparing AF-SCZ and AT-SCZ with their corresponding controls and secondarily contrasting the two comparisons is fully justified. The authors should repeat the statistical analysis by modeling all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups and evaluate if the main conclusions remain supported.

      Major comments

      (1) The manuscript did not discuss (mention) the quality control of RNA-seq data shown in Fig. 1B. The color scheme choice for the heatmap visualization did not provide a quantitative presentation of the specificity of the RNA-seq data. I would recommend using bar plots to present the results more quantitatively.

      QC of raw RNA-seq data including per sequence GC and adapter content was assessed with FastQC. Reads underwent soft-clipping during STAR alignment with on average 73.8% (+/- 0.08%) reads for neurons and 69.0% (+/- 0.99%) reads for glia being uniquely mapped. A new supplementary figure (Figure S5) has been included to show four bar plots representing the expression values more quantitatively.

      These details are now provided in the RNA-seq data processing part of the Materials and Methods section:

      RNA-seq data processing

      The human genome (GRCh38) and comprehensive gene annotation were obtained from GENCODE (v29). Quality control of RNA-seq reads including per sequence GC and adapter content was assessed with FastQC. Reads were mapped with STAR (2.7.0f) with soft-clipping (average of 73.8% (+/- 0.08%) reads uniquely mapped for neurons and 69.0% (+/- 0.99%) reads for glia) and quantified with featureCounts (v2.0.1) using the default parameters.

      (2) How does the specificity of this RNA-seq dataset compare to previous studies using a similar NeuN sorting strategy?

      As mentioned in the Results section, highly significant (median p-value = 6 × 10⁻⁷) pairwise differences in molecular marker expression were observed for all markers ranging from mature, functional and synaptic neuron markers to astrocyte, oligodendrocyte and microglial markers (Figure 1B; Figures S4 and S5; Table S5). This confirms neuronal and non-neuronal cell-type identities in the NeuN+ and NeuN- nuclei samples, respectively.

      (3) I appreciate the effort to assess the ChIP-seq data quality using phantompeakqualtools. However, prior knowledge/experience with this tool is required to fully understand the QC results. The authors should additionally provide browser shots at different scales for key neuronal/glial genes, so readers can have a more direct assessment of data quality, such as the enrichment of H3K4me3 at promoters (but not elsewhere), and H3K27ac at promoters and enhancers. Existing browser views, such as Fig. 2B are too zoomed out for assessing the data quality.

      A new Fig 2B has been generated with a magnified view for clearer examination.

      (4) The pairwise regression model should be explicitly reported in methods.

      Additional details are included in the Methods section:

      Differential analysis for RNA-seq data

      We analyzed the bulk RNA-seq data of 29 schizophrenia subjects and 29 controls. The initial step involved filtering out genes with low read counts (less than 20 reads in over 50% of samples). The analysis then employed a two-step method to estimate the technical and biological noise. The first step was identifying the top 10 principal components (PCs) of the dataset. Subsequently, the correlation between each PC and various experimental (alignment rate, unique rate, exon percentage, number of unique mapped reads) and demographic (sex, age at death, PMD, antemortem diagnosis) factors was calculated. Covariates with high correlation to the PCs were included in the analysis to minimize their impact. The analysis was conducted using the 'DESeq2' software package, and genes with a false discovery rate (FDR) below 0.05 were identified as differentially expressed.
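The first two steps described above (low-count filtering, then correlating the top principal components with technical and demographic covariates) can be sketched as follows. This is a minimal illustration, assuming a genes × samples count matrix, log2(count + 1) as the input to PCA, and numerically encoded covariates (e.g., sex as 0/1). Those choices, the helper names, and the random toy data are assumptions rather than the authors' pipeline, and the DESeq2 fit itself (an R package) is not shown.

```python
import numpy as np


def filter_low_counts(counts: np.ndarray, min_reads: int = 20,
                      max_frac_low: float = 0.5) -> np.ndarray:
    """Boolean mask of genes to keep.

    counts: (n_genes, n_samples) raw read counts.
    A gene is dropped when it has fewer than `min_reads` reads
    in more than `max_frac_low` of the samples.
    """
    frac_low = (counts < min_reads).mean(axis=1)
    return frac_low <= max_frac_low


def pc_covariate_correlation(counts: np.ndarray, covariates: np.ndarray,
                             n_pcs: int = 10) -> np.ndarray:
    """Pearson correlation between the top PCs of log counts and each covariate.

    covariates: (n_samples, n_covariates) numeric matrix.
    Returns an array of shape (n_pcs_used, n_covariates).
    """
    logc = np.log2(counts + 1.0)
    # Center each gene across samples before PCA
    centered = logc - logc.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix; sample scores on each PC are U * S
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    # Centering leaves at most n_samples - 1 PCs with nonzero variance
    n_pcs = min(n_pcs, counts.shape[1] - 1)
    scores = u[:, :n_pcs] * s[:n_pcs]
    r = np.empty((n_pcs, covariates.shape[1]))
    for i in range(n_pcs):
        for j in range(covariates.shape[1]):
            r[i, j] = np.corrcoef(scores[:, i], covariates[:, j])[0, 1]
    return r


# Toy filtering check: gene 1 is low in exactly 50% of samples (kept), gene 2 in 75% (dropped)
counts_toy = np.array([[25, 30, 40, 50], [5, 5, 30, 40], [0, 0, 0, 25]])
keep_mask = filter_low_counts(counts_toy)

# Toy PC/covariate correlation on hypothetical random data
rng = np.random.default_rng(0)
counts_rand = rng.poisson(50, size=(200, 12))
covs = rng.normal(size=(12, 3))
r = pc_covariate_correlation(counts_rand, covs, n_pcs=10)
print(keep_mask, r.shape)
```

The excerpt does not say what correlation counts as "high" when choosing covariates for the DESeq2 design, so no threshold is assumed here.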

      (5) The statistical strategy to compare AF-SCZ and AT-SCZ to their corresponding control groups was unjustified. Why not model all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups? If the manuscript argues that the antipsychotic effect is the main novelty, why not directly compare AF-SCZ and AT-SCZ?

      This is an important point. As mentioned above, one of the main strengths of our experimental design is that schizophrenia subjects and controls were individually matched by sex and age and, where possible, by postmortem delay and freezing storage time. Our study is also among the first to report the potential impact of antipsychotic treatment on chromatin organization using postmortem human brain samples. Because of this individual matching method, we only compared schizophrenia subjects (either antipsychotic-free or antipsychotic-treated) with their respective individually matched controls. This experimental design is supported by our previous publications with postmortem human brain samples (PMID: 36100039; PMID: 28783139; PMID: 26758213; PMID: 23129762; PMID: 22864611; PMID: 18297054). The rationale behind this experimental design, as well as potential limitations particularly related to the division of the schizophrenia group into antipsychotic-free and antipsychotic-treated groups, is mentioned in the Discussion:

      Related to the effect of antipsychotic treatment, frontal cortex samples of schizophrenia subjects were divided into AF and AT groups based on postmortem toxicological analysis of blood and, when possible, brain samples, the latter providing information about a longer retrospective drug-free period due to the high liposolubility of antipsychotic medications (Voicu and Radulescu, 2009). However, we cannot fully exclude the possibility of previous exposure to antipsychotic medications in the AF-schizophrenia group, and hence that the epigenetic alterations observed exclusively in the AF-schizophrenia group are a consequence of a potential period of decompensation, which typically occurs following voluntary treatment discontinuation (Liu-Seifert et al., 2005).

      It is also worth mentioning here that data were analyzed both at the cohort level and at an individual level (schizophrenia/control pairs). This is mentioned in the manuscript:

      It should be noted that in the differential analyses here, the schizophrenia subjects (whether AF or AT) and their controls were compared at the cohort level, while matched schizophrenia/control pairs were examined individually in the TF-based analyses.

      (6) The method of pairwise comparison to corresponding control groups, then further comparing the pairwise results opens the study to a number of statistical vulnerabilities. For example, on page 12, the studies identified 166 DEGs between AF and control, and 1273 DEGs between AT and control. Instead of implicating a greater amount of difference between AT and control, such a result can often be driven by differences in between-group variance, rather than between-group means, that is, are the SCZ-AF and SCZ-treated effect size magnitudes and directionalities similar (but the treated group has lower variance) or are the two groups truly different in terms of means? The result in Fig. 5A suggests effect sizes for the two comparisons (AF-Ctrl and AT-Ctrl) are similar but have lower variability in the treated group.

      For a discussion regarding our approach, which involves a pairwise comparison, see above.

      (7) The pairwise comparison further raised the possibility the results were driven by the difference in the two control cohorts rather than the two SCZ cohorts.

      We clearly show that age is an important independent factor (Fig. 7). Because controls are individually matched to schizophrenia subjects by sex and age, a comparison between the two cohort groups, which include subjects of different ages (see Tables S1 and S2), has limited validity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor Comments

      (1) Why not mention which histone modifications you measured by ChIP-seq in the abstract? Admittedly a minor point, but I read for quite a while before I got to that point in the introduction.

      The two histone marks are now mentioned in the abstract.

      (2) There are several places in the introduction where improper grammar is utilized and this should be edited.

      Introduction has been edited.

      (3) Related to major comments, how many donors overlapped with the PsychENCODE, CommonMind papers?

      Our datasets were generated from an entirely distinct cohort of postmortem human brain samples. Our postmortem sample collection does not overlap with postmortem samples included in PsychENCODE and/or CommonMind publications.

      (4) Since studies have already measured H3K4me3 and H3K27ac in the SCZ prefrontal cortex, why didn't the authors consider measuring changes in a related repressive marker? This is not to suggest the authors should do that now, but additional comments about other markers would help provide context for this analysis and point toward potential future studies.

      This is an interesting question and will be the goal of our future investigation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Why does stimulation at 0.15 Hz show a third harmonic signal (Figure 5A) but 0.25 Hz does not show a second harmonic signal?

      Second and third harmonic signals were sometimes observed with 0.15 Hz stimulation, as well as with 0.25 Hz and other stimulation frequencies. The second harmonic is easier to understand, as vasomotion may respond to both directions of the oscillating stimulus. The reason for the emergence of the third harmonic remains unknown. These harmonic signals were not always observed, and their magnitude was variable. Because the frequency-locked signal was robust, we decided to describe only this signal in this manuscript. These observations are mentioned in the revised manuscript (Results, page 9, paragraph 2).
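A toy illustration of why a response to both directions of an oscillating stimulus produces a second harmonic: if the vessel response is modeled, purely as an assumption, as proportional to the absolute value of a sinusoidal stimulus, its spectral energy moves from the stimulus frequency f to 2f. The sampling rate, duration, and helper name below are hypothetical choices made so that f and 2f fall on exact FFT bins. This model says nothing about the third harmonic, which the authors leave open.

```python
import numpy as np

fs = 10.0          # sampling rate in Hz (hypothetical)
f_stim = 0.15      # stimulus frequency in Hz, as in the reviewer's question
duration = 400.0   # s; 60 whole stimulus cycles, so f and 2f sit on exact FFT bins
n = int(round(duration * fs))
t = np.arange(n) / fs

stimulus = np.sin(2 * np.pi * f_stim * t)
linear_response = stimulus              # follows the stimulus direction
rectified_response = np.abs(stimulus)   # responds equally to both directions (assumption)


def amplitude_at(signal: np.ndarray, freq: float) -> float:
    """Single-sided amplitude of `signal` at the FFT bin nearest `freq`."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / signal.size
    return float(spectrum[int(round(freq * signal.size / fs))])


amp_lin_f = amplitude_at(linear_response, f_stim)
amp_rect_f = amplitude_at(rectified_response, f_stim)
amp_rect_2f = amplitude_at(rectified_response, 2 * f_stim)
print(amp_lin_f, amp_rect_f, amp_rect_2f)
```

For a full-wave rectified sine, the Fourier series contains only the mean and even harmonics, with amplitude 4/(3π) ≈ 0.42 at 2f and none at f.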

      References for the windows are missing. Closed craniotomy: (Morii, Ngai, and Winn 1986). Thinned skull: (Drew et al. 2010).

      These references were incorporated into the revised manuscript.

      An explanation of, or at least a discussion on, why a flavoprotein or other intrinsic signal from the parenchyma might follow vasomotion with high fidelity would be most helpful.

      We devote a large part of the Results to describing how any fluorescence signal from the brain parenchyma follows the vasomotion, because the blood vessels largely lack fluorescence signals within the filter band that we observe. This is described as “shadow imaging”. What was rather puzzling was that flavoprotein or other intrinsic signals were phase-shifted in time. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. This is described in the manuscript as follows.

      (Results, page 13, paragraph 2)

      “Production and degradation of flavin and other metabolites may be induced by the fluctuation in the blood vessel diameter with a fixed delay time. The phase shift in the autofluorescence could be due to the additive effect of “shadow” imaging of the vessel and the concentration fluctuation of the autofluorescent metabolite.”

      Glucose and oxygen are likely delivered more abundantly during the vasodilation phase than during the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism, and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag the changes in blood flow. This discussion has been added to the revised manuscript (Discussion, page 19, paragraph 2).
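The additive explanation above can be made concrete with a phasor sum. Under the simplifying assumptions (not stated in the response) that the vessel diameter varies as cos(ωt), that the "shadow" component is an anti-phase sinusoid of amplitude A, and that the metabolite component has amplitude B and lags the diameter by a fixed delay τ, the observed autofluorescence is a single sinusoid whose phase lies between the two components:

```latex
\begin{aligned}
F(t) &= -A\cos(\omega t) + B\cos\bigl(\omega (t-\tau)\bigr)
      = \operatorname{Re}\!\left[\left(-A + B e^{-i\omega\tau}\right) e^{i\omega t}\right]
      = C\cos(\omega t - \varphi),\\
C\,e^{-i\varphi} &= -A + B e^{-i\omega\tau},\qquad
C = \sqrt{A^{2} + B^{2} - 2AB\cos(\omega\tau)},\qquad
\varphi = \operatorname{atan2}\!\bigl(B\sin(\omega\tau),\; B\cos(\omega\tau) - A\bigr).
\end{aligned}
```

With B = 0 this gives φ = π (pure anti-phase "shadow" imaging); any B > 0 with ωτ not a multiple of π yields an intermediate phase, i.e. an apparent phase shift of the kind described.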

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections to the text and figures:

      (1) Figures 1 and 2- The single line slice basal and dilated traces are larger in Figure 2 (intact skull) than in Figure 1 (thinned skull)- have these been mixed up, as the authors state in the text that larger dilations are detected in the thinned skull preparation?

      The example vessel described for the thinned skull (Figure 1) happened to be larger than that shown for the intact skull (Figure 2). We did not describe that larger dilations are observed in the thinned skull preparation. What was described was that the vessel profiles were shallower in the intact skull. This is because the presence of the intact skull blurs the fluorescence image.

      (2) Figure 3- I think the lower panel of the amplitude spectrums from 3 individual animals included in D would benefit from being in its own panel within this Figure (i.e. E). The peak ratio is also used in this figure, but the equation to calculate this is not displayed until Figure 4.

      We thank the reviewer for recommending making the figure more comprehensible. We have divided panel D into D and E and shifted the panel character accordingly. The manuscript text was also updated.

      As the reviewer describes, the peak ratio at 0.25 Hz is used in Figure 3E (original). However, the equation to calculate this value is given at the appropriate location in the main text of the manuscript (Results, page 10, paragraph 2) as well as in the figure legend.

      (3) Figure 5- In the visual stimulation traces displayed in C you have included a 10-degree scale bar, which looks similar in amplitude to the trace but the text states these are 17-degree amplitude traces.

      We thank the reviewer for noticing this labeling error in the figure. We have corrected it in the revised figure.

      (4) Figure 6- For the Texas red fluorescence traces and image scales displayed in F, you have shown the responding traces on the right and non-responding on the left, but the figure legend states the amplitude is strong on the left and weak on the right.

      We thank the reviewer for noticing the error in the figure legend text. We have corrected the error in the revised manuscript.

      (5) Figure 6- It would be helpful for the reader if the r value was displayed on the graph in G.

      We thank the reviewer for the suggestion. We have indicated the r value in Figure 6G as the reviewer recommended.

      Reviewer #3 (Recommendations For The Authors):

      Major

      It is unclear to me if the authors are studying vasomotion per se. Vasomotion is an intrinsic, natural rhythm of blood vessel diameter oscillation that is entrained by endogenous rhythmic neural activity. Importantly, if you take neural activity away, the blood vessel (with flow and pressure) should still be capable of oscillating due to an intrinsic mechanism within the vessel wall. In contrast, if one increases neural activity by way of sensory stimulation and blood flow increases, this is the basis of functional hyperemia. If one stimulates the brain over and over again at a particular frequency, it is expected that blood flow will increase whenever neural activity increases to the stimulus, up to a particular frequency until the blood vessel cannot physically track the stimulus fast enough. Functional hyperemia does not depend on an intrinsic oscillator mechanism. It occurs when the brain becomes active above endogenous resting activity due to sensory or motor activity.

      We thank the reviewer for stressing the importance of the distinction between “vasomotion” and functional “hyperemia”.

      We recognize that the terminology used in our paper was not explicitly defined. Traditionally, “vasomotion” is defined as the spontaneous dilation and constriction of blood vessels that occurs at low frequencies in the 0.1 Hz range without any apparent external stimulus. Sensory-induced changes in blood flow are usually called “hyperemia”. In our paper, however, we used the term “vasomotion” literally, to indicate both forms of “vascular” “motion”. The traditional vasomotion was therefore called “spontaneous vasomotion”, and the hyperemia, comprising both vasoconstriction and vasodilation, induced by slowly oscillating visual stimuli was called “visually induced vasomotion”. This terminology is now explicitly introduced in the revised manuscript (Introduction, page 3, paragraphs 2-3; page 4, paragraphs 1-2).

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not persist at a specific frequency for long. With visual stimuli that oscillated slowly at temporal frequencies close to that of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion”, was observed. Importantly, this visually induced vasomotion is not observed in novice animals. Therefore, visually induced vasomotion is not a simple sensory reaction of the vasculature to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion spreads throughout the whole brain, where the plasticity underlying vasomotion entrainment occurs, or how much visually induced vasomotion relies on the mechanisms of intrinsic spontaneous vasomotion. Future directions for understanding the mechanisms of visually induced vasomotion and entrainment are discussed in more detail in the revised manuscript (Discussion, page 19, paragraph 1).

      To me, one would need to silence the naturally occurring vasomotion to study it. As soon as one activates the brain with an external stimulus, functional hyperemia is being studied. One idea that would be interesting to look at is whether a single or perhaps a double stimulus, in an untrained vs trained mouse, shows vasodilation that occurs across the cortex and in the cerebellum. In other words, is there something special about repeating the signal over and over again that results in brain-wide synchronization, or does a single or double oscillation of the same frequency (0.25Hz) also transiently synchronize the brain? My guess is that a short stimulus would give you the same thing (especially in a trained mouse) and that there is nothing special about oscillating the signal over and over again (except for the learning component).

      We thank the reviewer for suggesting new experiments to determine whether visually induced vasomotion shares the mechanisms that generate spontaneous vasomotion.

      We would like to emphasize again that visually induced vasomotion is not observed in Novice animals. Therefore, it is not a simple sensory reaction of the vasculature to the visual stimuli. Entrainment through repeated presentation of the visual stimuli is required for this global synchronization to occur.

      We would also like to emphasize that, even in Expert animals, the visually induced vasomotion that is frequency-locked to the presented stimulus does not always occur immediately. As shown in Figure 3D lower panel (Figure 3E in the revised figure), the vasomotion did not always immediately frequency-lock. The vasomotion was also not always stable throughout the 15 min of visual stimulation presentation. These characteristics are emphasized in the revised manuscript (Results, page 10, paragraph 1).

      Therefore, we expect that a single or double cycle of visual stimulation would not always be sufficient to transiently frequency-lock the visually induced vasomotion.

      An alternative idea is to test frequencies lower than vasomotion. Vasomotion typically oscillates around a wide range of very low frequencies averaging around 0.1Hz, yet here the authors entrain blood vessel oscillations towards the top end of vasomotion, at 0.25Hz. What would happen if the authors tried synchronizing brain activity with 0.025Hz? Would the natural vasomotion frequency still be there, or would it be gone, dominated by the 0.025Hz entrainment?

      We expect that visually induced vasomotion would not be induced by 0.025 Hz visual stimuli, because this frequency is too slow to induce smooth pursuit of the stimuli by eye movement. We show that, even when smooth pursuit occurs, visually induced vasomotion may or may not occur (Figure 6F). However, visually induced vasomotion largely does not occur without eye movement. Therefore, the experiment proposed by the reviewer is likely not feasible.

      Finally, perhaps the authors can see if there is a long-lasting change in natural vasomotion occurring after the animal has been trained to 0.25Hz. For example, is there greater power in the endogenous fluctuation at either 0.25Hz (or perhaps 0.1Hz) with no visual stimulation given but after the animal has been trained? These ideas would be interesting to test and could help clarify whether this is plasticity in functional hyperemia or plasticity in vasomotion.

      It should also be mentioned that the frequency-locked vasomotion quickly dissipates as soon as the visual stimulation is halted (Figure 3D upper panel, middle). However, we agree with the reviewer that it would be interesting to test whether spontaneous vasomotion is less fragmented in Trained or Expert mice than in Novice mice, which would indicate whether the entrainment effect extends to the properties of spontaneous vasomotion.

      This issue I have raised is not a fundamental flaw in the paper, it pertains more to the wording, phrasing, and pitch of the paper i.e. is this really entrained and plastic vasomotion? I am skeptical. Nevertheless, I think the authors should try some of these suggestions to better characterize this effect.

      We agree that the phrasing used in the original manuscript was rather confusing, as “vasomotion” normally refers to spontaneous vascular movement. However, functional “hyperemia” may not adequately express the phenomenon that we observe either. The phenomenon that we observe is slowly oscillating vasodilation and vasoconstriction that is induced with visual stimuli with a temporal frequency similar to the spontaneously occurring “vasomotion”. This phenomenon is not a direct hyperemia response to the visual stimuli as it requires entrainment and it spreads globally throughout the whole brain. We revised our manuscript to define the terminology that we use.

      An important question is if neural activity is entraining the CBF responses. The authors should do one experiment in a pan-neural GCaMP line to test if neural activity in the visual cortex (and other areas captured in the widefield microscope) shows a progressive and gradual synchronization (or not) to the vasomotion responses with training. It is possible to do this through a thinned skull window. This is important to know if/how synchronized population neural activity scales with training. Perhaps they will not correlate and there is something more subtle going on.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Visual stimulation must therefore first activate neurons, and the initial drive for vasomotion is likely triggered through neurovascular coupling. However, visually induced vasomotion is not observed in novice animals. Therefore, it is not a simple sensory reaction of the vasculature to neuronal activity in the primary visual cortex.

      An important point is that the neuronal visual response in the primary visual cortex could decrease with repeated presentation of the visual stimulation, because the adaptive eye movement should reduce retinal slip. With repeated training sessions, the presented image will likely be projected onto the retina more statically. Neurovascular coupling could be enhanced through increased responsiveness of the vessels, and vascular-to-vascular coupling could also be potentiated. This argument is now incorporated in the revised manuscript (Discussion, page 19, paragraph 1).

      We agree with the reviewer that, to identify the extent of the neuronal contribution to vasomotion triggering, whole-brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because the fluorescence of Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect of vasomotion, sophisticated imaging techniques would be required. We recognize this “shadow” effect and are currently developing methods to remove it, as well as the effect of intracellular pH fluctuation, from the fluorescence traces.

      The authors nicely show that plasticity in vasomotion coincides with the mouse learning the HOKR task and that as eye movement tracks the stimulus, CBF gets entrained. However, there could also be a stress effect going on in the early trials, and as the mouse gets used to the procedure and stress comes down, the vasomotion entrainment can be seen. It could be the case that the vasomotion process is there on the first trial, but masked by stress-induced effects on neural and/or vascular activity. I did not see anything in the methods about how the mouse was habituated to head restraint. Was the first visual stim trial the first time the mouse was head restrained? If so, there could be a strong stress effect. The authors should address this either by clarifying that habituation to head restraint was done, or by doing a control experiment where each animal receives at least 1 week of progressive and gradual head restraint before doing the same HOKR experiment using multiple trials.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. Whether an experimentally induced increase in stress interferes with vasomotion could also be studied. In the TexasRed experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, TexasRed was injected before session 1. Vasomotion entrainment likely progressed with training in sessions 2 and 3. Before session 4, TexasRed was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that, once the mice had been trained, the stress induced by tail-vein injection did not prevent the generation of visually induced vasomotion. This argument is included in the revised manuscript (Discussion, page 20, paragraph 2).

      Minor

      The first sentence of the introduction requires citations. It is also a somewhat irrelevant comparison to make.

      The necessary citations were added in the revised manuscript, as the reviewer suggested. We think that describing how energy is distributed in the brain is one of the most important steps toward understanding how the brain achieves efficient information processing. We would therefore like to keep this introduction.

      The third and fourth sentence of the introduction equates vasodilation/vasoconstriction with vasomotion and it is not this simple. Vasomotion is a specific physiological process involving rhythmic changes to artery diameter. Also, the frequency of these slow oscillations needs to be stated. The authors only say they are slower than 10Hz.

      The definition of spontaneous vasomotion with indication of typical temporal frequency is described in the revised manuscript, as the reviewer suggested.

      More than half of the introduction is describing the paper itself, rather than setting the stage for the findings. The authors need a more thorough account of what is known and what is not known in this area. Some of this information is in the discussion, which should be moved up to the intro.

      We have revised the introduction to include the definition of spontaneous vasomotion and visually induced vasomotion or functional hyperemia, as the reviewer suggested.

      In the first paragraph of the results section, the authors should state in what way the mice are awake. Are they freely mobile? Are they head-restrained? Are they resting or moving or doing both at different times? This is clarified later but it should come up front as someone reads through the paper.

      As the reviewer suggested, we clarified in the first paragraph of the Results section that the experiments were done in awake, head-restrained mice.

      The authors say "As shown later, blood vessels on the surface...". There is no need to say "as shown later".

      This phrase was deleted, as the reviewer suggested.

      The use of "full width at 10% maximum" of the Texas red intensity for the diameter measure is a little odd, as it may actually overestimate the diameter, but I see what the authors were trying to do. A full-width half max is standard here and that is likely more appropriate. Also, the line profiles of intensity are not raw data. The authors say the trace is strongly filtered/smoothed. If so, this creates a somewhat artificial platform to make the diameter measurement. The authors should show raw data from a single experiment and make the measurement from that. The raw line profile should look almost square, where a full-width half-max would work well.

      Contrary to the reviewer's expectation, the raw line profile was not almost square. Even with almost no blur in the XY dimension of the optical imaging system, one would not expect a square line profile: this is not a confocal or two-photon microscope image, so no ideal optical section is created, and the thickness of the vessel in the Z dimension increases towards its center. Therefore, the full-width half-maximum value would certainly underestimate the actual vessel diameter. An ideal cutoff value might be derived if the 3D point spread function of the imaging system were known. The 10% cutoff is arbitrary, but we consider 10% to be the minimum intensity that can be distinguished from background intensity fluctuations. We did not attempt to derive the “true” diameter of the vessel; the full width at 10% maximum is simply an index of the actual diameter. Because most of the manuscript deals only with changes in vessel diameter relative to the basal diameter, we considered a careful estimate of the absolute diameter unnecessary. This argument is detailed in the Materials and Methods section of the revised manuscript (page 31, paragraph 2).

      The raw line profile before filtering is shown overlaid in Figure 1C, as the reviewer suggested.
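      The point that the cutoff level matters little for relative diameter changes can be illustrated with a small synthetic example. The following is a minimal Python sketch, not the analysis code used in the study: it assumes a single-peaked line profile, takes the profile minimum as the baseline, and linearly interpolates the threshold crossings; the function names `full_width_at_fraction` and `relative_change` and the Gaussian-shaped test profiles are hypothetical choices for illustration only.

```python
import numpy as np


def full_width_at_fraction(profile, fraction=0.10, pixel_size=1.0):
    """Width of a single-peaked line profile at `fraction` of its peak.

    Illustrative assumptions (not the authors' pipeline): the baseline is the
    profile minimum, and threshold crossings are linearly interpolated.
    """
    p = np.asarray(profile, dtype=float)
    base = p.min()
    peak = int(np.argmax(p))
    level = base + fraction * (p[peak] - base)

    # Walk left from the peak to the last sample at or above the threshold.
    i = peak
    while i > 0 and p[i - 1] >= level:
        i -= 1
    left = 0.0 if i == 0 else (i - 1) + (level - p[i - 1]) / (p[i] - p[i - 1])

    # Walk right from the peak in the same way.
    j = peak
    while j < len(p) - 1 and p[j + 1] >= level:
        j += 1
    if j == len(p) - 1:
        right = float(len(p) - 1)
    else:
        right = j + (p[j] - level) / (p[j] - p[j + 1])

    return (right - left) * pixel_size


def relative_change(basal_profile, profile, fraction=0.10):
    """Diameter index relative to the basal diameter index (0 = no change)."""
    basal_width = full_width_at_fraction(basal_profile, fraction)
    return full_width_at_fraction(profile, fraction) / basal_width - 1.0


# Synthetic, Gaussian-shaped (non-square) profiles; the 10% dilation is made up.
x = np.arange(101)
basal = np.exp(-(x - 50.0) ** 2 / (2 * 5.0 ** 2))
dilated = np.exp(-(x - 50.0) ** 2 / (2 * 5.5 ** 2))
print(full_width_at_fraction(basal, 0.10), full_width_at_fraction(basal, 0.50))
print(relative_change(basal, dilated, 0.10), relative_change(basal, dilated, 0.50))
```

      Under these assumptions, the width at 10% of maximum is roughly 1.8 times the full width at half maximum, while the relative change computed with either cutoff is close to 0.10. This holds only when the profile shape scales uniformly with vessel size, which is itself an assumption of the toy example.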

      In Figures 1 and 2, state/label what brain region this is.

      The blood vessels between the bregma and lambda on the cortex were observed and described in Figures 1 and 2. This is described in the revised manuscript, as the reviewer suggested.

      Can the authors also show what a vein or venule looks like using their quantification method in Figures 1 and 2? This would be a helpful comparison to a static vein.

      The methods shown in Figures 1 and 2 would not allow us to distinguish between veins and venules in our study. Figures 1 and 2 show methods that allow quantification of relative blood vessel diameter fluctuations due to spontaneous or visually induced vasomotion. Later in the manuscript, the overall intensity fluctuation of TexasRed or autofluorescence in the brain parenchyma is studied, and in this case no distinction between veins and venules could be made.

      Statements such as this are not necessary: "Later in the manuscript, we will be dealing with vasomotion dynamics observed with the optical fiber photometry methods, in which the blood vessel type under the detection of the fiber could not be identified". Simply talk about this data when you get to it.

      We have deleted this statement in this part of the manuscript, as the reviewer suggested.

      Same as this, please consider deleting: "Spontaneous vasomotion dynamic differences between different classes of blood vessels would be of interest to study using a more sophisticated in vivo two-photon microscope which we do not own." Just describe the data you have from the methods you have. There is no need to lament.

      We deleted this sentence, as the reviewer suggested.

      Figure 3 D the light blue boxes showing the time period of visual stimulation physically overlay with the frequency-time spectrograms. They should not overlay with this graph because it makes them more light blue, distorting the figure which also uses light blue in the heat map.

      Figure 3D was modified, as the reviewer suggested.

      The authors say: "The reason why the vasomotion detected in our system through the intact skull in awake in vivo mice was less periodic was unknown." Yes, but you are imaging an awake mouse. Many spontaneous behaviours such as whisking, grooming, twitching, and struggling will manifest as increased artery diameter. These will be functional hyperemia occurring events on top of rhythmic vasomotion. This can be briefly discussed.

      As the reviewer comments, the vasomotion detected in awake mice was likely to be less periodic because the spontaneous animal behavior induces functional hyperemia and interrupts spontaneous vasomotion. This interpretation was included in the revised manuscript (Results, page 8, paragraph 1).

      The authors say "extremely tuned" on page 8. They should not use words like "extremely". Perhaps say "more strongly tuned" or equivalent.

      We have changed “extremely” to “more strongly”, as the reviewer suggested.

      The authors say "First, the Texas Red fluorescence images were Gaussian filtered in the spatial XY dimension to take out the random noise presumably created within the imaging system." It is inadvisable to alter the raw data in this way unless there is a sound reason to do so. If there is random noise this should not affect the Fast Fourier Transform analysis. If there is regular noise caused by instrumentation artefact, which is picked up by the analysis then perhaps this could be filtered out. A static Texas red sample in a vial can be used to determine if there is artefactual noise.

      We mainly used the Gaussian filter to improve the presentation of the imaging data. The TexasRed fluorescence was low in intensity, so the acquired images were Gaussian filtered in the spatial XY dimension to reduce pixel-level noise at the expense of reduced spatial resolution. Because this filter acts only in space, it should not affect the temporal frequency of the observed vasomotion. This is now indicated more clearly in the revised manuscript (Results, page 10, paragraph 2).
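      Why a purely spatial filter leaves the temporal frequency content intact can be checked with a small synthetic example. The sketch below is an illustration only, not the processing used in the study: it assumes a (T, Y, X) image stack, a separable Gaussian kernel truncated at 3σ, and made-up parameters (10 Hz frame rate, a 0.25 Hz oscillation, σ = 2 pixels); the helper names are hypothetical. Because the same kernel is applied independently to every frame, the filter mixes pixels within a frame but never mixes time points.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma (an assumed choice)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def spatial_gaussian_filter(movie, sigma):
    """Blur every frame of a (T, Y, X) stack along Y and X; time is untouched."""
    k = gaussian_kernel_1d(sigma)

    def blur(v):
        return np.convolve(v, k, mode="same")

    out = np.apply_along_axis(blur, 1, movie)  # filter along Y
    return np.apply_along_axis(blur, 2, out)   # filter along X


def peak_frequency(trace, fs):
    """Frequency (Hz) of the largest non-DC component of the amplitude spectrum."""
    spec = np.abs(np.fft.rfft(trace - np.mean(trace)))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    return freqs[1 + np.argmax(spec[1:])]


# Synthetic stack: a 0.25 Hz oscillation plus independent pixel noise
# (10 Hz frame rate, 60 s, 24 x 24 pixels; all values are made up).
rng = np.random.default_rng(0)
fs, n_frames, size = 10.0, 600, 24
t = np.arange(n_frames) / fs
clean = 1.0 + 0.05 * np.sin(2 * np.pi * 0.25 * t)
movie = clean[:, None, None] + rng.normal(0.0, 0.05, (n_frames, size, size))
filtered = spatial_gaussian_filter(movie, sigma=2.0)

c = size // 2
print(peak_frequency(movie[:, c, c], fs), peak_frequency(filtered[:, c, c], fs))
print(np.std(movie[:, c, c] - clean), np.std(filtered[:, c, c] - clean))
```

      In this toy example the spectral peak of the center-pixel trace stays at the oscillation frequency before and after filtering, and only the noise around the clean signal shrinks. Near the frame edges, `mode="same"` effectively zero-pads and would scale the signal down, which is why an interior pixel is read out here.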

      There are endogenous fluorescent molecules in cell metabolism that change dynamically to neural activity: NADH, NADPH, and FAD. These are almost certainly a fraction of the auto-fluorescent signal the authors are measuring and it would be expected to see small fluctuations in these metabolites with neural activity. Perhaps this can be discussed, and the authors can likely argue that metabolic signals are much smaller than the change caused by vasodilation.

      We found that the autofluorescence signal was phase-shifted in time relative to the vasomotion visualized with TexasRed. This suggests that the autofluorescence signal has an anti-phase “shadow imaging” component and another, time-shifted component. Glucose and oxygen are likely to be delivered more abundantly during the vasodilation phase than during the vasoconstriction phase of vasomotion. These molecules fuel cell metabolism, and endogenous fluorescent molecules such as NADH, NADPH, and FAD may then increase or decrease with the delay required for the underlying chemical reactions. The concentration fluctuations of these metabolites could therefore lag behind changes in blood flow. These metabolites may also fluctuate with the neuronal activity that triggers visually induced vasomotion or functional hyperemia. These points are now discussed in the revised manuscript (Discussion, page 19, paragraph 2).

      The authors say "however, we found that, if Texas Red had to be injected before every training session, the mouse did not learn very well." This is interesting. Why do the authors suppose this was the case? Stress from the injection? Or perhaps some deleterious effect on blood vessel function caused by the dye itself? Either way, I think this honest statement should remain. Others need to know about it.

      We think that the stress from the injection interferes with the HOKR learning. However, as shown, TexasRed injection after the mouse had learned did not interfere with the eye movement or with the visually induced vasomotion. We do not know whether the injection stress directly interferes with the blood vessel function and affects the plastic vasomotion entrainment. These arguments are now described in the revised manuscript (Discussions, page 20, paragraph 2). The statement above remains as is, as the reviewer suggested.

      YCnano50 is a calcium sensor and not really appropriate for the use employed by the authors. They are exciting YFP at 505nm but unless the authors are using a laser line, there is some bandwidth of excitation light that is likely exciting the CFP too which still absorbs light up to ~490nm. Here, calcium signalling may affect the YFP signal. This can be discussed.

      A multiband-pass filter (Chroma 69008x, with the relevant band at 503 nm / 19.5 nm FWHM) was used for direct excitation of YFP, and negligible light is passed below 490 nm. CFP excitation above 490 nm is assumed to be negligible and is usually not characterized in the literature. We therefore assume that, in our optical system, fluorescence from direct YFP excitation dominates over any minor contribution from CFP excitation. This is now explicitly described in the revised manuscript (Materials and Methods, page 28, paragraph 2).

      The discussion is interesting but does not actually discuss much of the data or measurements in the paper. Most of the discussion reads more like a topical review, rather than a critical analysis of the effects/measurements and why the authors' interpretations are likely correct. This can be improved.

      As the reviewer suggests, we have improved the discussion by starting with the summary of the results (Discussion, page 19, paragraph 1). We also included the possibility of stress affecting visually induced vasomotion (Discussion, page 20, paragraph 2).

    1. Author Response

      OVERVIEW OF RESPONSE TO REVIEWS

      I thank the three anonymous reviewers for providing well-informed, constructive feedback on the initial version of this manuscript. Based on their comments I will revise the manuscript and hopefully improve it in several ways. I expected a great deal of resistance to the ideas proposed in this model because they break from traditional approaches. One of my goals in developing this model was to argue for a paradigm shift regarding the concept of a “receptive field”. Experimentally, the receptive field is defined as the set of preferred environmental sensory circumstances that cause a neuron to become highly active. Traditional interpretation of receptive fields implicitly assumes that the environmental circumstances that give rise to the receptive field do so in a purely bottom-up fashion (the cell is “receiving” its field), in which case the receptive field specifies the function of the cell. In other words, the receptive field is what the cell does. However, some brain regions (e.g., entorhinal cortex) receive substantial feedback from downstream regions (e.g., hippocampus), and feedback can play an important role in determining the receptive field. As applied to a memory account of MTL, this feedback is memory retrieval and reactivation. Thus, the multifield spatial response of grid cells doesn’t necessarily mean that their function is spatial. Consideration of bottom-up versus top-down signals gives rise to the proposal that the bottom-up preference of many grid cells is some non-spatial attribute even though they exhibit a spatial receptive field owing to retrieval in specific locations.

      One thing I will emphasize in a revision is that this model can address findings in the vast literature on learning, memory, and consolidation. The question asked in this study is whether a memory model can also explain the rodent navigation literature. This is not an attempt to provide definitive evidence that this is a better account of the rodent navigation literature. Instead, the goal is to model the rodent navigation literature even though this is a memory model rather than a spatial/navigation model. Nevertheless, within the domain of rodent spatial/navigation, this model makes different predictions/explanations than spatial/navigation models. For instance, this is the only model predicting that many grid cells with spatial receptive fields are non-spatial (see predictions in Box 1). As reviewed in Box 1, this is the only model that can explain why head direction conjunctive grid cells become head direction cells in the absence of hippocampal feedback and it is the only model that can explain why some grid cells are also sensitive to sound frequency (see several other unique explanations in Box 1).

      This study is an attempt to unify the spatial/navigation and learning/memory literatures with a relatively simple model. Given the simplicity of the model, there are important findings that the model cannot address: it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations. The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental or evolutionary time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells. In evolution and/or in development, it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      Grid cell models that are purely spatial are agnostic regarding the thousands of findings in the literature on memory, learning, and consolidation whereas this model can potentially unify the learning/memory and spatial/navigation literatures. The reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account. There are other grid cell models that can explain non-spatial grid-like responses (Mok & Love, 2019; Rodríguez‐Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015) and these models may be similarly positioned to explain memory results. However, these models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (these models would need to assume that rodent hippocampus is almost entirely concerned with spatial navigation). This account provides an answer to this conundrum by proposing that grid cells with spatial receptive fields have been misclassified as spatial. Below I give responses to some of the specific comments made by reviewers, grouping these comments by topic:

      COMMENTS RELATED TO THE NEED/MOTIVATION FOR THIS MODEL

      In a revision, I will clarify that the non-spatial MTL cell types that are routinely found in primate and human studies are fully compatible with this model. The reported simulations are focused on the specific question of how it can be that most mEC and hippocampal cell types in the rodent literature appear to be spatial. It is known that perirhinal cortex is not spatial. However, entorhinal cortex is the gateway to hippocampus. If the hippocampus has the capacity to represent non-spatial memories, it must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial.

      Lateral entorhinal cortex also projects to hippocampus, and one reviewer asks about the distinction between lateral versus medial entorhinal cortex. From this memory perspective, the important question is which part of the entorhinal cortex represents the non-spatial attributes common to the entire recording session, under the assumption that the animal is creating and retrieving memories during recording. If these non-spatial attributes are represented in lateral EC, there would be grid cells in lateral EC (but these are not found). There is evidence that lateral EC cells respond selectively in relation to objects (Deshmukh & Knierim, 2011), but in a typical rodent navigation study there are no objects in the enclosure.

      One reviewer asks whether this model is built to explain the existing data or whether the assumptions of this model are made for theoretical reasons. The BVC model (Barry et al., 2006), which is a precursor to this model, is a theoretically efficient representation of space that could support place coding. If the distances to different borders are known, it’s not clear why the MTL also needs the two-dimensional Fourier-like representation provided by grid cells. This gives rise to the proposal that grid cells with spatial receptive fields are serving some function other than representing space. In the proposed model, the precise hexagonal arrangement of grid cells indicates a property that is found everywhere in the enclosure (i.e., a “tiling” of knowledge for where the property can be found). This arrangement arises from the well-documented learning process termed “differentiation” in the memory literature (McClelland & Chappell, 1998; Norman & O’Reilly, 2003; Shiffrin & Steyvers, 1997), which highlights differences between memories to avoid interference and confusion.

      CONCERNS RELATED TO LIMITATIONS AND CONFLICTING RESULTS

      One reviewer points out that individual grid cells will typically reveal a grid pattern regardless of the environmental circumstances, which, according to this model, indicates that all such circumstances have the same non-spatial attribute. This might seem strange at first, but I suggest that there is a great deal of “sameness” to the environments used in the published rodent navigation experiments. For instance, as far as I’m aware, the animal is never allowed to interact with other animals during spatial navigation recording. Furthermore, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus as well as in the regions that provide excitatory drive to hippocampus. The claim of this model is that the grid cells are “tagging” different navigation enclosures as places where these things happen (fear, aloneness, electronics, metal floor, no objects, etc.). The interesting question is what happens when the animal is allowed to navigate in a more naturalistic setting that includes varied objects, varied food sources, varied surfaces, other animals, etc. Do grid cells persist in such a naturalistic environment? Or do they lose their regularity, or even become silent, considering that there is no longer a uniformity to the non-spatial attributes? The results of Barry et al. (2012) demonstrate that the grid pattern expands and becomes less regular in a novel environment. Nevertheless, the novel environment in that study was uncluttered rather than naturalistic. It remains to be seen what will happen with a truly naturalistic environment.

      One reviewer asks how this model relates to non-grid multifield cells found in mEC (Diehl et al., 2017; see also the irregularly arranged 3D multifield cells reported by Ginosar et al., 2021). A full explanation of these cells would require a new simulation study. In a revision, I will discuss these cells, which have a consistent multifield spatial receptive field whose multiple fields are irregularly arranged rather than forming a precise hexagonal lattice. On this memory account, precise hexagonal arrangement of memories is something that occurs when there is a non-spatial attribute found throughout the enclosure. However, in a typical rodent navigation study, there may be some non-spatial attributes that are not found everywhere in the enclosure. For instance, consider the set of locations within the enclosure that afford a particular view of something outside of the enclosure or the set of locations corresponding to remembered episodic events (e.g., memory for the location where the animal first entered the enclosure). For non-spatial characteristics that are found in some locations but not others within the enclosure, the cells representing those non-spatial attributes should reveal multifield firing at irregular locations, reflecting the subset of locations associated with the non-spatial attribute.

      One reviewer suggests that this model cannot explain the finding that grid fields become warped (e.g., grid fields arranged in an ellipse rather than a circle) in the same manner that the enclosure is warped when a wall is moved (Barry et al., 2007). To simulate this result, I would assume that the change in the boundary location was too modest to be noticed by the animal. Because the distances are calculated relative to the borders, an unnoticed change in the border would not change the model in terms of the grid field as measured by proportional distances between borders. However, because the real-world Euclidean positions of the border are changed, the grid fields would be changed in terms of real-world coordinates. This is what I was referring to in the paper when I wrote “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest to cause global remapping, this would deform the grid modules based on the enclosure border cells. 
At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.” Related to the question of enclosure geometry, the irregularity that can emerge in trapezoid-shaped enclosures was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).”
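
      The proportional-distance argument above can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that a border-referenced grid module stores each field as a fraction of the distance between two opposing walls, and that the wall shift (here a hypothetical 100 cm to 70 cm) is too small to trigger remapping. The function names and numbers are hypothetical and are not taken from the model code.

```python
def border_fraction(x, wall_a, wall_b):
    """Express position x as a fraction of the distance from wall_a to wall_b."""
    return (x - wall_a) / (wall_b - wall_a)


def to_world(fraction, wall_a, wall_b):
    """Map a border-relative fraction back to real-world (Euclidean) coordinates."""
    return wall_a + fraction * (wall_b - wall_a)


# Hypothetical grid field centres (cm) along one axis of a 100 cm enclosure.
fields_cm = [15.0, 45.0, 75.0]
fractions = [border_fraction(x, 0.0, 100.0) for x in fields_cm]

# Assumed: one wall moves in to 70 cm, too small a change to cause remapping,
# so the module keeps the same border-relative fractions.
warped_cm = [to_world(f, 0.0, 70.0) for f in fractions]

spacing_before = fields_cm[1] - fields_cm[0]
spacing_after = warped_cm[1] - warped_cm[0]
print(warped_cm, spacing_after / spacing_before)
```

      Under these assumptions the field spacing of the border-referenced module shrinks by the same factor as the enclosure (70/100), whereas a module referenced to landmarks outside the enclosure would keep its fields at their original real-world positions.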

      CONCERNS THAT WILL BE ADDRESSED WITH GREATER CLARIFICATION

      One reviewer asks why a cell representing a non-spatial attribute found everywhere in the enclosure would not fire everywhere in the enclosure. In theory, cells could fire constantly. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations. In brief, when faced with constant excitatory drive, the cell accommodates and becomes sensitive to changes in the magnitude of the excitatory drive.
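
      The accommodation argument can be sketched with a rate-based depressing synapse, loosely following the resource-depletion idea in Tsodyks & Markram (1997). This is an illustrative toy, not the simulation code of this model: the parameters `use` and `tau_rec`, the drive values, and the function name are assumptions. With constant drive the output falls by roughly an order of magnitude, and a subsequent 20% increase in drive produces a transient rise in output even though the new steady-state output is nearly unchanged.

```python
import numpy as np


def depressing_response(drive, use=0.5, tau_rec=20.0, dt=1.0):
    """Toy rate-based synaptic depression (illustrative parameters).

    `resources` (between 0 and 1) are depleted in proportion to the drive and
    recover towards 1 with time constant `tau_rec`; the postsynaptic output at
    each step is use * resources * drive.
    """
    resources = 1.0
    out = np.empty(len(drive))
    for t, d in enumerate(drive):
        out[t] = use * resources * d
        # Euler step: recovery towards 1 minus depletion by transmission.
        resources += dt * ((1.0 - resources) / tau_rec - use * resources * d)
        resources = min(max(resources, 0.0), 1.0)
    return out


# Constant drive for 200 steps, then a 20% stronger drive
# (standing in for stronger top-down memory feedback at some locations).
drive = np.concatenate([np.full(200, 1.0), np.full(200, 1.2)])
rate = depressing_response(drive)
print(rate[0], rate[199], rate[200], rate[-1])
```

      After accommodation the output mainly reflects the change in drive (`rate[200]` versus `rate[199]`) rather than its absolute level, which is the property the argument above relies on.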

      One reviewer asks for greater clarification regarding the simulation result of immediate stability for grid cells but not place cells. In a revision, I will provide a video showing a sped-up bird’s-eye view of the place cell memories for the 3D simulations that include head direction, showing the manner in which memories tend to linger in some locations more than others as they consolidate. This behavior was explained in the text that reads “Because the non-spatial cell’s grid field reflects on-average memory positions during the recording session (i.e., the locations where the non-spatial attribute is more often remembered, even if the locations of the memories are shifting), the grid fields for the non-spatial cells are immediately apparent, reflecting the tendency of place cells to linger in some locations as compared to other locations during consolidation. More specifically, the place cells tend to linger at the peaks and troughs of the border cell tuning functions (see the explanation above regarding the tendency of the grid to align with border cell dimensions). By analogy, imagine a time-lapse bird’s-eye view of cars traversing the city-block structure of a densely populated city; this on-average view would show a higher density of cars at the cross-street junctions owing to their tendency to become temporarily stuck at stoplights. However, with additional learning and consolidation, the place cells stabilize their positions (e.g., the cars stop traveling), producing a consistent grid field for the head direction conjunctive grid cells.” The text describing why some locations are more “sticky” than others reads “Additional analyses revealed that this tendency to align with border cell dimensions is caused by weight normalization (Step 6 in the pseudocode). Specifically, connection weights cannot be updated above their maximum nor below their minimum allowed values. 
This results in a slight tendency for consolidated place cell memories to settle at one of the three peak values or three trough values of the sine wave basis set. This “stickiness” at one of 6 peak or trough values for each basis set is very slight and only occurred after many consolidation steps. In terms of biological systems, there is an obvious lower bound for excitatory connections (i.e., it is not possible to have an excitatory weight connection that is less than zero), but it is not clear if there is an upper bound. Nevertheless, it is common practice with deep learning models to include an upper bound for connection weights because this reduces overfitting (Srivastava et al., 2014) and there may be similar pressures for biological systems to avoid excessively strong connections.”
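
      The general mechanism, in which bounded weights pile up exactly at their limits, can be seen in a toy example. The sketch below is hypothetical: it applies random increments followed by clipping to an assumed range of [0, 1] and is not Step 6 of the model's pseudocode. It only shows that clipping makes the bounds “sticky”, so a nonzero fraction of weights occupies them at any moment, whereas an unbounded continuous update would almost never land on any particular value.

```python
import numpy as np

W_MIN, W_MAX = 0.0, 1.0  # assumed bounds on connection weights


def bounded_update(w, delta, w_min=W_MIN, w_max=W_MAX):
    """Apply a weight increment, then clip weights into [w_min, w_max]."""
    return np.clip(w + delta, w_min, w_max)


rng = np.random.default_rng(0)
w = rng.uniform(W_MIN, W_MAX, size=1000)
for _ in range(500):
    # Random increments stand in for the model's learning updates (assumption).
    w = bounded_update(w, rng.normal(0.0, 0.05, size=w.shape))

at_bound = np.mean((w == W_MIN) | (w == W_MAX))
print(f"fraction of weights sitting exactly at a bound: {at_bound:.3f}")
```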

      One reviewer points out that border cells are not typically active in the center of the enclosure. However, the model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as 1 out of 3 cells, as this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories. 
If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.

      REFERENCES

      Abbott, L. F., Varela, J. A., Sen, K., & Nelson, S. B. (1997). Synaptic depression and cortical gain control. Science, 275(5297), 220–224.

      Barry, C., Ginzberg, L. L., O’Keefe, J., & Burgess, N. (2012). Grid cell firing patterns signal environmental novelty by expansion. Proceedings of the National Academy of Sciences of the United States of America, 109(43), 17687–17692. https://doi.org/10.1073/pnas.1209918109

      Barry, C., Hayman, R., Burgess, N., & Jeffery, K. J. (2007). Experience-dependent rescaling of entorhinal grids. Nature Neuroscience, 10(6), 682–684.

      Barry, C., Lever, C., Hayman, R., Hartley, T., Burton, S., O’Keefe, J., Jeffery, K., & Burgess, N. (2006). The boundary vector cell model of place cell firing and spatial memory. Reviews in the Neurosciences, 17(1–2), 71–98.

      Derdikman, D., Whitlock, J. R., Tsao, A., Fyhn, M., Hafting, T., Moser, M. B., & Moser, E. I. (2009). Fragmentation of grid cell maps in a multicompartment environment. Nature Neuroscience, 12(10), 1325–1332. https://doi.org/10.1038/nn.2396

      Deshmukh, S. S., & Knierim, J. J. (2011). Representation of non-spatial and spatial information in the lateral entorhinal cortex. Frontiers in Behavioral Neuroscience, 5, 69.

      Diehl, G. W., Hon, O. J., Leutgeb, S., & Leutgeb, J. K. (2017). Grid and nongrid cells in medial entorhinal cortex represent spatial location and environmental features with complementary coding schemes. Neuron, 94(1), 83–92.e6.

      Ginosar, G., Aljadeff, J., Burak, Y., Sompolinsky, H., Las, L., & Ulanovsky, N. (2021). Locally ordered representation of 3D space in the entorhinal cortex. Nature, 596(7872), 404–409.

      Huber, D. E., & O’Reilly, R. C. (2003). Persistence and accommodation in short-term priming and other perceptual paradigms: Temporal segregation through synaptic depression. Cognitive Science, 27(3), 403–430. https://doi.org/10.1207/s15516709cog2703_4

      Krupic, J., Bauza, M., Burton, S., Barry, C., & O’Keefe, J. (2015). Grid cell symmetry is shaped by environmental geometry. Nature, 518(7538), 232–235.

      McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105(4), 724–760.

      Mok, R. M., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications, 10(1), 5685.

      Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary-learning-systems approach. Psychological Review, 110(4), 611–646.

      Rodríguez‐Domínguez, U., & Caplan, J. B. (2019). A hexagonal Fourier model of grid cells. Hippocampus, 29(1), 37–45.

      Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145–166.

      Solstad, T., Boccara, C. N., Kropff, E., Moser, M. B., & Moser, E. I. (2008). Representation of geometric borders in the entorhinal cortex. Science, 322(5909), 1865–1868. https://doi.org/10.1126/science.1166466

      Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

      Stachenfeld, K. L., Botvinick, M. M., & Gershman, S. J. (2017). The hippocampus as a predictive map. Nature Neuroscience, 20(11), 1643–1653.

      Tsodyks, M. V., & Markram, H. (1997). The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proceedings of the National Academy of Sciences of the United States of America, 94(2), 719–723. https://doi.org/10.1073/pnas.94.2.719

      Wei, X.-X., Prentice, J., & Balasubramanian, V. (2015). A principle of economy predicts the functional architecture of grid cells. eLife, 4, e08362.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. We have records of the fast-z correction applied by ScanImage on the microscope during acquisition, so we have supplied the online fast-z motion correction .csv files for two example sessions on our GitHub page as supplementary files:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      These files correspond to Figure S3b (2367_200214_E210_1) and to Figures 5 and 6 (3056_200924_E235_1). These are now also referenced in the main text. See lines ~595, pg 18 and lines ~762, pg 24.

      We have also made minor revisions to the main text of the manuscript with clear descriptions of methods that we have found important for the minimization of movement artifacts, such as fully tightening all mounting devices, implanting the cranial window with proper, evenly applied pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel. See Line ~309, pg 10.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We have opted to not change the title of the paper, because we feel that adding the qualifier, “in two preparations,” would add unnecessary complexity. In addition, while the dorsal mount preparation allows for imaging of bilateral dorsal cortex, the side mount preparation does indeed allow for imaging of both dorsal and lateral cortex across the right hemisphere (a bit of contralateral dorsal cortex is also imageable), and the design can be easily “flipped” across a mirror-plane to allow for imaging of left dorsal and lateral cortex. Taken together, we do show preparations that allow for pan-cortical 2-photon imaging.

      We do agree that imprecise reference to the two preparations can sometimes lead to confusion. Therefore, we made several small revisions to the manuscript, including at ~line 545, to make it clearer that we used two imaging preparations to generate our combined 2-photon mesoscope dataset, and that each of those two preparations had both benefits and limitations.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: The preparation by Esmaeili et al. 2021 has some similarities to, but also differences from, our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries for our side mount preparation, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We have compared these preparations more thoroughly in the revised manuscript. (See lines ~278.)

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations with medians near 4500; See Fig. S2e) we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons per session. We now mention these various factors and have made clear that we were not, for the purposes of this paper, trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      We refer to these issues now briefly in the main text. (See ~line 93, pg 3).

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript at ~line 235, pg. 7.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it is possible to image the side mount preparation in the same optical configuration without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical, we found that the 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees to the left, which exceeds the full 20 degrees of rotation available on the Thorlabs mesoscope objective), along with several other factors, made this approach undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use ultrasound gel instead (which we found to be, to some degree, optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult under these conditions because the camera would need the same optical access angle as the 2-photon objective, or would need to be moved downward toward the air table and rotated up at an angle of 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-SOiD analysis is based on a single mouse, and the validation data are derived from the training data set.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance.

      The B-SOiD analysis that we show in Figure 6 is based on a model trained on 80% of the data from four sessions taken from the same mouse, and then tested on all of a single session from that mouse. Initial attempts to train across sessions from different mice were unsuccessful, probably due to differences in behavioral repertoires across mice. However, we have performed extensive tests with B-SOiD and are confident that these sorts of results are reproducible across mice, although we are not prepared to publish these results at this time.

      We now clarify these points in the main text at ~line 865, pg 27.

      An additional comparison of the results of B-SOiD trained on different numbers of sessions to that of keypoint-MoSeq (Weinreb et al., 2023, bioRxiv) trained on ~20 sessions can now be found as supplementary material on our GitHub site:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/Figure_SZZ_BSOID_MOSEQ_align.pdf

      The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: After re-examination of the original analysis output files, we have indeed discovered that some of the Rastermap neuron density maps in Figure 6e were incorrectly aligned with their respective qualitative behaviors due to a discrepancy in file numbering between the images in 6e and the ensembles identified in 6c (each time that Rastermap is run on the same data, at least with the older version available at the time of creation of these figures, the order of the ensembles on the y-axis changes and thus the numbering of the ensembles would change even though the neuron identities within each group stayed the same for a given set of parameters).

      This unfortunate panel alignment / graphical display error present in the original reviewed preprint has been fixed in the current, updated figure (i.e. twitch corresponds to Rastermap groups 2 and 3, whisk to group 6, walk to groups 5 and 4, and oscillate to groups 0 and 1), and in the main text at ~line 925, pg 29. We have also changed the figure legend, which also contained accurate but misaligned information, for Figure 6e to reflect this correction.
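
      One way to guard against this kind of renumbering is to match ensembles between runs by their neuron membership rather than by their index. The helper below is a hypothetical sketch and is not part of Rastermap: it assumes each run yields one integer group label per neuron, in the same neuron order, and uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) to pair groups with maximal Jaccard overlap.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_ensembles(labels_a, labels_b):
    """Map each ensemble ID in run B to the ensemble ID in run A sharing most neurons.

    labels_a, labels_b: integer ensemble label per neuron (same neuron order).
    Returns a dict {label_in_b: label_in_a}.
    """
    ids_a = np.unique(labels_a)
    ids_b = np.unique(labels_b)
    jaccard = np.zeros((len(ids_b), len(ids_a)))
    for i, b in enumerate(ids_b):
        in_b = labels_b == b
        for j, a in enumerate(ids_a):
            in_a = labels_a == a
            union = np.sum(in_a | in_b)
            jaccard[i, j] = np.sum(in_a & in_b) / union if union else 0.0
    # Negate so the assignment maximises total overlap.
    rows, cols = linear_sum_assignment(-jaccard)
    return {int(ids_b[r]): int(ids_a[c]) for r, c in zip(rows, cols)}


# Same neurons, ensembles renumbered between two hypothetical runs.
run1 = np.array([0, 0, 1, 1, 2, 2])
run2 = np.array([2, 2, 0, 0, 1, 1])
print(match_ensembles(run1, run2))
```

      Because the pairing depends only on which neurons share a group, a change in the display order of ensembles between runs no longer changes which behavior a given neuron density map is attached to.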

      One can now see that, because the data from both figures are from the same session in the same mouse, as you correctly point out, Fig 5d left (walk and whisk) corresponds roughly to Fig 6e group R7, “walk”, and that Fig 5d right (whisk) corresponds roughly to Fig 6e group R4, “twitch”.

      We have double-checked the identity of other CCF map displays of Rastermap neuron density and of mean correlations between neural activity and behavioral primitives in all other figures, and we found no other such alignment or mis-labeling errors.

      We have also added a caveat in the main text at ~lines 925-940, pg. 30, pointing out the preliminary nature of these findings, which are shown here as an example of the viability of the methods. Analysis of the variability of Rastermap alignments across sessions is beyond the scope of the current paper, although it is an issue that we hope to address in upcoming analysis papers.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We added a brief explanation of Suite2p motion correction at ~line 136, pg 4. We have also added additional details concerning CCF / MMM alignment and other analysis issues. In general we cite other papers where possible to avoid repeating details of analysis methods that are already published.
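
      For readers unfamiliar with phase correlation, the principle behind the rigid step can be sketched in a few lines of NumPy. This is a minimal, integer-pixel illustration for a circularly shifted frame; it is not Suite2p's implementation, which adds further refinements, including the non-rigid step mentioned above. The function name and test image are hypothetical.

```python
import numpy as np


def phase_correlation_shift(reference, frame):
    """Estimate the integer (dy, dx) translation of `frame` relative to `reference`."""
    f_ref = np.fft.fft2(reference)
    f_frm = np.fft.fft2(frame)
    cross_power = f_frm * np.conj(f_ref)
    # Keep only the phase (spectral whitening); eps avoids division by zero.
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.real(np.fft.ifft2(cross_power))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    if dy > reference.shape[0] // 2:
        dy -= reference.shape[0]
    if dx > reference.shape[1] // 2:
        dx -= reference.shape[1]
    return int(dy), int(dx)


rng = np.random.default_rng(1)
reference = rng.normal(size=(64, 64))
frame = np.roll(reference, shift=(3, -5), axis=(0, 1))
print(phase_correlation_shift(reference, frame))
```

      The estimated shift can then be undone with `np.roll(frame, (-dy, -dx), axis=(0, 1))`; real imaging data additionally require handling of non-circular borders and sub-pixel shifts.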

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including those for multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We have edited the motivations behind the study to clarify the general problems that are being addressed. However, as the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details necessarily deal specifically with this system.

      We briefly compare the methods and results from our Thorlabs system to that of Diesel-2p, another comparable system, based on what we have been able to glean from the literature on its strengths and weaknesses. See ~lines 206-213, pg 6.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community as a whole. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with in-depth analysis papers that are currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      We now reference this figure on ~lines 190-192, pg 6 of the main text, near the beginning of the Results section.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in line 463.

      Authors’ Response: For the claim stated in lines 449-453:

      “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks”

      We can provide the following references: (i) for rodents: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For the claim stated on line 463:

      “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”

      We can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107).

      We have included these new references in the revised version of our paper. Thank you for identifying these citation omissions.

      No statistics are provided for the results shown in Figure 6e; it would be useful to know which of these neural densities, across all areas, show clear statistical significance across all the behaviors.

      Authors’ Response: It would be useful if we could provide a statistic similar to the one we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0. In this case, we would compare the population density of each Rastermap group within each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (i.e. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3.3% density, per CCF area). Our current figure legend states the maxima of the scale bar look-up values (reds) for each group, which range from ~8% to 32%.

      However, because the data in panel 6e are from a single session and are being provided as an example of our methods and not for the purpose of claiming a specific result at this point, we choose not to report statistics. It is worth pointing out, perhaps, that Rastermap group densities for a given CCF area close to 3.3% are likely not different from chance, and those closer to ~40%, which is our highest density (for area M2 in Rastermap group 7, which corresponds to the qualitative behavior “walk”), are most likely not due to chance. Without analysis of multiple sessions from the same mouse we believe that making a clear statement of significance for this likelihood would be premature.

      We now clarify this decision and related considerations in the main text at ~line 920, pg 29.
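      The uniform null described above can be written out explicitly. The sketch below uses hypothetical function names and only the Python standard library. It computes the expected count and fraction per CCF area, the observed fraction of one Rastermap group's neurons in each area, and an exact binomial upper-tail probability. It is illustrative only. It treats neurons as independent draws and ignores differences in the number of neurons recorded per area, and testing ~30 areas per group would also require correction for multiple comparisons.

```python
from collections import Counter
from math import exp, lgamma, log, log1p

def uniform_null(n_neurons, n_areas):
    """Expected neurons per CCF area, and expected fraction, if a group were spread evenly."""
    return n_neurons / n_areas, 1.0 / n_areas

def area_fractions(area_labels):
    """Fraction of one Rastermap group's neurons that fall in each CCF area."""
    counts = Counter(area_labels)
    total = len(area_labels)
    return {area: n / total for area, n in counts.items()}

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)  # factor out the largest term to avoid underflow
    return exp(m) * sum(exp(t - m) for t in log_terms)

# The worked example from the text: 500 neurons over ~30 areas.
expected_count, expected_fraction = uniform_null(500, 30)  # ~16.7 neurons, ~3.3%
# Hypothetical case: one area holding 40% (200) of that group's 500 neurons.
tail_p = binom_upper_tail(200, 500, expected_fraction)
```

      A null that weights each area by its share of all recorded neurons, rather than assuming equal shares, would likely be more appropriate when areas are sampled unevenly.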

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but it is beyond the scope of the current methods paper. We are following up with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors in a separate manuscript.

      Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.
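      The kind of model the reviewer suggests can be illustrated, purely as a hypothetical sketch, with a closed-form ridge regression. The model predicts each neuron's activity from a matrix of behavioral regressors (e.g. pupil diameter, walk speed, face-motion components) and scores it by variance explained on held-out time points. The function names, the contiguous 50/50 train/test split, and the penalty `lam` are illustrative assumptions, not analyses reported here. A realistic model would also include time-lagged copies of each regressor and cross-validated selection of the penalty.

```python
import numpy as np

def fit_ridge(X, Y, lam):
    """Closed-form ridge weights W = (X^T X + lam * I)^-1 X^T Y."""
    n_regressors = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_regressors), X.T @ Y)

def heldout_r2(X, Y, lam=1.0, train_frac=0.5):
    """Fit on the first part of the session and return per-neuron R^2 on the rest.

    X: (time, regressors) behavioral variables; Y: (time, neurons) activity.
    A contiguous split reduces leakage from temporal autocorrelation
    compared with randomly interleaved time points.
    """
    split = int(X.shape[0] * train_frac)
    # Standardize regressors and center activity using training data only.
    mu, sd = X[:split].mean(axis=0), X[:split].std(axis=0) + 1e-12
    Xz = (X - mu) / sd
    y_mean = Y[:split].mean(axis=0)
    W = fit_ridge(Xz[:split], Y[:split] - y_mean, lam)
    pred = Xz[split:] @ W + y_mean
    ss_res = ((Y[split:] - pred) ** 2).sum(axis=0)
    ss_tot = ((Y[split:] - Y[split:].mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot
```

      With thousands of neurons, `Y` can hold all of them at once, because a single matrix solve fits every neuron simultaneously.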

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCaMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data that one can expect to obtain using our methods. We will provide a more complete analysis of data obtained using our methodology in the near future in another manuscript.

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data, it would be interesting to describe how these two different methods are interrelated; they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare 1-photon widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli (i.e. “passive sensory stimulation”), while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In line 71, the authors described some disadvantages of one-photon widefield imaging, including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We have corrected these statements and incorporated these and other relevant references. There are advantages and disadvantages to each technique, such as ease of use, field of view, accuracy, and speed. We cite the papers you mention without an extensive literature review, but we would like to emphasize the following points:

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy and undefined z-resolution. By comparison, we image at 5 micrometer resolution in our large FOV configuration, while the point-spread function of the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation and 4.25 micrometers in z. A coarser resolution increases the likelihood that activity-related fluorescence from neighboring cells contaminates the fluorescence attributed to imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than that of standard 2-photon galvo-galvo or even galvo-resonant scanning, as the Thorlabs mesoscope uses). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      We have further clarified our discussion of these issues in the main text at ~lines 76-80, pg 2.

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we now include brief supplementary material demonstrating the changes in the window preparation that we observed over the prolonged time periods of our study, for both the dorsal and side mount preparations. The following link to this material is now referenced at ~line 287, pg 9, and at the end of Fig S1:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      We have also included brief additional details in the main text that we found were useful for facilitating long term use of these preparations. These are located at ~lines 287-290, pg 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sharing raw data and code:

      I strongly encourage sharing some of the raw data from your experiments and all the code used for data analysis (e.g. in a github repository). This would help the reader evaluate data quality, and reproduce your results.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      Our existing GitHub repository, already referenced in the paper, is located here:

      https://github.com/vickerse1/mesoscope_spontaneous

      We have added an additional reference in the main text to the existence of these publicly available resources, including the appropriate links, located at ~lines 190-200, pg 6.

      (2) Use of proprietary software:

      The reliance on proprietary tools like LabVIEW and Matlab could be a limitation for some researchers, given the associated costs and accessibility issues. If possible, consider incorporating or suggesting alternatives that are open-source, to make your methodology more accessible to a broader range of researchers, including those with limited resources.

      Authors’ Response: We are reluctant to recommend open source software that we have not thoroughly tested ourselves. However, we will mention, when appropriate, possible options for the reader to consider.

      Although LabVIEW is proprietary and can be difficult to program, it is particularly useful in combination with National Instruments hardware. ScanImage, as used with the Thorlabs mesoscope, runs on National Instruments hardware, so it is convenient to maintain the same hardware standards across the integrated rig/experimental system. LabVIEW is also useful because it comes with an extensive library of device drivers, which makes it very convenient to add new hardware from almost any source.

      That being said, there are open source alternatives that could conceivably be used to replace parts of our system. One example is AutoPilot (author: Jonny Saunders), for control of behavioral data acquisition: https://open-neuroscience.com/post/autopilot/.

      We are not aware of an alternative to Matlab for control of ScanImage, which is the supported control software for the Thorlabs 2-photon mesoscope.

      Most of our processing and analysis code (see GitHub page: https://github.com/vickerse1/mesoscope_spontaneous) is in Python, but some of the code that we currently use remains in Matlab. This code could certainly be rewritten in Python; however, we feel that this is outside the scope of the current paper. We have commented all code to help users translate it to other languages, if they so desire.

      (3) Quantifying the effect of tilted head:

      To address the potential impact of tilting the mouse's head on your findings, a quantitative analysis of any systematic differences in the behavior (e.g. Bsoid motifs) could be illuminating.

      Authors’ Response: We have performed DeepLabCut analysis of all sessions from both preparations, across several iterations with different parameters, to extract pose estimates, and we have also performed B-SOiD analysis of these sessions. We did not find any obvious qualitative differences attributable to the head tilt of the side mount preparation in the number of behavioral motifs identified, the dwell times of these motifs, or similar measures. We also did not find any obvious differences between the two preparations in the relative frequencies of high-level qualitative behaviors, such as those referred to in Fig. 6.
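      As an illustration of the kind of per-session summary such a comparison involves, the sketch below computes the number of motifs and the median dwell time of each motif from a per-frame sequence of motif labels. It assumes the labels are available as one integer per video frame at a known frame rate `fps`. The function names are hypothetical, and this is not part of the B-SOiD package.

```python
from collections import defaultdict
from itertools import groupby
from statistics import median

def motif_bouts(labels, fps):
    """Split a per-frame motif label sequence into bouts; return durations (s) per motif."""
    bouts = defaultdict(list)
    for motif, run in groupby(labels):  # consecutive identical labels form one bout
        n_frames = sum(1 for _ in run)
        bouts[motif].append(n_frames / fps)
    return dict(bouts)

def session_summary(labels, fps):
    """Number of distinct motifs observed and median dwell time per motif."""
    bouts = motif_bouts(labels, fps)
    return {
        "n_motifs": len(bouts),
        "median_dwell_s": {motif: median(d) for motif, d in bouts.items()},
    }
```

      Summaries computed per session in this way could then be pooled by preparation (dorsal mount versus side mount) and compared.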

      Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical, we found that the remaining 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees to the left, which exceeds the full 20 degrees of rotation available for the Thorlabs mesoscope objective), along with several other factors, made this approach undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause the PMT to trip. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions, because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      (4) Clarification in the discussion section:

      The paragraph titled "Advantages and disadvantages of our approach" seems to diverge into discussing future directions, rather than focusing on the intended topic. I suggest revisiting this section to ensure that it accurately reflects the strengths and limitations of your approach.

      Authors’ Response: We agree with the reviewer that this section included several potential next steps or solutions for each advantage and disadvantage, which the reviewer refers to as “future directions” and which are thus arguably beyond the scope of this section. We have therefore retitled this section “Advantages and disadvantages of our approach (with potential solutions):”.

      Although we believe this to be a logical organization, and we already include a section focused purely on future directions in the Discussion section, we have refocused each paragraph of the advantages/disadvantages subsection to concentrate on the advantages and disadvantages per se. In addition, we have made minor changes to the “future directions” section to make it more succinct and practical. These changes can be found at lines ~1016-1077, pg 33-34.

      Reviewer #2 (Recommendations For The Authors):

      Below are some more detailed points that will hopefully help to further improve the quality and scope of the manuscript.

      • While it is certainly favorable for many questions to measure large-scale activity from many brain regions, the introduction appears to suggest that this is a prerequisite to understanding multimodal decision-making. This is based on the argument that combining multiple recordings with movement indicators will 'necessarily obscure the true spatial correlation structures'. However, I don't understand why this is the case or what is meant by 'true spatial correlation structures'. Aren't there many earlier studies that provided important insights from individual cortical areas? It would be helpful to improve the writing to make this argument clearer.

      Authors’ Response: The reviewer makes an excellent point and we have re-worded the manuscript appropriately, to reflect the following clarifications. These changes can be found at ~lines 58-71, pg. 2.

      We believe you are referring to the following passage from the introduction:

      “Furthermore, the arousal dependence of membrane potential across cortical areas has been shown to be diverse and predictable by a temporally filtered readout of pupil diameter and walking speed (Shimaoka et al, 2018). This makes simultaneous recording of multiple cortical areas essential for comparison of the dependence of their neural activity on arousal/movement, because combining multiple recording sessions with pupil dilations and walking bouts of different durations will necessarily obscure the true spatial correlation structures.”

      Here, we do not mean to imply that earlier studies of individual cortical areas are of no value. Rather, this argument is one example, among several, of why our new methods offer advantages over conventional imaging methods. Some sequences or distributed encoding schemes simultaneously span cortical areas that are too far apart to be imaged together with conventional 2-photon imaging, or are too sparse to be detected with 1-photon widefield imaging. For such cases, our methods will allow truly novel scientific analyses and insights.

      The general idea of the present example, based on the findings of Shimaoka et al, 2018, is that correlations between behavior and neural activity cannot be directly combined or compared across regions that were imaged in separate sessions. This is because, in each region, these correlations appear to depend on the exact time since the behavior began, in a manner that differs across regions (Shimaoka et al, 2018). For example, if one were to record from visual cortex in one session with mostly brief walk bouts, and then from somatosensory cortex in a second session with mostly long walk bouts, any inferred difference in the encoding of walk speed between the two areas would risk being contaminated by the “temporal filtering” effect shown in Shimaoka et al, 2018. This is not the case in our recordings: because all areas were recorded simultaneously, the neural activity in every area corresponds to exactly the same distribution of behavior durations.
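      This confound can be illustrated with a toy simulation, which is not the model used by Shimaoka et al, 2018. We assume a hypothetical neuron whose activity is simply walking passed through a first-order exponential low-pass filter. The time constant `tau_s` is set to an arbitrary 5 s. The simulation then compares the neuron's correlation with walking in a session of brief (2 s) bouts and in a session of long (20 s) bouts. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def lowpass(x, tau_s, dt_s):
    """Causal first-order (exponential) low-pass filter with time constant tau_s."""
    alpha = dt_s / tau_s
    y = np.zeros(len(x))
    for t in range(1, len(x)):
        y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
    return y

def walk_bouts(bout_s, n_bouts, dt_s):
    """Binary walk trace: alternating periods of walking and stillness of equal length."""
    n = int(round(bout_s / dt_s))
    cycle = np.concatenate([np.ones(n), np.zeros(n)])
    return np.tile(cycle, n_bouts)

def walk_activity_corr(bout_s, tau_s=5.0, dt_s=0.1, n_bouts=20):
    """Correlation between walking and a neuron that tracks low-passed walking."""
    walk = walk_bouts(bout_s, n_bouts, dt_s)
    activity = lowpass(walk, tau_s, dt_s)
    return float(np.corrcoef(walk, activity)[0, 1])

r_short = walk_activity_corr(bout_s=2.0)   # session with mostly brief bouts
r_long = walk_activity_corr(bout_s=20.0)   # session with mostly long bouts
```

      In this toy model, the same neuron, with identical coupling to walking, appears only weakly correlated with walking in the brief-bout session and strongly correlated in the long-bout session. When all areas are recorded simultaneously, they share a single walk trace, so differences between areas cannot arise from differences in bout structure.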

      • The text describes different timescales of neural activity but is an imaging rate of 3 Hz fast enough to be seen as operating at the temporal dynamics of the behavior? It appears to me that the sampling rate will impose a hard limit on the speed of correlations that can be observed across regions. While this might be appropriate for relatively slow behaviors and spontaneous fluctuations in arousal, sensory processing and decision formation likely operate on faster time scales below 100ms which would even be problematic at 10 Hz which is proposed as the ideal imaging speed in the manuscript.

      Authors’ Response: Imaging rate is always a concern and the limitations of this have been discussed in other manuscripts. We will remind the reader of these limitations, which must always be kept in mind when interpreting fluorescence based neural activity data.

      Previous studies imaging on a comparable yet more limited spatial scale (Stringer et al, 2019) used an imaging speed of ~1 Hz. With this in view, our work represents an advance both in spatial extent of imaged cortex and in imaging speed. Specifically, we believe that ~1 Hz imaging may be sufficient to capture flip/flop type transitions between low and high arousal states that persist in general for seconds to tens of seconds, and that ~3-5 Hz imaging likely provides additional information about encoding of spontaneous movements and behavioral syllables/motifs.

      Indeed, even 10 Hz imaging would not be fast enough to capture the detailed dynamics of sensory processing and decision formation, although these speeds are likely sufficient to capture “stable” encodings of sensory representations and decisions that must be maintained during a task, for example with delayed match-to-sample tasks.
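      As an idealized check of this point, the Nyquist limit sets the highest frequency that a frame rate $f_s$ can represent without aliasing. The calculation below ignores indicator kinetics, which low-pass the signal further.

```latex
\begin{aligned}
f_{\mathrm{Nyq}} &= \frac{f_s}{2}, \qquad T_{\min} = \frac{1}{f_{\mathrm{Nyq}}} = \frac{2}{f_s} \\
f_s = 3\ \mathrm{Hz} &\;\Rightarrow\; f_{\mathrm{Nyq}} = 1.5\ \mathrm{Hz}, \quad T_{\min} \approx 0.67\ \mathrm{s} \\
f_s = 10\ \mathrm{Hz} &\;\Rightarrow\; f_{\mathrm{Nyq}} = 5\ \mathrm{Hz}, \quad T_{\min} = 0.2\ \mathrm{s}
\end{aligned}
```

      Fluctuations with structure on a ~100 ms timescale have frequency content near 10 Hz, which lies above the Nyquist limit at both 3 Hz and 10 Hz.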

      In general we are further developing our preparations to allow us to perform simultaneous widefield imaging and Neuropixels recordings, and to perform simultaneous 1.2 x 1.2 mm 2-photon imaging and visually guided patch clamp recordings.

      Both of these techniques will allow us to combine information across both the slow and fast timescales that you refer to in your question.

      We have clarified these points in the Introduction and Discussion sections, at ~lines 93-105, pg 3, and ~lines 979-983, pg 31 and ~lines 1039-1045, pg 33, respectively.

      • The dorsal mount is very close to the crystal skull paper and it was ultimately not clear to me if there are still important differences aside from the headbar design that a reader should be aware of. If they exist, it would be helpful to make these distinctions a bit clearer. Also, the sea shell implants from Ghanbari et al in 2019 would be an important additional reference here.

      Authors’ Response: We have added brief references to these issues in our revised manuscript at ~lines 89-97, pg 3:

      Although our dorsal mount preparation is based on the “crystal skull paper” (Kim et al, 2016), which we reference, the addition of a novel 3-D printable titanium headpost, support arms, light shields, and modifications to the surgical protocols and CCF alignment represent significant advances that made this preparation usable for pan-cortical imaging using the Thorlabs mesoscope. In fact, we were in direct communication with Cris Niell, a UO professor and co-author on the original Kim et al, 2016 paper, during the initial development of our preparation, and he and members of his lab consulted with us in an ongoing manner to learn from our successful headpost and other hardware developments. Furthermore, all of our innovations for data acquisition, imaging, and analysis apply equally to both our dorsal mount and side mount preparations.

      Thank you for mentioning the Ghanbari et al, 2019 paper on the transparent polymer skull method, “See Shells.” We were in fact not aware of this study. However, their preparation, like the crystal skull preparation and our dorsal mount preparation, appears to be limited to bilateral dorsal cortex. Unlike our cranial window side mount preparation and the through-the-skull widefield preparation of Esmaeili et al, 2021, it does not appear to include a fuller range of lateral cortical areas, including primary auditory cortex.

      • When using the lateral mount, rotating the objective, rather than the animal, appears to be preferable to reduce the stress on the animal. I also worry that the rather severe head tilt could be an issue when training animals in more complex behaviors and would introduce an asymmetry between the hemispheres due to the tilted body position. Is there a strong reason why the authors used water instead of an imaging gel to resolve the issue with the meniscus?

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript (see ~line 235, pg 7).

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      In principle, it would be nearly possible to image the side mount preparation in the same optical configuration without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical. However, we found this undesirable for several reasons. The objective's full 20 degrees of available rotation falls 2-3 degrees short of the 22.5 degree leftward rotation of our preparation, and several other factors compounded this limitation. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or trip the PMTs. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions. The camera would need the same optical access angle as the objective, or it would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      • In parts, the description of the methods is very specific to the Thorlabs mesoscope which makes it harder to understand the general design choices and challenges for readers that are unfamiliar with that system. Since the Mesoscope is very expensive and therefore unavailable to many labs in the field, I think it would increase the reach of the manuscript to adjust the writing to be less specific for that system but instead provide general guidance that could also be helpful for other systems. For example (but not exclusively) lines 231-234 or lines 371 and below are very Thorlabs-specific.

      Authors’ Response: We have revised the manuscript so that it is more generally applicable to mesoscopic methods.

      We will make revisions as you suggest where possible, although we have limited experience with the other imaging systems that we believe you are referring to. However, please note that we already mentioned at least one other comparable system in the original eLife reviewed preprint (Diesel2p, line 209; Yu and Smith, 2021).

      Here are a couple of examples of how we have broadened our description:

      (1) On lines ~231-234, pg 7, we write:

      “However, if needed, the objective of the Thorlabs mesoscope may be rotated laterally up to +20 degrees for direct access to more ventral cortical areas, for example if one wants to use a smaller, flat cortical window that requires the objective to be positioned orthogonally to the target region.”

      Here we have modified this to indicate that, in general, one may rotate the objective lens if one's system allows it. Some systems, such as the Thorlabs Bergamo microscope and the Sutter MOM system, allow more than 20 degrees of rotation.

      (2) On line ~371, pg 11, we write:

      “This technique required several modifications of the auxiliary light-paths of the Thorlabs mesoscope”

      Here, we have changed the writing to be more general such as “may require…of one’s microscope.”

      Thank you for these valuable suggestions.

      • Lines 287-299: Could the authors quantify the variation in imaging depth, for example by quantifying to which extent the imaging depth has to be adjusted to obtain the position of the cortical surface across cortical areas? Given that curvature is a significant challenge in this preparation this would be useful information and could either show that this issue is largely resolved or to what extent it might still be a concern for the interpretation of the obtained results. How large were the required nominal corrections across imaging sites?

      Authors’ Response: This information was provided previously (lines 297-299):

      “In cases where we imaged multiple small ROIs, nominal imaging depth was adjusted in an attempt to maintain a constant relative cortical layer depth (i.e. depth below the pial surface; ~200 micrometer offset due to brain curvature over 2.5 mm of mediolateral distance, symmetric across the center axis of the window).”

      This statement is based on a qualitative assessment of cortical depth based on neuron size and shape, the density of neurons in a given volume of cortex, the size and shape of blood vessels, and known cortical layer depths across regions. A ground-truth measurement of this depth error is beyond the scope of the present study. However, we do specify the type of glass, thickness, and curvature that we use, and the field curvature characterization of the Thorlabs mesoscope is given in Fig. 6 of the Sofroniew et al, 2016 eLife paper.
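      The quoted rule of thumb (~200 micrometers of offset over 2.5 mm, symmetric about the window's center axis) can be turned into a per-ROI estimate of the nominal depth correction. The sketch below is a minimal illustration under stated assumptions. It assumes the 2.5 mm is measured from the center axis and that the offset grows quadratically with distance (a small-angle approximation of a curved surface). Neither assumption is taken from the manuscript, and the function name is hypothetical.

```python
def nominal_depth_offset_um(lateral_mm: float,
                            max_offset_um: float = 200.0,
                            span_mm: float = 2.5) -> float:
    """Approximate nominal depth offset (um) for an ROI at a given mediolateral
    distance (mm) from the window's center axis.

    ASSUMPTION (hypothetical model): the offset grows quadratically with
    distance and reaches max_offset_um at span_mm from the center axis.
    """
    x = abs(lateral_mm)  # profile is symmetric about the center axis
    return max_offset_um * (x / span_mm) ** 2


for x in (0.0, 1.25, -2.5):
    print(f"{x:+.2f} mm from center -> ~{nominal_depth_offset_um(x):.0f} um offset")
```

      Under this assumed profile, an ROI halfway out (1.25 mm) would need about a quarter of the full correction, ~50 micrometers. A real correction should come from the window's known curvature or from measured surface positions.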

      In addition, we have provided some documentation of online fast-z correction parameters on our GitHub page at:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      Some additional relevant documentation can be found in our publicly available data repository on FigShare+ at: https://doi.org/10.25452/figshare.plus.c.7052513

      • Given the size of the implant and the subsequent work attachments, I wonder to which extent the field of view of the animal is obstructed. Did the authors perform receptive field mapping or some other technique that can estimate the size of the animals' remaining field of view?

      Authors’ Response: The left eye is pointed down ~22.5 degrees, but we position the mouse near the left edge of the wheel to minimize the degree to which this limits its field of view. One may view our Fig. 1 and Suppl Movies 1 and 6 to see that the eyes on the left and right sides are unobstructed by the headpost, light shields, and support arms. However, other components of the experimental setup, such as the speaker and cameras, can restrict a few small portions of the visual field, depending on their exact positioning.

      Mice responded to left-side visual stimuli in preliminary recordings during our multimodal 2-AFC task. In addition, the unobstructed left and right camera views, along with pupillometry recordings, indicate that a significant portion of the mouse’s field of view on either side remains intact in our preparation.

      We have clarified these points in the text at ~lines 344-346, pg. 11.

      • Line 361: What does movie S7 show in this context? The movie seems to emphasize that the observed calcium dynamics are not driven by movement dynamics but it is not clear to me how this relates to the stimulation of PV neurons. The neural dynamics in the example cell are also not very clear. It would be helpful if this paragraph would contain some introduction/motivation for the optogenetic stimulation as it comes a bit out of the blue.

      Authors’ Response: This result was presented for two reasons.

      First, we showed it as a control for movement artifacts. Inhibiting neural activity increases the relative prominence of activity-independent fluorescence, which allows us to estimate the amplitude of movement-related changes in that fluorescence (i.e., movement artifacts). We have included a reference to this point at ~lines 587-588, pg 18.

      Second, we showed it as a demonstration of how one may combine optogenetics with imaging in mesoscopic 2-P imaging. References to this point were already present in the original version of the manuscript (the eLife “reviewed preprint”).

      • Lines 362-370: This paragraph and some of the following text are quite technical and would benefit from a better description and motivation of the general workflow. I have trouble following what exactly is done here. Are the authors using an online method to identify the CCF location of the 2p imaging based on the vessel pattern? Why is it important to do this during the experiment? Wouldn't it be sufficient to identify the areas of interest based on the vessel pattern beforehand and then adjust the 2p acquisition accordingly? Why are they using a dial, shutter, and foot pedal and how does this relate to the working distance of the objective? Does the 'standardized cortical map' refer to the Allen common coordinate framework?

      Authors’ Response: We have revised this section to make it more clear.

      Currently, the general introduction to this section appears in lines 349-361. Starting in line 362, we currently present the technical considerations needed to implement the overall goals stated in that first paragraph of this section.

      In general we use a post-hoc analysis step to confirm the location of neurons recorded with 2-photon imaging. We use “online” juxtaposition of the multimodal map image with overlaid CCF with the 2-photon image by opening these two images next to each other on the ScanImage computer and matching the vasculature patterns “by eye”. We have made this more clear in the text so that the interested reader can more readily implement our methods.

      By use of the phrase “standardized cortical map” in this context, we meant to point out that we had not decided a priori to use the Allen CCF v3.0 when we started working on these issues.

      • Does Fig. 2c show an example of the online alignment between widefield and 2p data? I was confused here since the use of suite2p suggests that this was done post-recording. I generally didn't understand why the user needed to switch back and forth between the two modes. Doesn't the 2p image show the vessels already? Also, why was an additional motorized dichroic to switch between widefield and 2p view needed? Isn't this the standard in most microscopes (including the Thorlabs scopes)?

      Authors’ Response: We have explained this methodology more clearly in the revised manuscript, both at ~lines 485-500, pg 15-16, and ~lines 534-540, pg 17.

      The motorized dichroic we used replaced the motorized mirror that comes with the Thorlabs mesoscope. We switched to a dichroic to allow for near-simultaneous optogenetic stimulation with 470 nm blue light and 2-photon imaging, so that we would not have to move the mirror back and forth during live data acquisition (it takes a few seconds and makes an audible noise that we wanted to avoid).

      Figure 2c shows an overview of our two-step “offline” alignment process. The image at the right in the bottom row, labeled “2”, is a map of recorded neurons from suite2p, determined post-hoc (i.e., after imaging). In Fig. 2d we show what the CCF map looks like when it is overlaid on the neurons from a single suite2p session using our alignment techniques; this image, too, is created post-hoc and not during imaging. In practice, “online” during imaging, we start from the image at the left in the bottom row of Fig. 2c. That image is the multimodal map overlaid onto an image of the vasculature, also acquired on the widefield rig, with the 22.5 degree rotated CCF map aligned to it based on the locations of sensory responses. We rotate it 90 degrees to the left and flip it over a horizontal mirror plane so that its orientation matches that of the “online” 2-photon acquisition image, and we zoom it to the same scale factor. We then navigate “by eye” to the desired CCF areas based on vasculature patterns, and confirm successful 2-photon targeting of the predetermined regions with our post-hoc analysis.
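      The rotate, flip, and zoom step described above can be sketched with NumPy. This is a minimal illustration, not the code in the authors' repository. The function name, the nearest-neighbour resampling, and the `scale` parameter (2-photon display pixels per widefield pixel) are assumptions.

```python
import numpy as np


def orient_map_for_2p(map_img: np.ndarray, scale: float) -> np.ndarray:
    """Rotate a widefield map 90 degrees to the left (counterclockwise), flip it
    over a horizontal mirror plane, and resample it by `scale` with
    nearest-neighbour lookup (hypothetical helper; `scale` is an assumed input)."""
    rotated = np.rot90(map_img, k=1)   # counterclockwise, i.e. "to the left"
    flipped = np.flipud(rotated)       # mirror top-to-bottom
    out_h = int(round(flipped.shape[0] * scale))
    out_w = int(round(flipped.shape[1] * scale))
    # Map every output pixel back to its nearest source pixel.
    rows = np.minimum((np.arange(out_h) / scale).astype(int), flipped.shape[0] - 1)
    cols = np.minimum((np.arange(out_w) / scale).astype(int), flipped.shape[1] - 1)
    return flipped[np.ix_(rows, cols)]


demo = np.arange(6).reshape(2, 3)
print(orient_map_for_2p(demo, 1.0))        # same as demo.T
print(orient_map_for_2p(demo, 2.0).shape)  # (6, 4)
```

      Rotating counterclockwise and then flipping top-to-bottom is equivalent to a transpose, which makes a convenient sanity check. Whether a given rig needs exactly this operation depends on how its cameras and scan axes are oriented.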

      • Why is the widefield imaging done through the skull under anesthesia? Would it not be easier to image through the final window when mice have recovered? Is the mapping needed for accurate window placement?

      Authors’ Response: The headpost and window surgeries are done 3-7 days apart to increase success rate and modularize the workflow. Multimodal mapping by widefield imaging is done through the skull between these two surgeries for two major reasons. First, to make efficient use of the time between surgeries. Second, to allow us to compare the multimodal maps to skull landmarks, such as bregma and lambda, for improved alignment to the CCF.

      Anesthesia was applied to prevent state changes and movements of the mouse, which can produce large, undesired effects on neural responses in primary sensory cortices in the context of these mapping experiments. We sometimes re-imaged multimodal maps on the widefield microscope through the window, roughly every 30-60 days, or whenever significant changes in the vasculature pattern became apparent.

      We have clarified these points in the main text at ~lines 510-522, pg 20-21, and we added a link to our new supplementary material documenting the changes observed in the window preparation over time:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      Thank you for these questions.

      • Lines 445 and below: Reducing the noise from resonant scanners is also very relevant for many other 2p experiments so it would be helpful to provide more general guidance on how to resolve this problem. Is the provided solution only applicable to the Thorlabs mesoscope? How hard would it be to adjust the authors' noise shield to other microscopes? I generally did not find many additional details on the Github repo and think readers would benefit from a more general explanation here.

      Authors’ Response: Our revised GitHub repository now includes more details, with diagrams and text descriptions of the sound baffle, respectively:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_for_noise_reduction_on_resonant_scanner_devices.pdf

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      However, we cannot presently disclose our confidential provisional patent application. Complete design information will likely be available in early 2025 when our full utility patent application is filed.

      With respect to your question, yes, this technique is adaptable to any resonant scanner, or, for that matter, any complicated 3D surface that emits sound. We first 3D scan the surface, and then we reverse engineer a solid that fully encapsulates the surface and can be easily assembled in parts with bolts and interior foam that allow for a tight fit, in order to nearly completely block all emitted sound.

      It is this adaptability that has prompted us to apply for a full patent, as we believe this technique will be quite valuable as it may apply to a potentially large number of applications, starting with 2-photon resonant scanners but possibly moving on to other devices that emit unwanted sound.

      • Does line 458 suggest that the authors had to perform a 3D scan of the components to create the noise reduction shield? If so, how was this done? I don't understand the connection between 3D scanning and printing that is mentioned in lines 464-466.

      Authors’ Response: We do not want to release full details of the methodology until the full utility patent application has been submitted. However, we have now included a simplified text description of the process on our GitHub page and included a corresponding link in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      We also clarified in the main text, at the location that you indicate, why the 3D scanning is a critical part of our novel 3D-design, printing, and assembly protocol.

      • Lines 468 and below: Why is it important to align single-cell data to cortical areas 'directly on the 2-photon microscope'? Is this different from the alignment discussed in the paragraph above? Why not focus on data interpretation after data acquisition? I understand the need to align neural data to cortical areas in general, I'm just confused about the 'on the fly' aspect here and why it seems to be broken out into two separate paragraphs. It seems as if the text in line 485 and below could also be placed earlier in the text to improve clarity.

      Authors’ Response: By “such mapping is not routinely possible directly on the 2-photon mesoscope” we mean that multimodal mapping cannot be done directly on the mesoscope; it must be done on the widefield imaging rig (a separate microscope). The CCF is then mapped onto the widefield multimodal map, which is overlaid on an image of the vasculature (and sometimes also the skull) acquired on the same widefield rig. The vasculature serves as a sort of Rosetta Stone. It is used to co-align the 2-photon image to the multimodal map and then, by chaining the two alignments, to the CCF. In this way, each individual neuron in the 2-photon image can be assigned a unique CCF area name and numerical identifier for subsequent analysis.

      We have clarified this in the text, thank you.
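      The chaining of alignments described above (2-photon to widefield, widefield to CCF) can be illustrated by composing 2-D homogeneous transforms and looking up the CCF label under each neuron. This is a minimal sketch, assuming both alignments are affine and that the CCF is available as an integer label image. The transforms, label image, and function names are hypothetical, and the repository notebooks may use a different transform model.

```python
import numpy as np


def compose(*transforms: np.ndarray) -> np.ndarray:
    """Compose 3x3 homogeneous 2-D transforms; the first argument is applied first."""
    out = np.eye(3)
    for t in transforms:
        out = t @ out
    return out


def assign_ccf_labels(centroids_xy: np.ndarray, tf_2p_to_ccf: np.ndarray,
                      ccf_label_img: np.ndarray) -> np.ndarray:
    """Map neuron centroids (N x 2, 2-photon pixel x/y) into a CCF label image and
    return the integer area ID under each neuron (0 if it falls outside the image)."""
    homog = np.column_stack([centroids_xy, np.ones(len(centroids_xy))])
    mapped = (tf_2p_to_ccf @ homog.T).T[:, :2]
    cols = np.round(mapped[:, 0]).astype(int)  # x -> column
    rows = np.round(mapped[:, 1]).astype(int)  # y -> row
    inside = ((rows >= 0) & (rows < ccf_label_img.shape[0])
              & (cols >= 0) & (cols < ccf_label_img.shape[1]))
    labels = np.zeros(len(centroids_xy), dtype=ccf_label_img.dtype)
    labels[inside] = ccf_label_img[rows[inside], cols[inside]]
    return labels


# Toy example with hypothetical transforms and a two-area label image.
scale = np.diag([0.5, 0.5, 1.0])                          # 2p -> widefield
shift = np.array([[1.0, 0, 1], [0, 1, 0], [0, 0, 1]])     # widefield -> CCF
ccf = np.ones((4, 4), dtype=int)
ccf[:, 2:] = 2                                            # area 1 | area 2
labels = assign_ccf_labels(np.array([[0, 0], [4, 2], [20, 20]]),
                           compose(scale, shift), ccf)
print(labels)  # [1 2 0]
```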

      • The Python code for aligning the widefield and 2p vessel images would also be of great value for regular 2p users. It would strongly improve the impact of the paper if the repository were better documented and the code would be equally applicable for alignment of imaging data with smaller cranial windows.

      Authors’ Response: All of the code for multimodal map, CCF, and 2-photon image alignment is, in fact, already present on the GitHub page. We have made some minor improvements to the documentation, and readers are more than welcome to contact us for additional help.

      Specifically, the alignment you refer to starts in cell #32 of the meso_pre_proc_1.ipynb notebook. In general, the notebooks are meant to be run sequentially, starting with cell #1 of meso_pre_proc_1 and proceeding cell by cell, then moving on to meso_pre_proc_2, and so on. The purpose of each cell is stated in a comment at the top of the cell.

      We now include a cleaned, abridged version of the meso_pre_proc_1.ipynb notebook that contains only the steps needed for alignment, and have included a direct link to this notebook in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/python_code/mesoscope_preprocess_MMM_creation.ipynb

      Rotated CCF maps are in the CCF map rotation folder, in subfolders corresponding to the angle of rotation.

      Multimodal map creation involves use of the SensoryMapping_Vickers_Jun2520.m script in the Matlab folder.

      We updated the main text to clarify these points and included direct links to scripts relevant to each processing step.

      • Figure 4a: I found it hard to see much of the structure in the Rastermap projection with the viridis colormap - perhaps also because of a red-green color vision impairment. Correspondingly, I had trouble seeing some of the structure that is described in the text or clearer differences between the neuron sortings to PC1 and PC2. Is the point of these panels to show that both PCs identify movement-aligned dynamics or is the argument that they isolate different movement-related response patterns? Using a grayscale colormap as used by Stringer et al might help to see more of the many fine details in the data.

      Authors’ Response: In Fig. 4a the viridis color range is from blue to green to yellow, as indicated in the horizontal scale bar at bottom right. There is no red color in these Rastermap projections, or in any others in this paper. Furthermore, the expanded Rastermap insets in Figs. S4 and S5 provide additional detailed information that may not be clear in Fig 4a and Fig 5a.

      We prefer, therefore, not to change these colormaps, which we use throughout the paper.

      We have provided grayscale png versions of all figures on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/grayscale_figures

      In Fig 4a, the point of showing both the PC1 and PC2 panels is to demonstrate two things. First, they appear to correspond to different aspects of movement (PC1 more to transient walking, both ON and OFF, and PC2 to whisking and sustained ON walking/whisking). Second, they differ in their ability to identify neurons with positive and negative correlations to arousal (PC1 finds both, but PC2 seems to find only the ON neurons).

      We now clarify this in the text at ~lines 696-710, pg 22.
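      The idea of ordering neurons by a principal component, with positive and negative loadings separating ON and OFF neurons, can be sketched with a plain SVD. This is a simple PCA sort swapped in for illustration; it is not Rastermap and not the authors' code. The function name and the use of a behavioral `reference` trace to fix the arbitrary SVD sign are assumptions.

```python
import numpy as np


def sort_by_pc(activity: np.ndarray, reference: np.ndarray, pc: int = 0):
    """Order neurons (rows of an N x T matrix) by their loading on one principal
    component of the z-scored activity. SVD signs are arbitrary, so the component
    is oriented to correlate positively with `reference` (e.g. walking speed):
    positive loadings then mark 'ON' neurons and negative loadings 'OFF' neurons."""
    z = activity - activity.mean(axis=1, keepdims=True)
    z = z / (z.std(axis=1, keepdims=True) + 1e-12)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = u[:, pc] * s[pc]          # neuron weights on this component
    temporal = vt[pc]                    # component time course
    if np.dot(temporal - temporal.mean(), reference - reference.mean()) < 0:
        loadings = -loadings             # orient sign to match the reference
    order = np.argsort(loadings)[::-1]   # most positive loading first
    return order, loadings


# Synthetic example: three neurons follow "walking", two are anti-correlated.
rng = np.random.default_rng(0)
walk = np.sin(np.linspace(0, 6 * np.pi, 300))
act = np.vstack([walk, walk, walk, -walk, -walk]) + 0.05 * rng.standard_normal((5, 300))
order, loadings = sort_by_pc(act, walk, pc=0)
```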

      • I find panel 6a a bit too hard to read because the identification and interpretation of the different motifs in the different qualitative episodes is challenging. For example, the text mentions flickering into motif 13 during walk but the majority of that sequence appears to be shaped by what I believe to be motif 11. Motif 11 also occurs prominently in the oscillate state and the unnamed sequence on the left. Is this meaningful or is the emphasis here on times of change between behavioral motifs? The concept of motif flickering should be better explained here.

      Authors’ Response: Here motif 13 corresponds to a syllable that might best be termed “symmetric and ready stance”. This tends to occur just before and after walking, but also during rhythmic wheel balancing movements that appear during the “oscillate” behavior.

      The intent of Fig. 6a is to show that each qualitatively identified behavior (twitch, whisk, walk, and oscillate) corresponds to a period during which a subset of BSOiD motifs flicker back and forth, and that the identity of motifs in this subset differs across the identified qualitative behaviors. This is not to say that a particular motif occurs only during a single identified qualitative behavior. Admittedly, the identification of these qualitative behaviors is a bit arbitrary. Future versions of BSOiD (e.g. ASOiD) in fact combine supervised (i.e. arbitrary, top-down) and unsupervised (i.e. algorithmic, objective, bottom-up) methods of behavior segmentation in an attempt to more reliably identify and label behaviors.

      Flickering appears to be a property of motif transitions in raw BSOiD outputs that have not been temporally smoothed. When one watches the raw video, this flickering may in fact be an accurate reflection of the manner in which behaviors unfold through time. To use terminology from MoSeq (B Datta), each behavior could be thought of as a series of syllables strung together to make a phrase or sentence. Syllables can repeat over either fast or slow timescales, and may be shared across distinct words and sentences, although the order and frequency of their recurrence will likely differ.

      We have clarified these points in the main text at ~lines 917-923, pg 29, and we added motif 13 to the list of motifs for the qualitative behavior labeled “oscillate” in Fig. 6a.
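      The flickering described above can be quantified by counting frame-to-frame label changes in a raw motif sequence, and a short sliding-mode filter shows how temporal smoothing would suppress it. This is a minimal sketch on a made-up label sequence. The function names and window size are hypothetical, and BSOiD's own smoothing options may work differently.

```python
import numpy as np


def count_transitions(labels: np.ndarray) -> int:
    """Number of frame-to-frame changes in a per-frame motif label sequence."""
    return int(np.count_nonzero(labels[1:] != labels[:-1]))


def mode_filter(labels: np.ndarray, half_width: int = 2) -> np.ndarray:
    """Replace each frame's label with the most common label in a centred window
    of up to 2*half_width+1 frames (ties go to the smallest label)."""
    out = np.empty_like(labels)
    for i in range(len(labels)):
        window = labels[max(0, i - half_width): i + half_width + 1]
        vals, counts = np.unique(window, return_counts=True)
        out[i] = vals[np.argmax(counts)]
    return out


# Made-up raw sequence flickering between two motifs.
raw = np.array([11, 11, 13, 11, 11, 11, 13, 13, 13, 13])
smoothed = mode_filter(raw, half_width=1)
print(count_transitions(raw), count_transitions(smoothed))  # 3 1
```

      Comparing transition counts before and after smoothing is one way to separate genuine rapid alternation between motifs from single-frame label noise.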

      • Lines 997-998: I don't understand this argument. Why does the existence of different temporal dynamics make imaging multiple areas 'one of the keys to potentially understanding the nature of their neuronal activity'?

      Authors’ Response: We believe this may be an important point. Comparisons of neurobehavioral alignment across cortical areas cannot be performed by pooling sessions that contain different distributions of dwell times for different behaviors if, in fact, the dependence of neural activity on behavior depends on the exact time elapsed since the beginning of the current behavioral “bout”. Imaging many areas simultaneously has other advantages over imaging smaller areas one at a time and attempting to pool data across sessions. These include the identification of sequences or neural ensembles that span many areas across large distances, and the understanding of distributed coding of behavior (an issue we explore in an upcoming paper).

      We have clarified these points at the location in the Discussion that you have identified. Thank you for your questions and suggestions.
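      The dwell-time argument above depends on knowing, for every frame, how long the current behavioral bout has lasted. A minimal sketch follows, assuming a per-frame behavior label sequence at a known frame rate, with a bout defined as a run of identical consecutive labels. The function name and example labels are hypothetical.

```python
import numpy as np


def time_since_bout_onset(labels: np.ndarray, fs: float) -> np.ndarray:
    """For a per-frame behavior label sequence sampled at fs Hz, return the time (s)
    elapsed since the start of the bout each frame belongs to."""
    elapsed = np.zeros(len(labels))
    start = 0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            start = i                    # a new bout begins at this frame
        elapsed[i] = (i - start) / fs
    return elapsed


labels = np.array(["still", "still", "walk", "walk", "walk", "still"])
t_in_bout = time_since_bout_onset(labels, fs=2.0)
```

      Neural activity could then be binned by this elapsed time, so that sessions are compared at matched positions within bouts rather than pooled across different dwell-time distributions.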

      Minor

      Line 41: What is the difference between decision, choice, and response periods?

      Authors’ Response: This now reads “...temporal separation of periods during which cortical activity is dominated by activity related to stimulus representation, choice/decision, maintenance of choice, and response or implementation of that choice.”

      Line 202: What does ambulatory mean in this context?

      Authors’ Response: Here we mean that the mice are able to walk freely on the wheel. In fact they do not actually move through space, so we have changed this to read “able to walk freely on a wheel, as shown in Figs. 1a and 1b”.

      Is there a reason why 4 mounting posts were used for the dorsal mount but only 1 post was sufficient for the lateral mount?

      Authors’ Response: Here, we assume you mean 2 posts for the side mount and 4 posts for the dorsal mount.

      In general, our idea was to use as many posts as possible to provide maximum stability of the preparations and minimize movement artifacts during 2-photon imaging. However, the design of the side mount headpost precluded the straightforward addition of a second, right-oriented arm to its lateral/ventral rim; this would have blocked access of both the 2-photon objective and the right face camera. In the dorsal mount, the symmetrical headpost arms are positioned further back (i.e. posterior), so that the left and right face cameras are not obscured.

      When we created the side mount preparation, we discovered that the 2 vertical 1” support posts were sufficient to provide adequate stability of the preparation and minimize 2-photon imaging movement artifacts. The side mount used two attachment screws on the left side of the headpost, instead of the one screw per side used in the dorsal mount preparation.

      We have included these points/clarifications in the main text at ~lines 217-230, pg 7.

      Figure S1g appears to be mislabeled.

      Authors’ Response: Yes, on the figure itself that panel was mislabeled as “f” in the original eLife reviewed preprint. We have changed this to read “g”.

      Line 349 and below: Why is the method called pseudo-widefield imaging?

      Authors’ Response: On the mesoscope, broad spectrum fluorescent light is passed through a series of excitation and emission filters that, based on a series of tests that we performed, allow both reflected blue light and epifluorescence emitted (i.e. Stokes-shifted) green light to reach the CCD camera for detection. Furthermore, the CCD camera (Thorlabs) has a much smaller detector chip than that of the other widefield cameras that we use (RedShirt Imaging and PCO), and we use it to image at an acquisition speed of around 10 Hz maximum, instead of ~30-50 Hz, which is our normal widefield imaging acquisition speed (it also has a slower readout than what we would consider to be a standard or “real” 1-photon widefield imaging camera).

      For these 3 reasons we refer to this as “pseudo-widefield” imaging. We would not use this for sensory activity mapping on the mesoscope - we primarily use it for mapping cortical vasculature and navigating based on our multimodal map to CCF alignment, although it is actually “contaminated” with some GCaMP6s activity during these uses.

      We have briefly clarified this in the text.

      Figures 4d & e: Do the colors show mean correlations per area? Please add labels and units to the colorbars as done in panel 4a.

      Authors’ Response: For both Figs 4 and 5, we have added the requested labels and units to each scale bar, and have relabeled panels d to say “Rastermap CCF area cell densities”, and panels e to say “mean CCF area corrs w/ neural activity.”

      Thank you for catching these omissions/mislabelings.

      Line 715: what is superneuron averaging?

      Authors’ Response: This refers to the fact that when Rastermap displays more than ~1000 neurons it averages the activity of each group of adjacent 50 neurons in the sorting to create a single display row, to avoid exceeding the pixel limitations of the display. Each single row representing the average activity of 50 neurons is called a “superneuron” (Stringer et al, 2023; bioRxiv).

      We have modified the text to clarify this point.
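      The superneuron averaging described above amounts to block-averaging adjacent rows of the sorted activity matrix. A minimal sketch follows, assuming the rows are already in Rastermap order. The function is hypothetical and does not reproduce Rastermap's own display code, which may differ in details.

```python
import numpy as np


def superneurons(sorted_activity: np.ndarray, group_size: int = 50) -> np.ndarray:
    """Average each block of `group_size` adjacent rows of an already-sorted
    N x T activity matrix into one display row; a final partial block is
    averaged over however many rows it contains."""
    n = sorted_activity.shape[0]
    starts = np.arange(0, n, group_size)
    sums = np.add.reduceat(sorted_activity, starts, axis=0)
    sizes = np.diff(np.append(starts, n))    # rows in each block
    return sums / sizes[:, None]


act = np.arange(360, dtype=float).reshape(120, 3)  # 120 "neurons", 3 time points
rows = superneurons(act, 50)
print(rows.shape)  # (3, 3)
```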

      Line 740: it would be good to mention what exactly the CCF density distribution quantifies.

      Authors’ Response: In each CCF area, a certain percentage of neurons belongs to each Rastermap group. The CCF density distribution is the set of these percentages, or densities, across all CCF areas in the dorsal or side mount preparation being imaged in a particular session. We have clarified this in the text.
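      The CCF density distribution defined above is a row-normalized cross-tabulation of CCF area against Rastermap group. A minimal sketch follows, assuming one CCF area label and one group label per neuron. The function name and example labels are hypothetical.

```python
import numpy as np


def ccf_density(area_ids: np.ndarray, group_ids: np.ndarray):
    """Percentage of neurons in each CCF area that belong to each Rastermap group.
    Returns (areas, groups, table), where table[a, g] is a percentage and each
    row sums to 100."""
    areas, a_idx = np.unique(area_ids, return_inverse=True)
    groups, g_idx = np.unique(group_ids, return_inverse=True)
    counts = np.zeros((len(areas), len(groups)))
    np.add.at(counts, (a_idx, g_idx), 1)     # count neurons per (area, group)
    return areas, groups, 100.0 * counts / counts.sum(axis=1, keepdims=True)


areas, groups, table = ccf_density(
    np.array(["VISp", "VISp", "VISp", "VISp", "MOs", "MOs"]),
    np.array([1, 1, 2, 3, 2, 2]),
)
```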

      Line 745: what does 'within each CCF' mean? Does this refer to different areas?

      Authors’ Response: The corrected version of this sentence now reads: “Next, we compared, across all CCF areas, the proportion of neurons within each CCF area that exhibited large positive correlations with walking speed and whisker motion energy.”
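      The comparison described in that sentence reduces to a per-neuron Pearson correlation with a behavioral trace, thresholded and then averaged within each CCF area. A minimal sketch follows. The cutoff of 0.3 is a placeholder rather than the value used in the manuscript, and the function name is hypothetical.

```python
import numpy as np


def frac_positively_correlated(activity: np.ndarray, behavior: np.ndarray,
                               area_ids: np.ndarray, threshold: float = 0.3) -> dict:
    """Per CCF area, the fraction of neurons whose Pearson correlation with a
    behavioral trace (e.g. walking speed) exceeds `threshold` (placeholder cutoff)."""
    a = activity - activity.mean(axis=1, keepdims=True)
    b = behavior - behavior.mean()
    r = (a @ b) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b) + 1e-12)
    return {str(area): float(np.mean(r[area_ids == area] > threshold))
            for area in np.unique(area_ids)}


speed = np.array([0, 1, 2, 3, 4], dtype=float)
act = np.array([[0, 1, 2, 3, 4],      # r = +1
                [4, 3, 2, 1, 0],      # r = -1
                [0, 2, 4, 6, 8],      # r = +1
                [1, 0, 1, 0, 1]],     # r = 0
               dtype=float)
res = frac_positively_correlated(act, speed, np.array(["VISp", "VISp", "VISp", "MOs"]))
```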

      How were different Rastermap groups identified? Were they selected by hand?

      Authors’ Response: Yes, in Figs. 4, 5, and 6, we selected the identified Rastermap groups “by hand”, based on qualitative similarity of their activity patterns. At the time, there was no available algorithmic or principled means by which to split the Rastermap sort. The current, newer version of Rastermap (Stringer et al, 2023) seems to allow for algorithmic discretization of embedding groups (we have not tested this yet), but it was not available at the time that we performed these preliminary analyses.

      In terms of “correctness” of such discretization or group identification, we intend to address this issue in a more principled manner in upcoming publications. For the purposes of this first paper, we decided that manual identification of groups was sufficient to display the capabilities and outcomes of our methods.

      We clarify this point briefly at several locations in the revised manuscript, throughout the latter part of the Results section.

      Reviewer #3 (Recommendations For The Authors):

      In "supplementary figures, protocols, methods, and materials", Figure S1 g is mislabeled as Figure f.

      Authors’ Response: Yes, on the figure itself this panel was mislabeled as “f” in the original reviewed preprint. We have changed this to read “g”.

      In S1 g, the success rate of the surgical procedure seems quite low. Less than 50% of the mice could be imaged under two-photon. Can the authors elaborate on the criteria and difficulties related to their preparations?

      Authors’ Response: We will elaborate on the difficulties that sometimes hinder success in our preparations in the revised manuscript.

      The success rate indicated up to the point of “Spontaneous 2-P imaging (window)” reads 13/20, which is 65%, not 50%. The drop to 9/20 by the time one reaches the left edge of “Behavioral Training” indicates that some mice do not master the task.

      Protocol I contains details of the different ways in which mice either die or become unsuitable or “unsuccessful” at each step. These surgeries are rather challenging - they require proper instruction and experience. With the current protocol, our survival rate for the window surgery alone is as high as 75-100%. Some mice can be lost at headpost implantation, in particular if they are low weight or if too much muscle is removed over the auditory areas. Finally, some mice survive windowing but the imageable area of the window might be too small to perform the desired experiment.

      We have added a paragraph detailing this issue in the main text at ~lines 287-320, pg 9.

      In both Suppl_Movie_S1_dorsal_mount and Suppl_Movie_S1_side_mount (Movie S1), the behaviour video quality seems unoptimized, which will affect the precision of DeepLabCut. There are evidently multiple instances of mislabeled key points in the videos (switched paws, large jumps of key points, etc.).

      Many tracked points are in areas of the image that are over-exposed.

      Despite using a high-speed camera, motion blur is obvious.

      Paws are occluded by other paws or move out of frame.

      As DeepLabCut accuracy is key to the higher-level motifs generated by B-SOiD, can the authors provide an example of tracking with exclusion or smoothing of mislabeled points (possibly using the median filtering provided by DeepLabCut)? This may help readers address such errors.

      Authors’ Response: We agree that we would want to rerun and carefully curate the outputs of DeepLabCut before making any strong claims about behavioral identification. As the aim of this paper was to establish our methods, we did not feel that this degree of rigor was required at this point.

      It is inevitable that there will be some motion blur and small areas of over-exposure, respectively, when imaging whiskers, which can contain movement components up to ~150 Hz, and when imaging a large area of the mouse, which has planes facing various aspects. For example, perfect orthogonal illumination of both the center of the eye and the surface of the whisker pad on the snout would require two separate infrared light sources. In this case, use of a single LED results in overexposure of areas orthogonal to the direction of the light and underexposure of other aspects, while use of multiple LEDs would partially fix this problem, but still lead to variability in summated light intensity at different locations on the face. We have done our best to deal with these limitations.

      We now briefly point out these limitations in the methods text at ~lines 155-160, pg 5.

      In addition, we have provided additional raw and processed movies and data related to DeepLabCut and BSOiD behavioral analysis in our FigShare+ repository, which is located at:

      https://doi.org/10.25452/figshare.plus.c.7052513
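
      For readers facing the tracking errors raised above, here is a minimal sketch of likelihood-based exclusion followed by interpolation and a running median, written in plain NumPy. DeepLabCut reports a per-frame likelihood for each body part. The cutoff and window size below are illustrative assumptions. This is not DeepLabCut's own filtering routine.

```python
import numpy as np

def clean_keypoint(x, likelihood, p_cutoff=0.9, window=5):
    """Drop low-confidence samples, fill them by interpolation, then median-filter.

    x:          1-D coordinate trace (x or y) for one body part
    likelihood: per-frame confidence for that body part
    p_cutoff, window: illustrative values, not recommendations
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float).copy()
    good = np.asarray(likelihood) >= p_cutoff
    if not good.any():
        raise ValueError("no samples above the likelihood cutoff")
    idx = np.arange(len(x))
    # replace excluded frames by linear interpolation between trusted frames
    x[~good] = np.interp(idx[~good], idx[good], x[good])
    # running median; edge padding keeps the output the same length as the input
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

trace = [0, 1, 2, 50, 4, 5, 6]           # frame 3 is a mislabeled jump
conf = [1, 1, 1, 0.1, 1, 1, 1]           # ...with low likelihood
print(clean_keypoint(trace, conf, window=3))
```

      Interpolating before filtering keeps the median window from being dominated by excluded frames. A high-likelihood single-frame spike is still removed by the median step.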

      In lines 153-154, the authors mentioned that the DeepLabCut model was trained for 650k iterations. In our experience (100-400k), this seems excessive and may result in overfitting, yielding incorrect results on unseen data. Echoing point 4, can the authors show the accuracy of their DeepLabCut model (training set, validation set, errors, etc.)?

      Authors’ Response: Our behavioral analysis is preliminary and is included here as an example of our methods, and not to make claims about any specific result. Therefore we believe that the level of detail that you request in our DeepLabCut analysis is beyond the scope of the current paper. However, we would like to point out that we performed many iterations of DeepLabCut runs, across many mice in both preparations, before converging on these preliminary results. We believe that these results are stable and robust.

      We believe that 650k iterations is within the reasonable range suggested by DLC, and that 1 million iterations is given as a reasonable upper bound. This seems to be supported by the literature; for example, see Willmore et al, 2022 (“Behavioral and dopaminergic signatures of resilience”, Nature, 611, 124-132). There, in a paper focused squarely on behavioral analysis, DLC training was run with 1.3 million iterations with default parameters.

      We now note, on ~lines 153-154, pg 5, that we used 650K iterations, substantially fewer than the default of 1.03 million, to avoid overfitting.

      In lines 140-141, the authors mentioned the use of slicing to downsample their data. Have any precautions, such as a low pass filter, been taken to avoid aliasing?

      Authors’ Response: Most of the 2-photon data we present was acquired at ~3 Hz and upsampled to 10 Hz. Most of the behavioral data was downsampled from 5000 Hz to 10 Hz by slicing, as stated. We did not apply any low-pass filter to the behavioral data before sampling. The behavioral variables have heterogeneous real sampling/measurement rates - for example, pupil diameter and whisker motion energy are sampled at 30 Hz, and walk speed is sampled at 100 Hz. In addition, the 2-photon acquisition rate varied across sessions.

      These facts made principled, standardized low-pass filtering difficult to implement. We chose instead to use a common resampling rate of 10 Hz in an unbiased manner. This downsampled 10 Hz rate is also used by B-SOiD to find transitions between behavioral motifs (Hsu and Yttri, 2021).

      We do not think that aliasing is a major factor because the real rate of change of our Ca2+ indicator fluorescence and behavioral variables was, with the possible exception of whisker motion energy, likely at or below 10 Hz.

      We now include a brief statement to this effect in the methods text at ~lines 142-146, pg. 4.
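
      To make the aliasing concern concrete, the sketch below contrasts plain slicing, as described in the response, with a boxcar block average applied before decimation, which is a simple anti-aliasing alternative. The 5000 Hz to 10 Hz factor comes from the text. The test signal with a 148 Hz component is a hypothetical stand-in for fast whisker motion.

```python
import numpy as np

def downsample_slice(x, factor):
    """Keep every `factor`-th sample (no anti-aliasing)."""
    return np.asarray(x)[::factor]

def downsample_block_mean(x, factor):
    """Average non-overlapping blocks of `factor` samples.

    A boxcar average is a crude low-pass filter applied before decimation;
    trailing samples that do not fill a whole block are dropped.
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // factor) * factor
    return x[:n].reshape(-1, factor).mean(axis=1)

fs_in, fs_out = 5000, 10
factor = fs_in // fs_out                 # 500
t = np.arange(2 * fs_in) / fs_in         # 2 s of samples at 5000 Hz
# slow 1 Hz component plus a fast 148 Hz component (hypothetical)
signal = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 148.0 * t)

sliced = downsample_slice(signal, factor)
averaged = downsample_block_mean(signal, factor)
```

      With slicing, the 148 Hz component folds down to a 2 Hz alias at essentially full amplitude. Block averaging suppresses it to roughly 1% of its original amplitude.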

      Lines 288-299: the authors have made considerable effort to compensate for the curvature of the brain, which is particularly important when imaging the whole dorsal cortex. Can the authors provide performance metrics and related details on how well the combination of online field curvature correction (ScanImage) and fast-z "sawtooth"/"step" scanning (Sofroniew, 2016) compensates for this curvature?

      Authors’ Response: We did not perform additional “ground-truth” experiments that would allow us to make definitive statements concerning field curvature, as was done in the initial eLife Thorlabs mesoscope paper (Sofroniew et al, 2016).

      We estimate a depth offset of ~200 micrometers across 2.5 mm. For example, suppose the objective is orthogonal to our 10 mm radius bent window and centered at the apex of its convexity. A small ROI at the lateral edge of the side mount preparation would then need to be positioned ~200 micrometers below an equivalent ROI near the apex in order to image neurons in the same cortical layer/depth. It would be at close to the same depth as an ROI placed at or near the midline, at the medial edge of the window. We determined this by examining the geometry of our cranial windows. We also compared z-depth information from adjacent sessions in the same mouse: the first session used a large FOV, and the second used multiple small FOVs optimized to sample from the same cortical layers across areas.
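
      The geometry behind this estimate can be checked with the sagitta of a circular arc, s = R − √(R² − x²). This is the depth drop at lateral distance x from the apex of an arc of radius R. The sketch assumes the cortical surface follows the 10 mm window radius exactly and that the objective is aligned with the apex; real preparations only approximate both.

```python
import math

def sagitta_um(radius_mm, lateral_mm):
    """Depth offset (micrometers) of a circular arc at a lateral distance from its apex.

    Idealized: the surface is assumed to follow an arc of the given radius,
    and objective tilt is ignored.
    """
    if abs(lateral_mm) > radius_mm:
        raise ValueError("lateral distance exceeds radius")
    return 1000.0 * (radius_mm - math.sqrt(radius_mm**2 - lateral_mm**2))

for x in (1.0, 2.0, 2.5):
    print(f"x = {x} mm -> {sagitta_um(10.0, x):.0f} um")
```

      For R = 10 mm this gives about 202 µm at x = 2 mm and about 318 µm at x = 2.5 mm. The idealized offset is therefore sensitive to where the apex lies relative to the ROI.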

      We have included this brief explanation in the main text at ~lines 300-311, pg 9.

      In lines 513-515, the authors mentioned that the vasculature pattern can change over the course of the experiment, which then requires re-performing the realignment procedure. How stable is the vasculature pattern? Would laser speckle contrast imaging yield more reliable results?

      Authors’ Response: In general, the changes in vasculature we observed were minimal, but involved the following: i) sometimes a vessel was displaced or moved during the window surgery, ii) sometimes a vessel, in particular the sagittal sinus, enlarged or increased its apparent diameter over time if it was not properly pressured by the cranial window, and iii) sometimes an area experiencing window pressure that was too low could, over time, show outgrowth of fine vascular endings. The most common of these was (i), and (iii) was perhaps the least common. In general, the vasculature was quite stable.

      We have added this brief discussion of potential vasculature changes after cranial window surgery to the main text at ~lines 286-293, pg 9.

      We already mentioned, in the main text of the original eLife reviewed preprint, that we re-imaged the multimodal map (MMM) every 30-60 days or whenever changes in vasculature were observed, in order to maintain a high accuracy of CCF alignment over time. See ~lines 507-511, pg 16.

      We are not very familiar with laser speckle contrast, and it seems like a technique that could conceivably improve the fine-grained accuracy of our MMM-CCF alignment in some instances. We will try this in the future, but for now it seems like our alignments are largely constrained by several large blood vessels present in any given FOV, and so it is unclear how we would incorporate such fine-grained modifications without applying local non-rigid manipulations of our images.

      In lines 588-598, the authors mentioned that the occasional use of online fast-z corrections yielded no difference. However, it seems that the combination of the online fast-z correction yielded "cleaner" raster maps (Figure S3)?

      Authors’ Response: The Rastermaps in Fig S3a and b are qualitatively similar. We do not believe that any systematic difference exists between their clustering or alignments, and we did not observe any such differences in other sessions that either used or didn’t use online fast-z motion correction.

      We now provide raw data and analysis files corresponding to the sessions shown in Fig S3 (and other data-containing figures) on FigShare+ at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      Ideally, the datasets contained in the paper should be available on an open repository for others to examine. I could not find a clear statement about data availability. Please include a linked repo or state why this is not possible.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here:

      Vickers, Evan; A. McCormick, David (2024). Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice. Figshare+. Collection:

      https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields; however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting the frictional forces, muscle forces, frictional torques, and muscle torques into an equation obtained by differentiating twice with respect to time the boundary condition that two neighboring rods always meet at one point. This determines how the various forces and torques are transmitted along the worm: specifically, how the frictional and muscle forces and torques acting on each rod are distributed over the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on the constraint that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. Finally, ElegansBot's rods receive an anisotropic Stokes frictional force from the ground surface, whereas the Majmudar et al. model computed the frictional force from the Navier-Stokes equations for an incompressible fluid, taking the fluid velocity at each bead's location to be the bead's velocity.
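
      The twice-differentiated joint condition described above can be written out as follows. This is a generic Newton-Euler sketch for a 2-D chain of identical rigid rods (length ℓ, mass m, moment of inertia I). The notation is ours and may not match ElegansBot's. J_i denotes the force that rod i+1 exerts on rod i, and the worm's ends are free.

```latex
% Rod i: centre r_i, angle theta_i, unit tangent u_i, unit normal n_i
\mathbf{u}_i = (\cos\theta_i,\ \sin\theta_i), \qquad \mathbf{n}_i = (-\sin\theta_i,\ \cos\theta_i)

% Joint condition: the front end of rod i meets the back end of rod i+1
\mathbf{r}_i + \tfrac{\ell}{2}\mathbf{u}_i = \mathbf{r}_{i+1} - \tfrac{\ell}{2}\mathbf{u}_{i+1}

% Using \dot{\mathbf{u}} = \dot\theta\,\mathbf{n} and \dot{\mathbf{n}} = -\dot\theta\,\mathbf{u}, differentiate twice in time:
\ddot{\mathbf{r}}_i + \tfrac{\ell}{2}\left(\ddot\theta_i\,\mathbf{n}_i - \dot\theta_i^{2}\,\mathbf{u}_i\right)
  = \ddot{\mathbf{r}}_{i+1} - \tfrac{\ell}{2}\left(\ddot\theta_{i+1}\,\mathbf{n}_{i+1} - \dot\theta_{i+1}^{2}\,\mathbf{u}_{i+1}\right)

% Newton-Euler equations per rod, with \mathbf{J}_0 = \mathbf{J}_n = \mathbf{0} (free ends)
m\,\ddot{\mathbf{r}}_i = \mathbf{F}^{\mathrm{fric}}_i + \mathbf{F}^{\mathrm{musc}}_i + \mathbf{J}_i - \mathbf{J}_{i-1}
I\,\ddot\theta_i = \tau^{\mathrm{fric}}_i + \tau^{\mathrm{musc}}_i + \tfrac{\ell}{2}\,\mathbf{u}_i \times \left(\mathbf{J}_i + \mathbf{J}_{i-1}\right)
```

      Substituting the Newton-Euler equations into the differentiated joint condition for i = 1, ..., n-1 gives a linear system. Its unknowns are the joint forces J_1, ..., J_{n-1}, and its known terms are the frictional and muscle forces and torques together with the current angular velocities. Solving it is what distributes each rod's forces along the whole body.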

      Thirdly, unlike the Majmudar et al. model, ElegansBot includes the inertia of the worm's components. ElegansBot can therefore simulate locomotion however low or high the friction coefficient of the ground surface is, which the Majmudar et al. model cannot.
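
      A minimal sketch of the two ingredients mentioned here: an anisotropic Stokes (resistive-force) drag on a rod, and an update that keeps the rod's mass. The coefficients, the explicit Euler integrator, and the single-rod setting are illustrative assumptions; ElegansBot's actual friction law and integration scheme may differ. With mass retained, m dv/dt = F + F_drag stays well defined as drag approaches zero. An overdamped (massless) formulation, v = F/c, diverges in that limit.

```python
import numpy as np

def anisotropic_drag(v, theta, c_par, c_perp):
    """Resistive force on a rod with velocity v (2-vector) and orientation theta.

    F = -c_par (v.u) u - c_perp (v - (v.u) u), where u is the rod's unit tangent.
    """
    u = np.array([np.cos(theta), np.sin(theta)])
    v = np.asarray(v, dtype=float)
    v_par = np.dot(v, u) * u          # component along the rod
    v_perp = v - v_par                # component across the rod
    return -c_par * v_par - c_perp * v_perp

def terminal_velocity(force, theta, mass, c_par, c_perp, dt=1e-3, steps=20000):
    """Explicit Euler integration of m dv/dt = F + F_drag for one rod (illustrative).

    Stable while dt < 2 * mass / max(c_par, c_perp).
    """
    v = np.zeros(2)
    f = np.asarray(force, dtype=float)
    for _ in range(steps):
        a = (f + anisotropic_drag(v, theta, c_par, c_perp)) / mass
        v = v + dt * a
    return v

# Pushing along the rod is easier than pushing across it when c_perp > c_par
print(terminal_velocity([2.0, 0.0], 0.0, mass=1.0, c_par=1.0, c_perp=10.0))  # ~[2, 0]
print(terminal_velocity([0.0, 2.0], 0.0, mass=1.0, c_par=1.0, c_perp=10.0))  # ~[0, 0.2]
```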

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one, however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but don't do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers in generating neuromechanical control hypotheses. The term $\theta_{\mathrm{ctrl},i}^{(t)}$ used in our model is designed to be replaceable with neuromechanical control in the future.

      (13) The model may provide insights into the biomechanics of such behaviors, however, the results described are very minimal and are purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text describing Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, which are numerical indicators of the worm's behavior, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega-turn was qualitative. We have therefore conducted additional quantitative analysis of the omega-turn and inserted the results into the new Fig. 4. We have interpreted the term 'force field' as the force vector received by each rod. We have created numerical indicators representing various behaviors of the worm and included them in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. Comparing behaviors first requires a clear criterion for distinguishing them, so we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in the Method section). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and omega-turns (Fig. 4). In the revised manuscript, we also analyzed how turning behavior changes with variations in the friction coefficients (Figs. S4-S7).
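
      For the power comparison, here is a minimal sketch of the standard mechanical definition, P = Σᵢ (Fᵢ · vᵢ + τᵢ ωᵢ), summed over the rods of a 2-D model. Which forces enter the sum is an assumption left to the caller: muscle forces only, or frictional forces with a sign flip for dissipated power. The response does not give its exact formula.

```python
import numpy as np

def mechanical_power(forces, velocities, torques, angular_velocities):
    """Total rate of work P = sum_i (F_i . v_i + tau_i * omega_i) over rods.

    forces, velocities:           arrays of shape (n_rods, 2)
    torques, angular_velocities:  arrays of shape (n_rods,)
    """
    forces = np.asarray(forces, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    linear = np.einsum("ij,ij->i", forces, velocities)  # per-rod F . v
    rotational = np.asarray(torques, dtype=float) * np.asarray(angular_velocities, dtype=float)
    return float(np.sum(linear + rotational))

# two rods: 3 + 2 from translation, 1 from rotation
print(mechanical_power([[1, 0], [0, 2]], [[3, 0], [0, 1]], [0.5, 0.0], [2.0, 7.0]))  # 6.0
```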

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate, and the rheology of plates likely depends on a number of factors. But is, for example, a change in hydration level likely to produce a 100-fold change in drag? Or is something more interesting or subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know if the friction coefficients in the study of Boyle et al. (2012) and the friction coefficients in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored the effects of the friction coefficient's scale factor in more detail and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient. We chose the scale factor 1/100 because it allowed ElegansBot to accurately reproduce the worm's trajectory while remaining close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model", and in many situations it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain these assumptions in the text.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .
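
      A worked identity shows why the two terms can be merged. It assumes both the active and passive elastic terms are linear torsional springs acting on the joint angle θ: an active spring κ_a pulls toward θ_ctrl, and a passive spring κ_p pulls toward a straight rest angle of 0. This notation is ours, chosen for illustration.

```latex
\kappa_a(\theta_{\mathrm{ctrl}} - \theta) + \kappa_p(0 - \theta)
  = (\kappa_a + \kappa_p)\left(\theta'_{\mathrm{ctrl}} - \theta\right),
\qquad
\theta'_{\mathrm{ctrl}} = \frac{\kappa_a}{\kappa_a + \kappa_p}\,\theta_{\mathrm{ctrl}}

% With \kappa_p \le \kappa_a / 20 (passive at most 1/20 of active):
\kappa_a \le \kappa_a + \kappa_p \le 1.05\,\kappa_a,
\qquad
\tfrac{20}{21}\,\lvert\theta_{\mathrm{ctrl}}\rvert \le \lvert\theta'_{\mathrm{ctrl}}\rvert \le \lvert\theta_{\mathrm{ctrl}}\rvert
```

      Under these assumptions, a single effective spring absorbs the passive term. Its constant lies within 5% of κ_a, and its control angle is scaled by at least 20/21 ≈ 0.95.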

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect in the study of C. elegans' locomotion. In our paper, its importance is briefly introduced with a sentence each in the introduction and discussion. However, our research is a model about the process of the creation of body motion originated from muscle forces, and it does not model the sensory system that senses body posture. Therefore, there is no mention of using proprioception in our paper's results section. What is mentioned in the discussion is that ElegansBot can be applied as the kinetic body model part in a combination model of a kinetic body model and a neuronal circuit model that receives proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second paper (Park et al., 2007) and third paper (Backholm et al., 2013) measured the elasticity and Young's modulus of C. elegans without separating the cuticle. From the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient. For this we assumed a worm radius of 25 μm, a cuticle thickness of 0.5 μm, and a segment length (1/25 of the cuticle's longitudinal length) of 40 μm. The range was quite broad, from 9.82 × 10¹¹ μg/s² (from the first paper) to 2.16 × 10⁸ μg/s² (from the third paper). The elastic coefficient in our paper falls within this range, but the range is wide. We can therefore modify the elastic coefficient in our model and reapply it if more accurate values become known in the future.
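
      The conversion from Young's modulus to an axial spring constant can be sketched as k = E·A/L. The response does not state which cross-section was used. This sketch assumes a thin-walled annulus, A ≈ 2πrt, and uses the radius, thickness, and segment length given above. The moduli reported in the cited papers are not reproduced here.

```python
import math

def axial_spring_constant_ug_per_s2(E_pa, radius_um, thickness_um, length_um):
    """k = E * A / L for a thin cylindrical shell, returned in ug/s^2.

    A = 2*pi*r*t is the thin-wall approximation of the cuticle annulus (assumption).
    Unit bookkeeping: 1 Pa = 1 kg m^-1 s^-2 = 1e9 ug / (1e6 um * s^2) = 1e3 ug um^-1 s^-2.
    """
    area_um2 = 2 * math.pi * radius_um * thickness_um
    E_ug_per_um_s2 = E_pa * 1e3
    return E_ug_per_um_s2 * area_um2 / length_um

# radius 25 um, cuticle 0.5 um, segment 40 um; modulus of 1 MPa chosen only as an example
print(f"{axial_spring_constant_ug_per_s2(1e6, 25, 0.5, 40):.3e} ug/s^2")
```

      Because k is linear in E for fixed geometry, the spread in reported moduli translates directly into the spread of spring constants quoted above.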

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      ElegansBot provides equations of motion that describe C. elegans locomotion, including turning behaviors, which have been relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and the bending angle not as a matter of simplification, but as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior under various inputs and frictional environments, which demonstrates the strong predictive power of ElegansBot (Figs. 2, 3, and 5; Figs. 2, 3, and 4 in the old version). Moreover, if the midline of C. elegans is incompressible, then modeling the dorsal and ventral sides separately, as opposed to modeling solely with the bending angle, does not increase the number of degrees of freedom of the worm model and therefore does not increase its predictive power.
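
      A small kinematic sketch of this degrees-of-freedom argument follows. It assumes a thin body of radius r whose dorsal and ventral edges lie at distance r on either side of the midline. The notation is ours, and positive curvature κ bends toward the dorsal side.

```latex
% Edge lengths of a midline element ds with curvature \kappa
ds_{\mathrm{d}} = (1 - r\kappa)\,ds, \qquad ds_{\mathrm{v}} = (1 + r\kappa)\,ds

% Edge strains
\varepsilon_{\mathrm{d}} = -r\kappa, \qquad \varepsilon_{\mathrm{v}} = r\kappa

% Incompressible midline: one constraint links the two strains
\varepsilon_{\mathrm{d}} + \varepsilon_{\mathrm{v}} = 0,
\qquad
\kappa = \frac{\varepsilon_{\mathrm{v}} - \varepsilon_{\mathrm{d}}}{2r},
\qquad
\Delta\theta_i \approx \kappa_i\,\Delta s
```

      Two edge strains subject to one constraint leave a single free quantity per segment. That quantity is fixed by the bending angle Δθ_i.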

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in an interesting form of long-term social memory retrieval. The behavior experiments and strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      We thank the reviewer for the positive evaluation.

      I have a few concerns, however, with the strength of the evidence presented for some of the experiments. The data presented and the method described are incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      In response to this concern, we have removed the data concerning abGC projections to PCP4+ and PV-GFP+ cell bodies from Figure 1 and have focused this analysis on dendrites. We now provide high magnification images of dendrites and expand on the methodology, results, and interpretations in the manuscript. We also broaden the interpretation throughout the manuscript to address the reviewer’s concern.

      Strengths:

      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      We appreciate these positive comments.

      Weaknesses:

      The interpretation of the results may not be justified given the methods and details provided.

      We have addressed this concern by providing more methodological details and broadening our interpretation of the results.

      Reviewer #2:

      Summary:

      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting; the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have added some points below that I think could help clarify the conclusions:

      We appreciate the positive assessment and have addressed the more specific points below.

      My major concern with the manuscript was that the transitions between the different experiments in each results section are not very smooth. Maybe the authors could add a summary conclusion sentence at the end of each results section explaining why the next set of experiments is the most logical step.

      In response, we have added summary conclusion sentences at the end of each result section.

      In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study, Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.

      We cite this paper in the revised manuscript.

      Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps having a discussion on this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGC), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGC that the authors see, also target a great deal of PV+ interneurons, which have been shown to pace the SWRs frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWRs modulation.

      We discuss these possibilities and cite Gan et al 2017, Schlingloff et al., 2014, and Stark et al., 2014 in the revised manuscript.

      The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).

      We mention Chen et al (A hypothalamic novelty signal modulates hippocampal memory.) in the revised manuscript. “Shuo” is the first name of the first author on this paper, so we believe that this is the same paper to which the reviewer refers.

      I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.

      We thank the reviewer for pointing out this error. In the revised manuscript, we refer to all figure panels. Since Fig 3 is now broken into two figures (Fig 3 and 4), the panel lettering has changed in the revised manuscript.

      Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?

      The SWRs are counted throughout the whole behavior session for each condition. This is now stated in the revised manuscript.

      Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of abGC projections to CA2 ablation with CNO, especially during Mother CNO condition. I think this result is worth mentioning in the text.

      We now mention this finding in the revised manuscript.

      The legend of Figure 3u mentions "scale bars = 200 µm"; what does this refer to?

      The scale bar refers to that shown in Figure 3b, which is now indicated in the legend.

      What exactly is calculated as SWR average integral? Is it a cumulative rate? Please clarify.

      The integral measure provides information regarding the average total power of SWR events. It sums z-scored amplitude values from the beginning to the end of each SWR envelope, and then takes the average across all summed envelopes. SWR integral has been shown to influence SWR propagation (De Filippo and Schmitz, 2023). This is now described in the text.
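      As a minimal sketch of the integral measure described above, the Python function below z-scores an SWR-band amplitude envelope, sums the z-scored values within each detected event, and averages those sums across events. The function name `swr_average_integral`, the choice to z-score against the whole recording, and the representation of events as (start, stop) sample indices are illustrative assumptions; band-pass filtering, envelope extraction, and SWR detection are assumed to have been done beforehand.

```python
import numpy as np


def swr_average_integral(envelope, events):
    """Average, across SWR events, of the summed z-scored envelope within each event.

    envelope: 1-D array holding the ripple-band amplitude envelope.
    events: iterable of (start, stop) sample indices; stop is exclusive.
    Returns NaN when there are no events.
    """
    envelope = np.asarray(envelope, dtype=float)
    # Assumption: z-score against the mean and SD of the whole recording.
    z = (envelope - envelope.mean()) / envelope.std()
    # Sum the z-scored amplitude from the beginning to the end of each event.
    integrals = [z[start:stop].sum() for start, stop in events]
    # Average the per-event sums.
    return float(np.mean(integrals)) if integrals else float("nan")


# Hypothetical toy envelope (mean 1, SD 1) with two one-sample events.
env = np.array([0.0, 2.0, 0.0, 2.0])
print(swr_average_integral(env, [(1, 2), (3, 4)]))  # 1.0
```

      Because the values are z-scored before summing, longer or larger-amplitude events contribute larger integrals, which is why the measure reflects total event power rather than event count.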

      Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Alexander et al., 2018, which we believe is the relevant paper, is now cited in the revised manuscript.

      Strengths:

      Behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions etc.)

      We have addressed this concern by expanding the interpretation of our results.

      Reviewer #3:

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      In response to these comments, we provide raw interaction times in a new Figure (Fig. S1). We also provide more information about the experiments and figures in the revision. We explain the rationale for our behavioral interpretations and discuss proposed mechanisms for how abGCs regulate SWR and PAC.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      We selected the direct social interaction test because it allows for more naturalistic social behaviors than measuring investigation times toward social stimuli located inside wire mesh containers. We also decided to focus our studies on the retrieval of mother memories because these are likely the first social memories to be formed. We emphasize that our results cannot be generalized to memories of other social stimuli but given studies on recent social memory formation and retrieval in adults that manipulate abGCs and CA2 separately, we feel that it is likely that this circuit is involved in these functions as well. However, we specify throughout the manuscript that our experiments can only tell us about mother memories. We have also changed the title to reflect this.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how abGCs projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      We have divided Figure 3 into two figures (Figures 3 and 4) and revised the electrophysiology section of the results section. In the revised paper, we now discuss how abGC projections to PV+ interneurons may facilitate SWR and PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      In response to these comments, we discuss possible answers to these interesting questions.

      Recommendations for the authors:

      Reviewer #1:

      Specifically, in Figure 1, for the analysis of the synapses formed between abGCs and CA2 PNs (as identified by PCP4 expression) and CA2 PV+ cells (as identified by cre-dependent AAV-mCherry expression) in the PV-cre line. In panels c and d, the soma of a CA2 PN and the soma of a PV cell are shown. Why was the soma analyzed? What relevance is there for this? It is my understanding that synapses form on dendrites; this would be much more relevant to show, in my opinion. Also, the methods for panels e and f state that the 3R-Tau+ intensity was analyzed only in stratum lucidum. (There was a normalization for the overall 3R-Tau intensity in SL of CA2 that was obtained by dividing by the 3R-Tau intensity of the corpus callosum.) I don't understand, then, how a comparison of 3R-Tau intensity could have been done for CA2 PN soma. There are no CA2 PN soma in stratum lucidum. (This is fairly clearly shown in Figure 1aiii, with the PCP4 staining showing the soma in the somatic layer, not in stratum lucidum.) What is being analyzed here?

      If the 3R-Tau intensity is higher for PV cell dendrites, an example image of dendrites would be very helpful. How was the CA2 PV cell dendrite delimited from the CA2 PN dendrites at 40x magnification for the 3R-Tau intensity? Why were pre-synaptic puncta not examined? Is it possible to determine the post-synaptic target with these methods? This result could be particularly interesting, but I find it very difficult to understand the quantification or the justification behind it. To truly know if a cell is receiving a connection, the best method would be to perform whole-cell patch clamp recordings of the post-synaptic target cells and use optogenetics of the abGCs. I understand that this may be beyond the scope of the paper, but it is a severe limitation for these results.

      We have eliminated the cell body measures from Figure 1 and focus instead on the dendrite measures, which we agree are more relevant. We now provide high magnification example images of pyramidal cell (PCP4+) and PV+ interneuron (GFP+) dendrites in Figure 1. We thank the reviewer for pointing out the error about the stratum lucidum as some of the dendrites analyzed are located in the pyramidal cell layer. In addition, neither PCP4 nor GFP label the full extent of dendrites emanating from CA2 pyramidal cells or PV+ interneurons respectively. We mention this in the revised manuscript because abGC projections to more distal dendrites might show a different pattern than that which was observed for proximal dendrites. We also provide more details about how the dendrites were delimited for the analysis, and mention that these results cannot definitively inform us about whether functional synaptic connections have been formed.

      Cannulation over CA2 is potentially not specific to CA2 terminals. It would be optimal if the authors had some histology demonstrating specific cannula placement, as these surgeries are really tough to get perfectly centered over CA2. Even if it is perfectly centered, how much would the CNO diffuse into CA3? I think that given the methodology, the authors really need to consider that the behavioral results are not only a result of blocking abGC terminals in CA2 alone. Would it really change much if the abGC terminals are also silenced in CA3a/b as well? The McHugh lab has shown that area CA3 also plays a role in social memory (Chiang, M.-C., Huang, A. J. Y., Wintzer, M. E., Ohshima, T. & McHugh, T. J. A role for CA3 in social recognition memory. Behav Brain Res 354, 2018). It may be that both areas CA2 and CA3 are important for the phenomenon being demonstrated in Figure 2. I think the impact of the study is just as interesting, as this examination of early social memories is very interesting and nicely done. In fact, areas CA2 and CA3 may be acting together (please see Stöber, T. M., Lehr, A. B., Hafting, T., Kumar, A. & Fyhn, M. Selective neuromodulation and mutual inhibition within the CA3-CA2 system can prioritize sequences for replay. Hippocampus 30, 1228-1238, 2020).

      We agree that it is possible that CNO infusions targeted at the CA2 would also influence CA3a/b and have revised the paper to include this possible interpretation. We also cite the suggested paper on CA3 involvement in social memory (Chiang et al., 2018) and the paper on CA2-CA3 interactions (Stöber et al, 2020).

      Figure 3 is packed with information, but it is not communicated in a reasonable way. Much more information and a description of the experimental protocol need to be presented. Furthermore, why are there no example traces for the SWRs recorded? There should be more analysis than just a difference score and frequency. How are panels j, k, and l analyzed and interpreted? Why are there no example traces there? Also, the n's seem way too small for Figure 3m-r. Are there only 32 or three animals used for some of these conditions? This is insufficient in my opinion to conclude much for a 5-minute interaction.

      In response to this concern, we have divided Figure 3 into 2 figures – Figure 3 and Figure 4. In Figure 3, we provide example traces for SWRs, with additional SWR data presented in Figures S3 and S4, including data to complement the difference score data in Figure 3. In Figure 4, we include traces of phase amplitude coupling. We also provide more information in the methods about how the phase amplitude coupling data were analyzed. For Figure 4, we used methods described by Tort et al., 2010 to produce a modulation index, which is a measure of the intensity of coupling between theta phase and gamma amplitude. This method additionally allows for visualization of how gamma amplitude is modified across individual theta phase cycles. Regarding the question about n sizes in the 10-12 week abGC group (Fig. 3), the numbers are lower than in the 4-6 week abGC group because by 6 weeks after the first set of recordings, the electrodes in some of the mice were no longer usable. The n sizes for this specific study are 4-5 per group for Nestin-cre mice; 7-8 for Nestin-cre:Gi. This is now clarified in the figure legend.
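      For readers less familiar with the Tort et al. (2010) approach, a minimal sketch of its modulation index follows: theta phase is divided into bins, the mean gamma amplitude in each bin is normalized into a distribution P, and the index is the Kullback-Leibler divergence of P from a uniform distribution, divided by log N. The function name, the 18-bin default, and the assumption that instantaneous theta phase and gamma amplitude have already been extracted (for example, by band-pass filtering and a Hilbert transform) are illustrative choices, not the parameters used in the manuscript.

```python
import numpy as np


def modulation_index(theta_phase, gamma_amp, n_bins=18):
    """Phase-amplitude modulation index in the style of Tort et al. (2010).

    theta_phase: instantaneous theta phase in radians, within [-pi, pi].
    gamma_amp: instantaneous gamma amplitude envelope (same length, non-negative).
    Assumes every phase bin contains samples and that mean amplitudes are positive.
    """
    theta_phase = np.asarray(theta_phase, dtype=float)
    gamma_amp = np.asarray(gamma_amp, dtype=float)
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    # Assign each sample to a phase bin; a phase of exactly +pi goes to the last bin.
    bins = np.clip(np.digitize(theta_phase, edges) - 1, 0, n_bins - 1)
    # Mean gamma amplitude in each theta-phase bin.
    mean_amp = np.array([gamma_amp[bins == k].mean() for k in range(n_bins)])
    # Normalize the binned amplitudes to a probability distribution P.
    p = mean_amp / mean_amp.sum()
    # KL divergence from the uniform distribution equals log(N) - H(P).
    entropy = -np.sum(p * np.log(p))
    return (np.log(n_bins) - entropy) / np.log(n_bins)


# Hypothetical synthetic phases, with and without amplitude coupling.
phase = np.linspace(-np.pi, np.pi, 3600, endpoint=False)
print(modulation_index(phase, np.ones_like(phase)))        # ~0: no coupling
print(modulation_index(phase, 1.0 + 0.8 * np.cos(phase)))  # > 0: coupling
```

      A value of 0 indicates that gamma amplitude does not vary with theta phase, and larger values indicate that gamma amplitude is concentrated at particular theta phases; the per-bin amplitudes (`mean_amp`) are what a phase-amplitude plot visualizes.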

      The discussion section of this paper does not put these results into a broader context within the field. There are other studies examining abGCs and their roles in novelty and memory formation (the work from Juan Song's lab, for example). These should be properly mentioned and discussed.

      In response, we have added discussion on the roles of abGCs in nonsocial novelty and memory formation and have cited papers from the Song lab.

      In the figure legend for Figure 2, there is no specific explanation for panel h. Perhaps the label is missing in the legend.

      We thank the reviewer for noting this error and now include a description in the revised manuscript.

      Reviewer #2:

      Adding more quantifications (single cells, isolating data during interactions versus noninteraction times) would help understand the results better. In the lack of this, adding a more clear rationale (even if only through the presentation of hypotheses) in between the transitions of the different results sections would make the study easier to read.

      In response to this comment, we have added transition sentences between results sections to clarify the rationale and make the manuscript easier to understand.

      Reviewer #3:

      Line 110: "Hippocampal phase-amplitude coupling (PAC) and generation of sharp wave-ripples (SWRs) have been linked to novel experience, memory consolidation, and retrieval (Colgin, 2015; Fernandez-Ruiz et al., 2019; Meier et al., 2020; Joo and Frank, 2018; Vivekananda et al., 2021). The DG is known to influence hippocampal theta-gamma coupling and SWRs (Bott et al., 2016; Meier et al., 2020), yet no studies have examined the influence of abGCs on these oscillatory patterns." This information comes too early in the results section and is somewhat confusing.

      In response to this comment, we have moved this information and provided a better description.

      Line 118: "we found that mice with normal levels of abGCs can discriminate between their own mother and a novel mother." Be more descriptive of the results (present the raw interaction times with the statistical test to compare them), this is the conclusion.

      In response to this comment, we provide the raw interaction times in a new Figure (Fig. S1) and describe the results in more detail.

      Line 121: "These effects were not due to changes in physical activity". Be more specific. Did you subject the mice to a specific test? If not, how did you calculate locomotion? The data presented in the supplementary figure 1a only states the % locomotion.

      Locomotion was manually scored whenever an animal moved in the testing apparatus. Speed was not recorded. Total locomotion was divided by trial duration to create a % locomotion measure. We have added these details to the methods.

      Line 124: "Coinciding with the recovery of adult neurogenesis, GFAP-TK animals regained the ability to discriminate between their mother and a novel mother". Explain how the difference in interaction time can be interpreted as the ability to discriminate. You could also compute the discrimination index used by several other laboratories (difference of interaction normalized by the total interaction time).

      In response to this comment, we describe how the difference in interaction time can be interpreted as the ability to discriminate between novel and familiar mice.
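      For reference, the discrimination index suggested by the reviewer can be written as follows, where $t_{\text{novel}}$ and $t_{\text{mother}}$ denote the times spent investigating the novel mother and the mother, respectively (symbols introduced here for illustration):

```latex
\mathrm{DI} = \frac{t_{\text{novel}} - t_{\text{mother}}}{t_{\text{novel}} + t_{\text{mother}}},
\qquad -1 \le \mathrm{DI} \le 1
```

      Positive values indicate more investigation of the novel mother, and values near 0 indicate no preference. Normalizing by the total interaction time makes the index comparable across animals that differ in overall sociability.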

      Line 133: "Targeted CNO infusion in Nestin-Cre:Gi mice enabled the inhibition of Gi-DREADD+ abGC axon terminals present in CA2." Provide data or references to support this claim. Injection of a dye of comparable size to CNO would help. Otherwise, mention that nearby CA3a could be affected as well.

      We cannot rule out that nearby CA3a was affected by our cannula infusions of CNO into CA2. Furthermore, since dyes likely diffuse at different rates than CNO, we believe that a dye injection would not eliminate this concern completely. Therefore, we have revised the paper to acknowledge the likelihood that the CNO infusion affected parts of CA3 in addition to CA2. We also changed the title to focus more on the CA2 electrophysiological recordings, which we know were obtained only from the CA2.

      Line 150: "When reintroduced to the now familiar adult mouse 6 hours later, after the effects of CNO had largely worn off". Provide data or references supporting this claim.

      In response, we cite articles showing that the behavioral effects of CNO-induced DREADD activation return to baseline 6 hrs later.

      Line 165: "We found that SWR production is increased during social interaction, with more SWRs produced during novel mouse investigation, presumably during encoding social memories, than during familiar mouse investigation, presumably during retrieval of developmental social memories". How does this compare to the results in Oliva et al, Nature 2021?

      The Oliva et al. 2021 paper recorded CA2 SWRs in the home cage and during periods of sleep after social stimulus exposure. The timing of those recordings does not coincide with that of our measurements, but we cite the paper.

      Line 168: "Inhibition of abGCs in the presence of a social stimulus". How does silencing abGC impact CA2 pyramidal neurons' firing rate?

      The direct answer to this question is unknown because we did not measure single units, but based on studies done in the CA3, it is likely that firing rate in CA2 would increase.

      Line 203: "abGCs possess a time-sensitive ability to support retrieval of developmental social memories." Can you speculate on the function of the cells later on?

      In the revised paper, we speculate about the function of abGCs after they mature and no longer support retrieval of developmental social memories.

      Line 229: "GFAP-TK mice were group housed by genotype". Why not house them with CD1 littermates?

      We housed these mice according to genotype to avoid having mice with different levels of abGCs (GFAP-TK + VGCV and CD1 + VGCV) living together in social groups. We did this to avoid potential differences that might emerge in social behavior.

      Line 237: "Adult TK, Nestin-cre, and Nestin-cre:Gi offspring underwent a social interaction test in which they directly interacted with the mother". Specify how long was the social interaction time.

      In the revised manuscript, we specify that mice interacted with each social stimulus for 5 minutes.

      Line 240: "After a 1-hour delay spent in the home cage". Were the mice single-housed or with their littermates during this delay?

      In the revised manuscript, we indicate that mice were put back into the home cage with their cagemates during the 1 hr delay period.

      Line 241: "The order of stimulus exposure was counterbalanced in all tests." Can you show some data to confirm that the order of presentation did not impair the interaction? Have you considered using your own version of the classical 3-chamber test in order to assess directly the preference for one or the other female mouse?

      Our data suggest that the order of testing is not responsible for the observed results. Across all experimental groups without an abGC manipulation (i.e., all direct social interaction assays excluding VGCV+ GFAP-TK trials and CNO+ Nestin-cre:Gi trials), ~84.4% of animals demonstrate a social preference for the novel mother over the mother (CD1 + GFAP-TK VGCV- cohort: 28/33; CD1 VGCV+ cohort: 17/17; CD1 and TK recovery cohort: 24/31; Nestin-cre and Nestin-cre:Gi 4-6-week-old abGC cohort: 77/95; 10-12-week-old abGC cohort: 49/55; total = 195/231 mice with an investigation preference for the novel mother). If stimulus presentation order were to bias social investigation preference toward the first stimulus presented, we would expect the percentage of animals demonstrating a social preference for each stimulus to be around 50%, as roughly half the animals were first exposed to the mother and the other half were first exposed to the novel mother. The social novelty preference percentage reported above is comparable to the percentages we observe in our lab's novel-versus-familiar social interaction experiments, in which all animals are first exposed to a novel conspecific. We have yet to conduct experiments testing adults using the modified 3-chamber assay described in Laham et al., 2021.
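      The pooled percentage can be checked directly from the cohort counts listed above. The short Python snippet below sums the counts and reproduces the 195/231 (~84.4%) figure; the cohort labels are abbreviated from the text.

```python
# Cohort counts copied from the response above:
# (mice preferring the novel mother, total mice in the cohort)
cohorts = {
    "CD1 + GFAP-TK VGCV-": (28, 33),
    "CD1 VGCV+": (17, 17),
    "CD1 and TK recovery": (24, 31),
    "Nestin-cre and Nestin-cre:Gi, 4-6-week-old abGCs": (77, 95),
    "10-12-week-old abGCs": (49, 55),
}

prefer = sum(n for n, _ in cohorts.values())
total = sum(t for _, t in cohorts.values())
fraction = prefer / total
print(f"{prefer}/{total} = {fraction:.1%}")  # 195/231 = 84.4%
```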

      Statistics: The statistical tests used throughout the paper are appropriate but their description is too cursory. Please provide F values and specify the name of the tests used in the figure legends before giving the exact p values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, mediated by the RNA binding protein IGF2BP2. While the study presents interesting and largely solid evidence, part of the work is incomplete, requiring additional controls to more robustly support the major claims. The work would also benefit from further discussion addressing the apparently contradictory effects of circHIPK3 and STAT3 depletion in cancer progression.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, by showing that it interacts with an RNA binding protein (IGF2BP2) and, by sequestering it, regulates the expression of hundreds of genes containing a sequence (11-mer motif) in their untranslated regions (3'-UTR). This sequence is also present in circHIPK3, precisely where IGF2BP2 binds. The study further focuses on one specific case, the STAT3 gene, whose mRNA product is downregulated upon circHIPK3 depletion apparently through sequestering IGF2BP2, which otherwise binds to and stabilizes STAT3 mRNA. The study presents mechanistic insight into the interactions, sequence motifs, and stoichiometries of the molecules involved in this new mode of regulation. Altogether, this new mechanism seems to underlie the effects of circHIPK3 in cancer progression.

      Strengths:

      The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather by a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) that is relevant for cancer progression.

      Weaknesses:

      One of the central conclusions of the manuscript, namely that circHIPK3 sequesters IGF2BP2 and thereby regulates target mRNAs, lacks more direct experimental evidence such as rescue experiments where both species are simultaneously knocked down. CircRNA overexpression lacks a demonstration of circularization efficiencies. There seem to be contradictory effects of circHIPK3 and STAT3 depletion in cancer progression, namely that while circHIPK3 is frequently downregulated in cancer, circHIPK3 downregulation in this study leads to downregulation of STAT3. This does not seem to fit the fact that STAT3 is normally activated in a wide diversity of cancers and is positively associated with cell proliferation. The result is neither consistent with the fact that circHIPK3 expression positively correlates with good clinical outcomes. Overall, the authors have achieved some of their aims but additional controls would be advisable to fully support their conclusions.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Rescue experiment:

      We have now performed the suggested rescue experiment, exploring the potential normalization of target expression upon double knockdown (both circHIPK3 and IGF2BP2). Expression of targets STAT3, NEU and TRAPPC9 were assessed, and all target mRNAs became normalized upon double knockdown, supporting our suggested IGF2BP2 sponging mechanism for circHIPK3. These results have been included in Supplementary Figure 5F.

      Circularization efficiency of ectopically expressed circRNAs:

      For efficient expression of circRNAs in human cells, we have used a state-of-the-art plasmid construct (Laccase2-circRNA; Kramer et al., 2015, Genes Dev. 2015 Oct 15;29(20):2168-82. doi: 10.1101/gad.270421.115), which has proved superior to many alternatives presented in the literature. To ensure proper circularization efficiency of circHIPK3, we have now subjected purified RNA from transfected HEK293 cells (and from HEK293 Flp-In T-Rex cells with stable integration of the cassette) to northern blotting (Supplementary Figure S5H). This demonstrates the production of a single RNase R-resistant band of the correct size for both circHIPK3 expression constructs. Due to a relatively weak signal-to-noise ratio (rRNA background), we are unable to calculate an accurate linear-to-circular ratio. Nevertheless, the results suggest efficient production of WT and mutant circHIPK3 using the Laccase2 vector system.

      circHIPK3 and STAT3 expression in cancer:

      It is correct that STAT3 expression is often positively correlated with disease progression in many patients suffering from different cancers, and that the observed expression pattern, with downregulation of circHIPK3 and STAT3 in BC cells, can be perceived as counterintuitive. We note that the STAT3 profile in our time-course knockdown experiments is somewhat dynamic. While downregulation of STAT3 is most pronounced after 24 hrs of circHIPK3 knockdown, the expression tends to be more normalized after 48 and 72 hrs, which could be due to compensatory mechanisms initiated by the cells. Indeed, comparing the long-term development of tumors in patients, with numerous primary and accumulating secondary effects, to transient (0-72 hrs) gene-expression analyses has limitations. In addition, although the oncogenic role of STAT3 has been widely demonstrated, evidence suggests that STAT3 functions are multifaceted and not always straightforward to classify. Recent evidence has shown that STAT3 can have opposite functions in cancer and act as both a potent tumor promoter and a tumor suppressor (reviewed in Tolomeo and Cascio, 2021, Int J Mol Sci. 2021 Jan; 22(2): 603. doi: 10.3390/ijms22020603). We have now discussed this in more detail in the Discussion section and stated some of the limitations of our study regarding the regulation of the STAT3/p53 axis.

      Reviewer #2 (Public Review):

      The manuscript by Okholm and colleagues identified an interesting new instance of ceRNA involving a circular RNA. The data are clearly presented and support the conclusions. Quantification of the copy number of circRNA and quantification of the protein were performed, and this is important to support the ceRNA mechanism.

      We thank the reviewer for the positive feedback.

      Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking it down and performing an RNA-seq analysis, the authors found thousands of deregulated genes that appear unaffected by miRNA sponging and that are, instead, enriched for an 11-mer motif. Further investigations showed that the 11-mer motif is shared with circHIPK3 and able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD (resulting in downregulation and upregulation, respectively) the authors found the STAT3 gene. This was accompanied by consistent concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths:

      The number of circRNAs continues to grow drastically; however, the field lacks detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs, with careful analysis of aspects that are frequently poorly investigated. The time-point KD followed by RNA-seq, the investigation of the miRNA-sponge function of circHIPK3, the identification of the 11-mer motif, the identification and validation of IGF2BP2, and the analysis of the copy-number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action have been extensively explored and are, comprehensively, convincing.

      Weaknesses:

      In some parts, the manuscript lacks appropriate internal controls (e.g. comparison with normal bladder cells, linear transcript measurements upon the KD, RIP internal controls/WB analysis), statistical analysis and significance (in some qPCRs), an exhaustive description in the methods of microscopy, image analysis and western blotting, and a separate section on the cell lines used. The rationale for using bladder cancer cells in some experiments and non-bladder cells in others is also unclear.

      Overall, the presented study adds new knowledge in describing circHIPK3 function, its capability to regulate some downstream genes, and its interaction and competition for IGF2BP2. However, although the experimental part appears technically logical, the overall goal of this study and its final conclusions remain unclear. The proposed condensation mechanism, although interesting and encouraging, would need further experimental support and information, especially in the context of cancer.

      In summary, this study is a promising step forward in the comprehension of the functional role of circHIPK3. These data could possibly help to better understand the circHIPK3 role in cancer.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Internal controls/description of methods:

      We have now included the suggested internal controls and provided statistical significance measures where needed. We have also described in more detail the use of different cell lines for different experiments and provided a comprehensive description of the microscopy, image, and western blot analyses.

      The condensation mechanism of circHIPK3 and IGF2BP2 that we propose has been toned down slightly in the discussion, as we agree that these observations are not unequivocal and could potentially be explained by alternative, as yet undefined events, as discussed in further detail.

      Recommendations for the authors:

      Major points

      (1) In Figure 1B the authors show neither error bars nor statistical analysis. Did they sequence each cell line in single replicates? A clarification on this point would be of help.

      All timepoints for J82 and UMUC3 were sequenced in biological triplicates (Figure 1C-G). The data shown in Figure 1B represents prior single RNA-seq runs of all specific cell lines sequenced for selection of appropriate BC cell lines used for further study.

      (2) In Figure 1C the quantification of the cognate linear Hipk3 RNA would be desired in order to rule out changes in this species levels that could account for the observed effects upon circHIPK3 KD.

      We do not observe a non-specific downregulation of the HIPK3 mRNA upon circHIPK3 knockdown; rather, we observe a moderate upregulation at later timepoints. However, western blotting shows that this upregulation does not translate into significantly increased protein levels. This data is now available in Supplementary Figure S1A and S1B.

      (3) In Supplementary Figure S1B the authors show the number of differentially expressed genes between time points and baseline upon circHIPK3 KD or scr siRNA transfection. However, in this referee's opinion, the relevant comparison would be the differentially expressed genes between circHIPK3 KD and scr siRNA at different time points. Otherwise, they would be focusing on both circHIPK3-specific and non-specific effects.

      The requested comparison is part of the main figures (Figure 1F). The plotted data in Supplementary Figure 1B (Supplementary Figure S1D in the revised version) was included to allow the reviewer to better assess the variability in the data. We therefore believe it provides relevant information and that it should be kept in the final version.

      (4) Figure 1E. How many hours of KD do these measurements correspond to? Even if they correspond to 72 h, there seems to be a discrepancy between Fig 1E and 1F in terms of the total number of differentially expressed (DE) genes. Why are there more DE genes in 1E?

      The number of differentially expressed genes in Figure 1E represents the total number across all timepoints, while Figure 1F represents single timepoints. We have modified the figure legend to clarify this issue.

      (5) In Figure 3B, in order to verify pulldown efficiency, RT-qPCR should be performed instead of endpoint RT-PCR. Otherwise, no robust claim can be made regarding interaction affinities.

      We agree that these RIP-PCR results in Figure 3B are only semi-quantitative and therefore do not unequivocally assess binding strength. However, since IGF2BP2 is the RNA binding protein in focus throughout the rest of the study, where additional quantitative RIP-RT-qPCR experiments have been performed, we find this issue negligible. In addition, the semi-quantitative nature of the endpoint PCR experiment has now been mentioned in the main text and figure legend.

      (6) The authors claim that IGF2BP2 KD counteracts the effect of circHIPK3 KD on target mRNAs. However, in order to support this claim the authors should perform a rescue experiment where they simultaneously knock down both circHIPK3 and IGF2BP2. Otherwise, the conclusion remains largely supported by a correlation.

      Indeed, such an experiment is important. A rescue experiment with double knockdown has now been performed and demonstrates that the levels of the tested targets (STAT3, NEU and TRAPPC9) become normalized under these conditions, supporting our IGF2BP2/circHIPK3 sponging model. The data are available in Supplementary Figure S5F.

      (7) The authors claim that circHIPK3 interacts strongly with IGF2BP2 in bladder cancer cells but not with GRWD1. This is shown in Figure 4A where neither standard errors nor statistical analysis is shown. The authors need to show replicates of this experiment and perform statistics in order to support their claims.

      These experiments have been redone with even higher stringency in biological triplicates and fully support our claims. The data are available in a modified Figure 4A – now including error bars and indications of significance. In addition, we have included western blots demonstrating Input (IN), Flowthrough (FT) and Immunoprecipitation (IP) of correctly sized proteins in Supplementary Figure S4A.

      (8) The authors claim that the STAT3 gene, which contains the 11-mer motif in its 3'UTR, becomes downregulated upon circHIPK3 KD in UMUC3 and J82 cells, while it is upregulated upon IGF2BP2 depletion in both cell lines. It is unclear why they show the effect of circHIPK3 KD on STAT3 within a time course while the effect of IGF2BP2 KD in a fixed time point (Figures 5A/S5A and 5B/S5B respectively), and it would be convenient to clarify this point.

      The initial time-course knockdown experiment for circHIPK3 was conducted to provide a comprehensive dataset for circHIPK3-mediated events and clarify any temporal effects. After identifying IGF2BP2 as an interaction partner of circHIPK3, we chose to harvest cells 48 hrs after knockdown, as knockdown efficiency was prominent at this point. The temporal knockdown efficiencies of RNAs (circHIPK3) and proteins (IGF2BP2) differ considerably due to the greater stability of proteins compared to the target RNA. This is the main reason why only a single timepoint has been assessed.

      (9) In Figure 5F the authors show that upon overexpression of wildtype or 11-mer motif-mutant circHIPK3, the binding of IGF2BP2 was reduced while the binding of STAT3 mRNA to IGF2BP2 was increased. In order to rule out differences in circularization efficiencies, it would be convenient to show a northern blot comparing the efficiency of circHIPK3 overexpression relative to its linear cognate RNA for both constructs.

      Indeed, circRNA expression constructs may differ considerably in circularization efficiency. We are using the Laccase2 system developed by the Jeremy Wilusz lab (Kramer et al., 2015), which, at least in our hands, efficiently produces circRNAs from almost any inserted sequence. To address whether the WT and mutant constructs express similar amounts of circHIPK3 with high efficiency, we performed the suggested northern blot, which displays very similar levels of RNase R-resistant circHIPK3. The data are now available in Supplementary Figure S5H. Due to background signal from 18S rRNA in non-RNase R-treated samples, we cannot accurately calculate a linear/circular RNA ratio, since no distinct linear RNA species is visible above background on the blot. However, the key observation, that mutant and WT (RNase R-resistant) circRNAs are expressed at similar levels, makes us confident in our conclusion that WT circHIPK3 expression interferes with IGF2BP2 binding to STAT3 mRNA.

      (10) Figure 1G, several genes were selected as up- and downregulated for the J82 and UMUC3 cell lines. Were these consistently involved in specific biological processes?

      Genes were classified as down- or upregulated based on significant (FDR<0.1) fold changes. The most significant genes in both directions were named, regardless of their involvement in any specific biological processes. Initially, we performed a GO-term analysis on these genes and obtained many hits, but we did not observe a specific pattern or cluster of genes, suggesting that we are looking at both primary and secondary effects of knocking down circHIPK3. We believe our GSEA of the 50 hallmarks of cancer gene sets, presented in Figure 4D, 4E and Supplementary Figure S4E and S4F, addresses this point in a satisfactory manner.

      (11) For differential expression analysis, which data sets were used to group outcomes at different time points? Also, the number of affected genes increases after KD; please describe in more detail how you reached that gene number.

      As also discussed above (point 3), at each timepoint (Figure 1F) “Scr” was compared to “circHIPK3” knockdown. It makes sense that more and more genes become DE over time, as both primary and secondary effects of the knockdown build up. We have now clarified in the figure legend which datasets have been used and rewritten the Methods section on differential expression analysis.

      (12) What happens with the expression of circHIPK3 if STAT3 is KD? What biological processes are modulated by silencing circHIPK3?

      (13) What happens in bladder cancer cells if STAT3 and circHIPK3 are KD?

      The main goal of our work is to clarify how circRNAs (here circHIPK3) affect gene expression and cancer pathways. While it would be interesting to explore the consequences of STAT3 knockdown, alone and in combination with circHIPK3 knockdown, such experiments would require comprehensive additional analyses (RNA-seq), which we believe are beyond the scope of this study at this point.

      (14) The rationale of the study and conclusions are unclear. Quote "we extensively evaluate the functional impact of circHIPK3 in bladder cancer cells". As previously published by the authors, as well as mentioned in the manuscript, circHIPK3 is downregulated in cancers and possesses tumor suppressor functions in bladder cancers. Could the authors clarify how the results of the presented study based on the depletion of circHIPK3 fit with the previous discoveries? If the circHIPK3 is generally downregulated compared to normal cells (although higher compared to the linear transcript) why do the authors use a KD approach? Are the bladder cancer cells simply a cell model to study circRNA vs linear? How the condensation model reconciles with circHIPK3 tumor suppressor function based on these results?

      We believe that it remains unclear whether circHIPK3 is a direct tumor suppressor. Although this is possible judging from the clinical patient data, STAT3, which has been shown to become activated in many cancers, is also downregulated upon circHIPK3 knockdown. However, the immediate effects of circHIPK3 knockdown on gene expression (0-72 hrs) and the long-term development of tumors within patients may be difficult to compare directly. If STAT3 downregulation contributes to cancer phenotypes in bladder cancer, as suggested for several other cancer types (glioblastoma, prostate cancer, lung cancer etc.), circHIPK3 may indeed still be classified as a tumor suppressor in bladder cancer. It is worth noting that circHIPK3 has been shown to be upregulated and to have oncogenic phenotypes in many other cancers, which makes direct correlations between cancers complex and difficult to reconcile. We have revised the discussion to reflect these issues more comprehensively. We believe that fully delving into STAT3 regulation in terms of bladder cancer development, progression, cell invasiveness, and survival is better suited to future experiments.

      At this point, we have identified a novel mechanism by which a circRNA deregulated in cancer is able to sponge/regulate the function of an oncogenic RNA binding protein, even though it is severely outnumbered in cells. Importantly, circHIPK3 likely does not function as a miRNA sponge, as proposed in several previous studies based on circRNA overexpression, reporter constructs and miRNA mimics. We therefore believe that these findings provide important new insights into circHIPK3 function and that the current understanding of circRNAs as functioning primarily as miRNA sponges should likely be revised.

      (15) Related to the previous point, if the purpose is to study the role of circHIPK3 in bladder cancer, there is a bit of a lack of consistency and it is sometimes confusing to understand the use of certain cell lines for specific experiments. The initial circHIPK3 KD experiments have been conducted in 2 (out of 11 not malignant/ metastatic) bladder cancer cell lines (J82 and UMUC3). Why this specific selection of exclusively metastatic bladder cell lines? For comparison are the normal bladder cell lines characterized by the same circRNA vs linear ratio?

      The selection of bladder cancer cell lines (J82, UMUC3 and FL3) is based on several criteria including expression levels of circHIPK3, cell maintenance characteristics and knockdown/transfection efficiencies. Initially, we included HT1197 cells as well, but batch effects precluded the use of these data.

      Furthermore, the subsequent miRNA analysis was conducted exclusively in one bladder cell line (J82 but not UMUC3), and the initial identification of the motif again in bladder cells, but the initial RBP identification and experimental interaction were conducted in the non-bladder cells HepG2 and K562 (reported as main Figure 3B) and only subsequently in bladder cells (4A), again in a different cell line (only FL3, not J82 or UMUC3). The validation of the interaction of STAT3 by RIP is performed exclusively in FL3. All this also makes one wonder how specific this mechanism/binding is to bladder cancer cells. There is an attempt to explain this later on by comparing cell cycle progression analysis upon circHIPK3 KD and IGF2BP2 KD, but the final conclusions of this analysis remain unclear. The authors should provide more explanation and information in this part of the manuscript.

      It is correct that the different bladder cancer cell lines (FL3, J82 and UMUC3) have been used more or less interchangeably between experiments. This is due to the observed common phenotypes, e.g. sharing up to 92% of DE genes, and the highly significant enrichment of the IGF2BP2 11-mer motif in downregulated mRNAs upon circHIPK3 knockdown in all three cell lines. The ENCODE cell lines HepG2 and K562 were used since the accessible RBP-CLIP data originate from the ENCODE project, where these cells have been used exclusively. Hence, we validated the binding of candidate RBPs (semi-quantitatively) in HepG2 and K562 prior to assessing their RNA binding in the BC cell line FL3. We have used FL3 for RIP and validation of IGF2BP2 binding mainly due to its better transfection efficiency and higher expression levels, allowing detection of all interrogated components. The fact that we have included three BC cell lines in many experiments instead of only one, and obtained consistent results, strengthens the conclusion that our phenotypes and regulatory mechanisms are likely common to most, if not all, bladder cancer cell lines. We have included a paragraph in the Materials and Methods section to further clarify the use of cell lines in the different experiments.

      (16) STAT3 gene is used as an example. Where is this gene coming from? How has this gene been selected? Is there any complete list of RNA-seq data of up/down-regulated genes upon circHIPK3 KD? The raw data and gene list should be publicly available to the reviewers.

      STAT3 is a major regulator of cancer pathways and therefore an interesting candidate for further analysis as it is differentially expressed between control and circHIPK3 knockdown in all cell lines. We have now included the complete list of DE genes from the time-resolved RNA-seq analyses (DESeq2 output files) in the supplementary material. This data is now available in Supplementary Tables S6 and S7.

      (17) In performing the KD of circHIPK3 the authors use a unique siRNA on a splice junction. The authors claim that this is a way to not affect the linear transcript, however, have the authors also ensured experimentally that this doesn't affect in any way the linear RNA? This should be included as an initial internal control.

      We do not observe a downregulation of the HIPK3 mRNA upon circHIPK3 knockdown; rather, we observe a moderate upregulation at later timepoints. When assessing HIPK3 protein levels, we observe no significant change after 48 hrs of knockdown. This data is now available in Supplementary Figure S1A and S1B.

      (18) Additional controls should be provided for RIP, especially for Fig 3B and 4A, Sfig 4 and 5C, such as an internal positive control (e.g. AGAP2-AS1) of the correct pulldown of IGF2BP2, and/or WB should be shown (the methods state that WB has been used for the analysis of RIP, but I couldn't find any).

      Indeed, IGF2BP2 likely binds to many mRNAs in the cell. We have now included β-actin mRNA as a low-affinity control in the Figure 4A RIP data, showing that circHIPK3 represents a tight-binding substrate for IGF2BP2. We have also included a western blot showing the IP of IGF2BP2, IGF2BP2, GRWD1 and GFP. This data is now available in Supplementary Figure S4A.

      (19) Additional internal experimental controls should be included to assess the successful transfection and overexpression of circHIPK3 with the laccase-2 driven plasmid and mutated versions before the RIP in 4B and in the 5F. Supportive controls to show equal transfection would be required for Figure 6C-D. Further controls to show that the ASO specifically targets the 11-mer in circHIPK3 but not IGF2BP2 target genes should also be included. Please include this information in the supplementary materials.

      We have now included a northern blot showing successful transfection and expression of RNase R-resistant circHIPK3 from the Laccase2 vector (WT and mutant) in relation to the RIP experiments. This data is now available in Supplementary Figure S5H (see also comments about this above). Equal transfection of the cells shown in Figure 6C-D was assessed by comparable levels of GFP expression, which is included as an expression cassette in the modified Laccase2 construct. Pictures were acquired with the same exposure time and scaling to ensure that they can be compared directly. The ASO targets circHIPK3 with full complementarity, while STAT3 mRNA has 2 mismatches, leaving the “lesser interaction” with STAT3 theoretical. This has now been clarified in the main text.

      (20) Specifically, in 1C and 4A, Sfig4 there is no statistical analysis made and/or significance? This is only reported for the RIP experiment in Fig 5C.

      Statistical analyses have now been performed and shown in Figure 4A and we have included binding of ACTB as a low affinity control. In Figure 1C, which displays knockdown efficiency (highly efficient) at the various timepoints, no statistical significance has been displayed, since this is normally not done for such knockdown experiments. In addition, it is also not clear which comparisons would be beneficial. Except for the J82 cell line at 12 hrs compared to 0 hrs, knockdown efficiency is high and statistically significant at all timepoints.

      (21) In the assessment of copy number ensuring the same primer efficiency is fundamental, it can't be simply "assumed". Please clarify this point and possibly include this information in the supplementary materials.

      It is correct that identical, or at least very similar, primer efficiencies are necessary to conclude that the relationship between GAPDH mRNA and circHIPK3 levels in the cell reflects the quantitatively measured number of molecules. However, since this single comment only served to support the quantitatively measured number of circHIPK3 molecules with a ballpark estimate, and since we already assume an estimated 10,000-20,000 copies of GAPDH mRNA in most cells (a number we also do not know precisely), we have chosen to remove this statement.

      (22) The methodology section is not well organized and looks incomplete. For example, there are two separate sections for circHIPK3 expression conducted in different cell lines, this would be better explained in a single paragraph.

      We have now rewritten this section to make it clearer.

      The section reporting cell lines and growth conditions is incorporated in "circHIPK3 KD and overexpression" while it should be a separate paragraph and valid for all experiments where these cells have been used. There is no information regarding Western blots, including Antibodies used, and densitometry performed.

      This information has now been included.

      In "immunofluorescence microscopy" it is not clear what microscope has been used, how many acquisitions have been made, and how acquisition has been performed. Related to this, how has the image analysis been performed? Figures 5I-J "Finally, immunofluorescence staining showed that nuclear and overall STAT3 protein levels are significantly lower upon circHIPK3 KD, while nuclear p53 protein levels are higher" and 6C and D "we observed a significantly higher prevalence of large cytoplasmic condensates in cells expressing high levels of circHIPK3 compared to controls": how has this quantification been made? The conclusive part about the condensation role remains a bit too loose and mostly speculative, largely due to the lack of robust information provided on microscopy and image analysis.

      We have now included a better description of the acquisition and quantification methods.

      Minor

      (1) The Van Nostrand et al 2018 citation should refer to the updated publication in Nature and not to the original preprint in Biorxiv.

      This reference has now been updated.

      (2) In Supplementary Figure S3B, the authors offer no explanation as to why genes that become upregulated upon circHIPK3 knockdown generally contain more circHIPK3-RBP binding sites other than for IGF2BP2. A clarification would be of help.

      We do not have any evidence to explain this observation. One possibility is that other RBPs elicit mRNA-stabilizing effects on average, whereas abundant IGF2BP2 (~120,000-200,000 copies per cell) may now be able to bind more target mRNAs and elicit destabilization. This remains highly speculative, though.

      (3) In Supplementary Figure S3D, the authors' claim that the 11-mer motif is found more bound to IGF2BP2 than for other circHIPK3-RBPs should be referred to the corresponding dataset/reference.

      This information is stated in the figure legend (K562), and we have now included it in the main text as well: “We evaluated how often binding sites of circHIPK3-RBPs overlap the 11-mer motif and found that this is more often the case for IGF2BP2 binding sites than binding sites of the other circHIPK3-RBPs when scrutinizing K562 datasets (Supplementary Figure S3D)”.

      (4) In Figure 4C the authors show that, according to previously performed experiments of their group, the 11-mer motif is enriched in upregulated genes compared to downregulated genes upon IGF2BP2 KD in UMUC3. This seems like a confirmation of the results presented in the preceding section (Figure 3H) and it would be clearer if it were presented in the same section.

      The data in Figure 3H is based on ENCODE data from IGF2BP2 knockdowns in K562 cells, while in Figure 4C these are from IGF2BP2 knockdown followed by sequencing in UMUC3 cells. We believe the timing of the data is fitting as is, since they relate to non-BC cells and BC cells, respectively.

      (5) More in vitro experiments are needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype, and how different cancer hallmarks are modulated by this ceRNA network.

      We agree that this study does not fully clarify how these complex molecular interactions relate to bladder cancer progression, including fluctuations of key cancer genes/proteins. Since our focus has been on the mechanisms of circRNA function in relation to bladder cancer, these issues will await further future experimentation.

      (6) "apparent" competition (introduction - pag4)? Maybe rephrase more appropriately.

      This has been rephrased and “apparent” excluded.

      (7) Fig1C. Relative quantification. Statistical analysis? Is this significant?

      See also our response to point 20 above. In Figure 1C we show the knockdown efficiency at the different timepoints. At all timepoints, knockdowns are highly significant compared to the control (Scr), which does not change significantly over time. It seems somewhat redundant to include p-values for such data. Also, which comparisons should be highlighted? Knockdown is highly efficient, which is what we want to show.

      (8) Figure 5H. Western blot. Densitometry quantification performed, how?

      This is now described in the Materials and Methods section.

      (9) Please specify the concentration of circHIPK3-specific siRNA used.

      20 nM. The information is included in the Materials and Methods section.

      (10) The control sample refers to scrambled or untreated cells? Instead of using "control samples without siRNA transfection" or "No siRNA" use untreated cells - otherwise, it is a bit confusing.

      This has now been modified.

      (11) Figure 3 is starting with hepatocellular and leukemia cells; why not with bladder cells?

      These experiments were performed based on CLIP-data and RBP knockdown data from the ENCODE project. The cells used are limited to HepG2 and K562.

      (12) For Figure 4B, which is the time-point?

      This is 24 hrs; this has now been stated.

      (13) Figure 5I and J, the expression of STAT3 and circHIPK3 can be also investigated for cellular distribution.

      The expression of STAT3 is investigated in Figure 5I. Localization of circRNAs by standard RNA-FISH protocols using multiple (>20) probes is inherently difficult due to cross-reaction of the probes with the linear mRNA. Certain amplification steps can be included when using a single backsplicing-junction probe, but this often gives rise to highly ambiguous results, as specificity is very limited due to the “one probe” nature of the design.

      (14) Some discussion of the limitations of the study would be of value.

      We have included this in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent to which CIH altered swallowing motor function; specifically, the timing and probability of activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that a variety of motor patterns emerge in CIH mice; this is apparently different from the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      (1) The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      Yes, we agree the data are descriptive in terms of characterizing the effect of CIH on PiCo activation. However, we would like to emphasize that the data are also mechanistic because they characterize the effects of specifically, optogenetically manipulating PiCo neurons after being exposed to CIH.

      Thank you for this comment and for pointing out the misleading description in the paper. This manuscript is meant to independently characterize the effects of CIH on the response to PiCo stimulation. We are not making direct comparisons with the previously published manuscript, in which mice were exposed to room air. No statistical analysis has been made between the previously published control data and the current CIH data, since we are not making a direct comparison, only an observational one.

      To make this clearer, and to address the reviewer’s concern, we have removed the room air data from Figures 1E, 2C and 3A. However, we believe it is important to keep the data from mice exposed to room air in Figure 2B, since we did not include this information in the previously published manuscript. It is important to point out that all mice exposed to CIH show some form of submental activity during laryngeal activation in response to PiCo stimulation. This is not the case when mice are exposed to room air only. In this figure, only descriptive analyses are presented. We adjusted our wording throughout the text, particularly in the discussion, to eliminate any impression that we are making direct comparisons between the two studies. The following sentence has been added to the discussion: “While we do not intend to make direct quantitative comparisons between the previously published PiCo-triggered swallows in control mice exposed to room air (Huff et al 2023) and the data presented here for mice exposed to CIH, we believe it is important to compare the conclusions made in these two studies.” This was the motivation for using the eLife Advance format, since the present study demonstrates that PiCo affects swallow patterning, which was not observed in the control data.

      (2) There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Projections to/from PiCo should be interrogated using techniques that silence or activate them. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      We agree with the reviewer that our study raises many more questions than we are able to answer at the moment; this, however, applies to most scientific studies. Even though swallowing has been studied for many decades, the underlying circuitry remains largely enigmatic. We will continue to investigate the role of PiCo and its interaction with the NTS in healthy and diseased states. These investigations require many different techniques and approaches, some of which are still in development. For example, we are currently conducting experiments that silence swallow-related portions of the NTS and PiCo ChAT/Vglut2 neurons using novel, unpublished viral approaches. However, these are separate, ongoing studies beyond the scope of the current one.

      To address the reviewer’s comment, we have added the following to the limitation section: “In addition, this preparation does not allow for recording of PiCo neurons to evaluate the direct effects of CIH on PiCo neuronal activity.” The following has also been added to the discussion: “Rather, our data reveal that CIH disrupts the swallow motor sequence, which is likely due to changes in the interaction between PiCo and the SPG, presumably located in the cNTS. While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow motor patterning itself. Here we show for the first time that CIH leads to disturbances in the generation of the swallow motor pattern that is activated by stimulating PiCo. This suggests that PiCo is not only important for coordinating swallow and breathing, but also for modulating swallow motor patterning. Further studies are necessary to directly evaluate the presumed interactions between PiCo and the cNTS.”

      (3) The functional significance of the altered (non-classic) patterns is unclear.

      As in our original study, the preparation used to stimulate PiCo does not allow us to simultaneously characterize the functional significance of swallowing. Therefore, we have included this in the limitation section: “In this preparation we are unable to directly determine the functionality of the variable swallow motor pattern seen after CIH. Different experimental techniques, such as videofluoroscopy, would need to be used to directly evaluate functional significance. This technique is beyond the scope of this study and not possible to perform in this preparation. We acknowledge this limits our ability to make direct comparisons with dysphagic swallows in OSA patients.”

      Reviewer #1 (Recommendations For The Authors):

      (1) A more rigorous experimental approach is required. Littermates should be separated and exposed to either room air or CIH at the same (or close to the same) time.

      As stated above, we did not directly compare mice exposed to room air with mice exposed to CIH. Hence, we believe this is not necessary, and it would have meant repeating all the experiments already published in the original eLife paper.

      (2) Robust statistical analyses are required to determine whether the effects of CIH on the pattern/probability of motor activation are required.

      Since the control and CIH groups were not compared in this study, statistical hypothesis testing is not appropriate or applicable.

      (3) Use a combination of retrograde, Cre- AAVs and Cre-dependent approaches to interrogate the circuitry to/from PiCo that forms the swallowing network. This is what is needed to push this area forward, in my view.

      Thank you for this suggestion; we will consider it as we plan future experiments. Indeed, we are in the process of developing novel approaches. However, in this context we would like to emphasize that further network investigations are exponentially more complicated, given that we need to use a Flpo/Cre approach to specifically characterize the glutamatergic-cholinergic PiCo neurons. Most other laboratories that have studied PiCo have avoided this experimental complication and used only a “cre-dependent” approach. This approach is much simpler, but the data are much less specific and the conclusions sometimes misleading. For example, stimulating cholinergic neurons in the PiCo area will also activate nucleus ambiguus neurons, and stimulating glutamatergic neurons will also activate glutamatergic neurons that are not necessarily the glutamatergic/cholinergic neurons we use to define PiCo specifically. Readers who are unfamiliar with these different approaches often miss this important difference. Hence, stimulating the cholinergic-glutamatergic neurons in PiCo is much more specific than, e.g., stimulating preBötzinger complex neurons: there are no markers that allow specific stimulation of only preBötzinger complex neurons or of neurons in the parafacial nucleus. Unfortunately, this difference is often overlooked.

      (4) It should be made more clear how each of the "non-classic" swallowing patterns could cause dysfunction - especially to the reader who is not completely familiar with the neural control of swallowing.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation. However, since our approach does not allow us to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without data to support our conclusions. This is why we have not speculated on the functional implications. We have added the following to the discussion section of this manuscript: “While fine wire EMG studies are an excellent tool to observe the temporal motor pattern of sequential swallow-related muscles, they must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of the alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs, we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      Minor:

      The Results should be written in a way that better conveys the neurophysiological effects of the manipulations. As it stands, it reads like a statistical report on how activation of each neuronal phenotype is statistically different from each other. As such it is difficult to read and understand the salient findings.

      Thank you for this insight. We have adjusted the language in the results section.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle. They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      We thank the reviewer for the comments and for recognizing the strengths of the paper. However, with regard to the “lack of knowledge”, we would like to emphasize that PiCo was first described in 2016, while, e.g., the preBötzinger complex was described in 1991. Thus, it is not fair to assume the same level of anatomical and physiological understanding for PiCo as we have become accustomed to for the preBötzinger complex. We are fairly confident that 25 years from now, our knowledge of the inputs and outputs of PiCo will be much less limited than it currently is.

      Strengths:

      Major strengths of the manuscript:

      • The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      • The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      • This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      Major weaknesses of the manuscript:

      • The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to demonstrate that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It should be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      Thank you for pointing out that this manuscript suggests PiCo is necessary for swallow generation. We agree that further interventions to specifically silence PiCo ChAT/Vglut2 neurons will be necessary to investigate this claim, and we have begun to evaluate this for a future study by developing a novel, as yet unpublished, approach. We have altered the language throughout the text to limit the perception that PiCo is the swallow pattern generator. We have also changed the title to: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production.

      • The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

      Thank you. We have changed the time scale so that all representative traces are on the same time scale.

      We explained in the original paper (Huff et al 2023) that the significant decrease in PiCo-evoked swallow duration compared to water-evoked swallow duration is likely due to the absence of oral/upper airway feedback. We are not comparing the effects of CIH on swallow motor pattern between water-evoked and PiCo-evoked swallows. Rather, we are only characterizing the effects of CIH on the swallow motor pattern in PiCo-evoked swallows. The purpose of Figure 1A is to show that the rostrocaudal submental-laryngeal sequence in water-evoked swallows is preserved in “canonical” PiCo-evoked swallows, as shown in the original study. While we did not measure the effects of CIH on breathing and the respiratory pattern in this study, others have established that CIH causes respiratory muscle weakness, impaired motor control of the upper airway, and variable respiratory rhythm and rhythm generation. However, when characterizing the timing of swallow in relation to inspiration (Figure 1 Figure Supplement 1) and the reset of the respiratory rhythm (Figure 3 figure supplement 1), and observationally comparing these results with those from mice exposed to room air (Huff et al 2023), we do not observe any obvious differences in swallow-breathing coordination. Nevertheless, a separate study in wild-type mice characterizing water-evoked swallowing after CIH would be better suited to understand the physiological changes of swallowing after CIH. We would also like to point out that we have shown in Huff et al 2022 that altering respiratory rate/pattern via activation of various preBötzinger Complex neurons does not change swallow behavior, except in the case of Dbx1 PreBötC neuron activation, which was independent of CIH. Increasing or decreasing respiratory rate via activation of PreBötC Vgat and SST neurons did not change the swallow pattern; rather, it changed the timing of when swallows occurred.
      Others have previously reported that swallowing exerts hierarchical control over breathing and can shut breathing down. We believe that swallowing behavior is independent of the respiratory pattern, and that alterations in breathing pattern do not necessarily affect the swallow motor pattern but could affect swallow timing.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Lines 37-41 "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly the generation of swallow motor pattern was significantly disturbed."

      It should be better:

      "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly triggers variable swallow motor patterns".

      Thank you, this has been changed.

      Lines 41-43 "This suggests, glutamatergic-cholinergic neurons in PiCo are not only critical for the gating of postinspiratory and swallow activity but also play important roles in the generation of swallow motor pattern." I suggest removing any language claiming PiCo is swallow gating and changing "generation" to "modulation".

      "This suggests that glutamatergic-cholinergic neurons in PiCo are not only critical in regulating swallow-breathing coordination but also play important roles in the modulation of swallow motor pattern."

      Thank you, this has been changed.

      Introduction:

      Line 88-90: Actually, in Huff et al. 2023 it is said "PiCo acts as an interface between the swallow pattern generator and the preBötzinger complex to coordinate swallow and breathing". Please, change accordingly. Please, remove Toor et al., 2019 since their conclusions are quite different.

      Line 100-101: Please, change the sentence according to the comments reported above.

      Thank you, this has been changed.

      Results:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed.

      Lines 129-130: This finding is not surprising since similar results have been reported in Huff et al. 2023.

      Thank you. We wanted to confirm that CIH did not alter this characteristic, which it did not. We believe that it is important to include this, as it is a criterion for characterizing laryngeal activation.

      Lines 219: The number of water swallows is considerably lower than stimulation-evoked swallows. Why?

      We inject water into the mouth three times. Typically, there is one swallow in response to each water injection. PiCo is stimulated 25 times at each duration. If we were to stimulate swallowing with water as many times as with optogenetic stimulation, there would be an adaptive response to the water stimulation and the mouse would stop responding. This does not seem to be the case with PiCo stimulation. In short, there are many more PiCo stimulations than water stimulations.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 {plus minus} 132ms vs 144 {plus minus} 101ms; paired t-test: p= 0.0001, t= 5.21, df= 8), Vglut2cre:Ai32 mice (308 {plus minus} 184ms vs 125 {plus minus} 44ms; paired t-test: p= 0.0003, t= 6.46, df= 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 {plus minus} 67ms vs 130 {plus minus} 35ms; paired t-test: p= 0.0005, t= 5.62, df= 8) exposed to CIH (Table S1).".

      Thank you, this has been changed.

      Line 252 and 254: remove SEM.

      Thank you, this has been changed.

      Discussion

      Line 267: ...(Figure 1Bi), while 28% of PiCo-triggered swallows...

      Thank you, this has been changed.

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing for swallowing and breathing. Rather, our data reveals that CIH disrupts the swallow motor sequence likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation.".

      Thank you, this has been changed.

      Could the observed effects be due to a non-specific effect of hypoxia on neuronal excitability? In addition, it should be considered that PiCo-triggered swallows lack the behavioural setting of water-evoked swallows and do not activate the sensory component of the SPG to the same extent as the water-evoked swallows.

      Yes, this is very possible. We stated in our first manuscript that the decrease in PiCo-triggered swallow duration, compared to water-triggered swallow duration, is likely because oral sensory components are not activated to the same extent (Huff et al. 2023). Since we do not directly measure neuronal excitability, it is not known (in this study) whether CIH changes the excitability of swallow-related areas. However, others have shown increased excitability and activity of Vglut2 neurons after CIH exposure (Kline et al. 2007, 2010), and we have shown, e.g., changes in the excitability of preBötC neurons (Garcia et al. 2016, 2017).

      Lines 293-300: The sentence is not clear. Is there any evidence indicating that glutamatergic neurons are differently affected by hypoxia than cholinergic neurons?

      Thank you, these sentences have been changed to increase clarity. The section now reads: “There was no statistical difference in the probability of triggering a swallow during optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 neurons in mice exposed to room air (Huff et al 2023). However, when exposed to CIH, ChATcre:Ai32 and Vglut2cre:Ai32 mice have a lower probability of triggering a swallow than ChATcre:Vglut2FlpO:ChR2 mice (in some mice, swallow was never triggered via PiCo activation, while water-triggered swallows remained). While it is possible that portions of the presumed SPG remain less affected by CIH, which could offset these instabilities to produce functional swallows, our data suggest that PiCo targets microcircuits within the SPG that are highly affected by CIH. The NTS is the primary site of termination for upper airway and swallow-related sensory afferents in the brainstem (Jean, 1984). CIH induces changes to the cardio-respiratory Vglut2 neurons, resulting in an increase in cNTS neuronal activity (Kline, 2010; Kline et al., 2007), as well as changes to preBötzinger neurons (Garcia et al., 2017; Garcia et al., 2016) and ChAT neurons in the basal forebrain (Tang et al., 2020). It is reasonable to suggest that CIH has differential effects on neurons that express only ChATcre or Vglut2cre versus the PiCo-specific interneurons that co-express ChATcre and Vglut2FlpO, emphasizing the importance of targeting and manipulating these PiCo-specific interneurons.”

      Lines 372-374: "Here we show that PiCo, a neuronal network which is critical for the generation of postinspiratory activity (Anderson et al. 2016) and implicated in the coordination of swallowing and breathing (Huff et al., 2023), is severely affected by CIH.".

      Thank you, this has been changed.

      Methods

      Line 398: Did you mean Slc17a6-IRES2-FlpO-D?

      Thank you, this has been changed.

      Line 399: were.

      Thank you, this has been changed.

      Line 403: ... expressing both ChAT and Vglut2 and will be reported as ChATcre:Vglut2FlpO.

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Line 479: (Figure 6a in Huff et al., 2022).

      Line 497: What does Fig 7 refer to?

      This should read Figure 1-figure supplement 2; this has been changed.

      Lines 501-506: "First, swallow was stimulated by injecting 0.1cc of water into the mouth using a 1.0 cc syringe connected to a polyethylene tube. Second, 25 pulses of each 40ms, 80ms, 120ms, 160ms and 200ms continuous TTL laser stimulation at PiCo was repeated, at random, throughout the respiratory cycle. The lasers were each set to 0.75mW and triggered using Spike2 software (Cambridge Electronic Design, Cambridge, UK). These stimulation protocols were performed in all ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2." .

      Thank you, this has been changed.

      Line 526 and 540: (Fig.6 in Huff et al., 2022) and (Fig.6d in Huff et al., 2022).

      Thank you, this has been fixed.

      Line 594: Figure 5 doesn't exist. Please, change the sentence.

      Thank you, this has been fixed.

      Lines 595 and 609: The reference Kirkcaldie et al. 2012 refers to the neocortex and doesn't seem appropriate. Please cite the atlas of Paxinos and Franklin.

      Thank you, this has been changed.

      Reference:

      Please correct the formatting of references throughout the text by removing given names and initials (e.g. J.M., A., David D.). Only surnames should be mentioned.

      Thank you, this has been changed.

      Figures:

      Figure 1. A and B as well as the purple arrow are lacking. In addition, optogenetic stimulation is applied during different periods of inspiratory activity and this could impact the swallow motor pattern. In Bv, Non-LAR seems very similar to LAR. In panel E, please add the number of animals.

      Thank you, this has been fixed.

      We used the same optogenetic protocols in the original paper (Huff et al. 2023) and did not observe any changes to the swallow motor pattern in relation to the time at which PiCo was stimulated. The only phase-dependent response, seen in both control and CIH mice, is that when PiCo is stimulated during inspiration and a swallow is triggered, inspiration is inhibited. Therefore, we do not believe that the variability in swallow motor pattern depends on the phase of breathing in which PiCo is stimulated.

      In Biv, the LAR has a pause in EMG activity before the swallow begins (red arrow pointing to the pause), whereas in Bv, the Non-LAR does not have this pause; rather, the two behaviors converge (red arrow). For a response to be considered an LAR, the pause must be present, which is why we separated these two motor patterns.

      Figure 1 - Figure Supplement 1. Why do the Authors call the lines "histograms"?

      Thank you, this has been fixed. This is a line graph of swallow frequency in relation to inspiration.

      Tables:

      In tables, data are provided as means and standard deviation. Please, specify this in the Method section.

      Thank you, the following is listed in the methods section: “All data are expressed as mean ± standard deviation (SD), unless otherwise noted.”

      Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PICO stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PICO are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study´s conclusions.

      • The authors refer to PICO as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PICO for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      Yes, we did refer to PiCo as the postinspiratory rhythm generator, as defined in Anderson et al. 2016. We base this statement on the following criteria and experiments. In Anderson et al. 2016, we demonstrated that PiCo can be isolated in vitro, that PiCo neurons are activated in phase with postinspiration, and that they are inhibited during inspiration by preBötC neurons via GABAergic, not glycinergic, mechanisms. We also demonstrated that optogenetically stimulating cholinergic neurons in the PiCo area resets the inspiratory rhythm both in vivo and in vitro. We also showed that PiCo, when isolated in transverse slices, is autorhythmic and, like the preBötC in transverse slices, can generate respiratory rhythmic activity in vitro independent of the preBötC. Finally, we demonstrated that PiCo neurons are an order of magnitude more sensitive to opioids (DAMGO) than the preBötC, and that local injections of DAMGO into the PiCo area in vivo abolish postinspiration as well as the phase delay of the respiratory rhythm. None of these specific rhythmogenic properties were studied in the Toor study or the Oliveira et al study. Hence, we do not understand why the reviewer cites these studies as evidence for modulatory as opposed to rhythmogenic properties. Being rhythmogenic should not be considered an “exclusive property” of PiCo; specifically, it does not preclude PiCo from also “modulating” swallow-breathing coordination, as we demonstrated in the Huff et al study. In the same sentence, we also referred to the preBötzinger complex as the inspiratory rhythm generator as defined by Smith et al 1991, and the reviewer did not seem to object to this reference. We would point out that the same criteria were used to define the preBötzinger complex as we used for PiCo, except that PiCo neurons are better defined than preBötzinger complex neurons.
      Dbx1 neurons are often used to characterize the preBötC, but these neurons form a rostrocaudal and ventrodorsal column that also includes glial cells and extends beyond the preBötC. Glutamatergic neurons are everywhere, and so are somatostatin and neurokinin neurons. Moreover, the 1991 study was performed only in vitro and did not include a histochemical analysis. We would also like to point out that the present manuscript investigates the role of PiCo in swallow and laryngeal behaviors, not specifically postinspiration. Thus, we are not entirely sure how this comment relates to this manuscript.

      • The optogenetic activation of glutamatergic and cholinergic neurons from PICO evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PICO is directly involved in swallow pattern generation and that CIH disrupts the connection between PICO and SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions nor their possible disruption after CIH, or even whether PICO is part of SPG.

      We have edited the text to suggest PiCo modulates swallow motor sequence in addition to the coordination of swallow and breathing. We have also added that further experiments will be necessary to further investigate the connections between PiCo and SPG. But, unfortunately, compared to PiCo, the SPG is much less defined. As already stated above, it cannot be expected that a single study can address all possible open questions. Clearly, more work needs to be done outside of this study to answer all of these questions, which makes this an exciting area of research.

      • CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PICO stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      Yes, we thank the reviewer for this comment, and we agree that CIH affects multiple brainstem regions. We stated in the manuscript that we are measuring changes in two muscle complexes innervated by three motor neuron pools: the hypoglossal nucleus, the trigeminal nucleus, and the nucleus ambiguus. We have added a discussion of laryngeal activity in the presence of acute bouts of extreme hypoxia, acute intermittent hypoxia, and chronic intermittent hypoxia.

      • To support the hypothesis that PICO is directly involved in swallow pattern generation the authors should perform the inhibition of Vglut2-ChAT neurons from PICO and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PICO is crucial to generate this behavior.

      Thank you. We would like to clarify that “involvement” does not mean “necessary for”. Conflating the two has caused much confusion and debate in the field. As an example, one can argue at great length about whether inhibition is necessary for respiratory rhythmogenesis in vivo, but we think there is no question that inhibition is involved in respiratory rhythmogenesis in vivo. To avoid any confusion, however, we have changed the text to state that PiCo is involved in the modulation of the swallow motor sequence. We agree that additional inhibition experiments are necessary to determine whether PiCo is also a necessary component of the SPG, but this is not the question we set out to address in this study. To specifically target PiCo, we must inhibit not only Vglut2 neurons but neurons that express both ChAT and Vglut2. To our knowledge, there are no Cre/FlpO-dependent inhibitory DREADD or opsin tools that specifically target these neurons. As stated above, non-experts in the field do not appreciate this technical nuance. However, we have begun to develop the novel techniques necessary to inhibit these specific neurons, which will be published in the future.

      • In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PICO activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation. However, since we are not using any tools to measure or evaluate function, it would be inappropriate to make suggestions of this type without data to support our conclusions. This is why we have not included any functional implications. We have added the following to the manuscript: “While fine-wire EMG is an excellent tool for observing the temporal motor pattern of sequential swallow-related muscles, it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high-resolution manometry (HRM) to characterize the functional significance of the alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study uses only fine-wire EMG, we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      • In Figure 4, the authors need to present low magnification sections showing the PICO transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale since the cAmb seems very small.

      Thank you. We have added different histology images in which the cAmb is more comparable in size, as well as lower-magnification images showing the absence of transfection in the VRC.

      • Finally, the title does not reflect the study. The present study did not demonstrate that PICO is a swallow pattern generator.

      We have also changed the title to say: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

    1. Author Response

      eLife assessment

      This valuable study examines the activity and function of dorsomedial striatal neurons in estimating time. The authors examine striatal activity as a function of time and the impact of optogenetic striatal manipulation on the animal's ability to estimate a time interval. However, the task's design and methodology present several confounding factors that mean the evidence in support of the authors' claims is incomplete. With these limitations addressed, the work would be of interest to neuroscientists examining how striatum contributes to behavior.

      We appreciate the editorial process and are grateful for the thorough, detailed, and constructive reviews. We will respond in detail to every point raised by reviewers in a full revision.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate whether at least 6 seconds have passed since the first nose poke; this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease their activity, while D2-MSNs increase theirs, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. They observe that most recorded MSNs decrease their activity over the interval, and that the rate of this decrease is reduced by D1/D2 receptor inhibition.

      Major strengths:

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used.

      We are glad our main points came through to the reviewer.

      Major weaknesses:

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing?

      Our hypothesis, based on prior behavioral work from our group showing that blocking striatal D1 and D2 dopamine receptors impaired interval timing (De Corte et al., 2019; Stutt et al., 2023), was that D1 and D2 MSNs would have similar patterns of activity during interval timing. We will clarify this in the revision.

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances.

      Regarding the results presented in Figures 2 and 3:

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here.

      These are insightful points. We will clarify details of our PCA analysis in the revision. First, we include PCA for comparison with our past work (Emmons et al., 2017, 2021; Bruce et al., 2021). Second, it is true that these components can be observed in smoothed data; however, when we generated random data using identical parameters, the variance explained by PC1 in our recordings was rarely reached in the random data. Third, our goal is to compare D1 and D2 MSNs, not to interpret the PCs. We will make this explicit in our revision.
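      The reviewer's point, that PCA on Gaussian-smoothed noise tends to return slow, structured components, and the authors' counter-check against random data can be illustrated with a minimal sketch. It assumes a hypothetical matrix of 200 "neurons" by 60 time bins of pure Gaussian noise and an arbitrary smoothing width of 5 bins; it is not the authors' pipeline or parameters. It simply compares how much variance PC1 captures before and after smoothing.

```python
import numpy as np


def gaussian_smooth(x, sigma):
    """Smooth each row of x (neurons x time bins) with a normalized Gaussian kernel.

    np.convolve(mode="same") zero-pads the edges, so amplitude drops near the ends.
    """
    half = int(np.ceil(4 * sigma))
    taps = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (taps / sigma) ** 2)
    kernel /= kernel.sum()
    if kernel.size > x.shape[1]:
        raise ValueError("kernel is longer than the time axis")
    return np.array([np.convolve(row, kernel, mode="same") for row in x])


def pc_variance_fractions(x):
    """Fraction of total variance captured by each principal component.

    Rows are observations (neurons) and columns are time bins, so each PC is a
    temporal profile; columns are mean-centered before the SVD.
    """
    centered = x - x.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
noise = rng.standard_normal((200, 60))  # hypothetical: 200 neurons x 60 bins, no signal
raw_fracs = pc_variance_fractions(noise)
smooth_fracs = pc_variance_fractions(gaussian_smooth(noise, sigma=5))
print(f"PC1 variance explained, raw noise:      {raw_fracs[0]:.2f}")
print(f"PC1 variance explained, smoothed noise: {smooth_fracs[0]:.2f}")
```

      Smoothing alone concentrates variance in slow components, so a large PC1 is not by itself evidence of signal. The informative comparison is the PC1 variance fraction in recorded data against a null distribution built from random data processed in exactly the same way, which is the kind of control the authors describe.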

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.

      This is exactly the analysis shown in Figure 3D. We will clarify this in the revision.
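      The per-neuron slope analysis the reviewer describes can be sketched as follows. This is a simplified version that fits one least-squares line to each neuron's average firing rate over the 0 to 6 s interval; it is not the authors' trial-by-trial model behind Figure 3D, and the firing rates and slopes below are hypothetical values chosen for illustration.

```python
import numpy as np


def ramp_slopes(rates, t):
    """Least-squares slope of firing rate versus time for each neuron.

    rates: array of shape (n_neurons, n_bins); t: bin centers in seconds.
    np.polyfit fits every column of a 2-D y at once, so the matrix is transposed.
    """
    coeffs = np.polyfit(t, rates.T, deg=1)
    return coeffs[0]  # highest-degree coefficient first, so row 0 holds the slopes


rng = np.random.default_rng(1)
t = np.linspace(0.0, 6.0, 61)                   # 0-6 s interval in 100 ms bins
true_slopes = np.array([1.5, -2.0, 0.8, -0.5])  # hypothetical ramps, Hz per s
rates = 10.0 + true_slopes[:, None] * t + rng.normal(0.0, 0.5, (4, t.size))

slopes = ramp_slopes(rates, t)
n_up, n_down = int((slopes > 0).sum()), int((slopes < 0).sum())
print("slopes (Hz/s):", np.round(slopes, 2))
print(f"{n_up} ramping up, {n_down} ramping down")
```

      In practice a per-neuron significance test on the slope would normally precede counting neurons as increasing or decreasing; this sketch only counts signs.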

      Relatedly, it seems that the data shown in Figure 2D doesn't support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types.

      This likely refers to Figure 3D. In the revision, we will clarify this analysis, add error bars, and note that our goal in this analysis was to differentiate D2 and D1 MSNs. We will also extend this analysis to better make the point that D2 and D1 MSNs are distinct, contrary to our hypothesis.

      Regarding the results in Figure 4:

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data.

      This is a great point. Our goal was to fit behavioral activity, not neuronal activity; in our revision, we will do exactly what the reviewer suggests and present data of fits to neuronal activity.

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).

      Our model was inspired by the averages in Figure 2G&H; however, we will fit drift-diffusion models to individual neurons exactly as the reviewer suggests.

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition.

      We will clarify this in our revision, as this is an important point.
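      The drift-diffusion account discussed above can be illustrated with a minimal simulation of a single accumulator that rises to a threshold, with the "switch" time taken as the first threshold crossing. This is a sketch under stated assumptions, not the authors' fitted model: the authors describe two accumulators moving in opposite directions, whereas this uses one rising accumulator, and the threshold, noise, and drift values are hypothetical, chosen so that threshold divided by drift gives mean switch times near 9 s and 10 s. Only the drift is changed between conditions; the noise is held fixed.

```python
import numpy as np


def simulate_switch_times(drift, noise, threshold, n_trials=1000,
                          dt=0.01, t_max=30.0, seed=0):
    """First-passage times of a one-sided drift-diffusion accumulator.

    Each trial starts at 0 and accumulates drift*dt plus Gaussian noise with
    standard deviation noise*sqrt(dt) per step; the switch time is the first
    step at which the accumulator reaches the threshold. Trials that never
    reach it within t_max are returned as NaN.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    increments = drift * dt + noise * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    paths = np.cumsum(increments, axis=1)
    reached = paths >= threshold
    first_step = reached.argmax(axis=1)  # index of first True (0 if none)
    return np.where(reached.any(axis=1), (first_step + 1) * dt, np.nan)


# Hypothetical parameters: threshold / drift gives mean switch times near 9 s and 10 s.
baseline = simulate_switch_times(drift=1 / 9, noise=0.1, threshold=1.0)
reduced = simulate_switch_times(drift=1 / 10, noise=0.1, threshold=1.0)
print(f"mean switch time, baseline drift: {np.nanmean(baseline):.1f} s")
print(f"mean switch time, reduced drift:  {np.nanmean(reduced):.1f} s")
```

      Both conditions share the same random seed, so each reduced-drift trajectory lies below its baseline counterpart and crosses the threshold later. This isolates the effect of drift alone on switch time, which is the kind of parameter comparison the reviewers suggest making explicit when fitting the model.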

      Regarding the results in Figure 6:

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper.

      We agree. We will remove PC2 from Figure 6 and Figure S9 and add context to the PC analysis, noting that we include it because 1) it allows comparison with past work, 2) the variance we observe is much higher than that observed in random/smoothed data, and 3) we are primarily interested in comparisons between conditions rather than in interpreting the components.

      A larger concern, though, that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result.

      We agree, although we note D1/D2 blockade changes PC1, which explains the most variance in MSN activity. In the revision, we will show more examples and comment on the robustness of PC1, exactly as the reviewer recommends. The changes in PC1 are rather consistent.

      Also, if the authors want to claim that this manipulation lowers the drift rate, they could fit the DDM and examine whether D is significantly lower.

      This is a great idea – we will try to do this.

      Regarding the results in Figure 7:

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this?

      We were not clear. The second classifier was predicting response time. This was confusing and we will remove it.

      Impact:

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analyses to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding, that D2/D1 activity increases/decreases with time, remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal function.

      Again, we are grateful for the constructive and very insightful comments that we look forward to clarifying in a full revision.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors investigated the neural coding mechanisms of D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics, yet disrupting either type produced similar effects on behavior, indicating complementary roles of D1- and D2-MSNs in cognitive processing. However, the data do not fully support this major finding. One major reason is the heterogeneous responses within the D1- or D2-MSN populations. There are also concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons rather than the number of mice. It appears that the statistical differences were due to the large sample size used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; independent, mouse-based statistical analysis should be used.

      Strengths:

      The authors used multiple approaches, including behavioral training of awake mice, optogenetically assisted cell-type-specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling, to study neuronal coding for interval timing.

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.

      Weaknesses:

      (1) More detailed behavioral results should be shown, including the rate of successful switches and how long mice wait at the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18 s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made a third or further nose poke within 18 seconds? It is important to clarify this because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6 s in the first nose poke?

      We agree. These were presented in detail in our prior work (Bruce et al., 2021; Larson et al., 2022; and Weber et al., 2023) and in work from others (Balci et al., 2008; Tosun et al., 2016). However, we will develop a detailed behavioral schematic for the revision and move the supplementary behavioral data in Figure S1 to the main manuscript.

      (2) There are a lot of time parameters in this behavior task, the description of those time parameters is mentioned in several parts, in the figure legend, supplementary figure legend, and methods, but was not defined clearly in the main text. It is inconvenient, sometimes, confusing for the readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text.

      This is a great suggestion – we will do this – and clarify in the above schematic.

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch".

      We analyzed MSN activity on error trials in detail in Figure 6 of Bruce et al., 2021. These errors are infrequent and inconsistent; we will discuss this in the revision.

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke.

      We agree. The switch time can be vastly different on some trials, making it challenging to compare different lengths and slopes. However, we will clarify the interval as noted above, and we have a few ideas on how to do the analysis the reviewer suggests.

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity.

      We were not clear – we did this analysis exactly as the reviewer suggested. We are not pooling any data. Instead, as we state on line 620, we are using linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer suggested. Furthermore, we will extend this analysis by demonstrating that it is resistant to outliers. Finally, we will include measures of effect size, noting that it is a medium-to-large effect.

      It’s a helpful idea to plot data individually by mice, and we will do so in the revision.

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison, and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity?

      We can certainly include a longer baseline. We can clarify in the revision that mice initiate trials at the rear nosepoke, and this is what initiates the task cues and the temporal interval.

      (7) The authors were focused on the "switch " behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window.

      This is a great idea, and we have some ideas on how to adapt the GLM analysis to perform this analysis.

      Reviewer #3 (Public Review):

      Summary:

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions.

      Strengths:

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model.

      We are grateful for the reviewer’s consideration of our work and recognizing the strengths of our approach.

      Weaknesses:

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is certainly valid, and we will include these points in the revision.

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels.

      We are glad that the reviewer raised this. We will extend this analysis by demonstrating that it is resistant to outliers, and we will include measures of effect size, noting that it is a medium-to-large effect. Thus, the effect is significantly above chance, rather reliable, and supported by our PCA results in Figure 3C.

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs.

      Again, this is an important point. We are well aware of heating effects with optogenetics. For exactly the reasons noted by the reviewer, we included opsin-negative controls, with the laser on over the exact same time course and with the same parameters, in Figure S5. There were no behavioral effects in these controls, despite identical heating and other effects of the laser. Furthermore, these effects are similar to the pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2023). We will better highlight these issues in the revision.

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum.

      This is a great point. We did exactly this experiment, with local drug infusions, in De Corte et al., 2019. That earlier study was the departure point for this experiment, although it is challenging to combine focal pharmacological inactivation with recordings in mice (we have extensive experience with this in rats; Parker et al., 2015 and Parker et al., 2015). Furthermore, we observed similar effects with local optogenetic manipulation in this paper. We will include these points in the revised manuscript.

    1. Author Response

      We would like to thank the three reviewers and the eLife editors for their careful analysis of our work, and for their constructive feedback and positive evaluation. We are especially pleased to see echoed in the reviews and in the editorial assessment that our results underline the importance of taking into account glycosylation in viral evolution, immune surveillance, and in the interpretation of complex epistatic interactions. With this provisional response we would like to communicate to the editors, reviewers and to the eLife readership our intention to integrate in the paper a detailed description of the GM1os and GM2os binding site on the RBD with details on the computational approach we used. We agree that this addition will strengthen the work by making it more self-contained. Also, as suggested by the editorial team, we will provide a comprehensive discussion of published data, as a firmer foundation for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors develop a useful strategy for fluorophore-tagging endogenous proteins in human induced pluripotent stem cells (iPSCs) using a split mNeonGreen approach. Experimentally, the methods are solid, and the data presented support the authors' conclusions. Overall, these methodologies should be useful to a wide audience of cell biologists who want to study protein localization and dynamics at endogenous levels in iPSCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors have applied an asymmetric split mNeonGreen2 (mNG2) system to human iPSCs. Integrating a constitutively expressed long fragment of mNG2 at the AAVS1 locus allows other proteins to be tagged through the use of available ssODN donors. This removes the need to generate long AAV donors for tagging, thus greatly facilitating high-throughput tagging efforts. The authors then demonstrate the feasibility of the method by successfully tagging 9 markers expressed in iPSCs at various levels, and one expressed upon endoderm differentiation. Several additional differentiation markers were also successfully tagged but not subsequently tested for expression/visibility. As one might expect for high-throughput tagging, a few proteins, while successfully tagged at the genomic level, failed to be visible. Finally, to demonstrate the utility of the tagged cells, the authors isolated clones in which genes relevant to cytokinesis were tagged and, together with an AI tool to enhance signal-to-noise ratios, monitored their localization over cell division.

      Strengths:

      Characterization of the mNG2-tagged parental iPSC line was well and carefully done, including validation of a single integration, the presence of markers for continued pluripotency, selected off-target analysis, and G-banding-based structural rearrangement detection.

      The ability to tag proteins with simple ssODNs in iPSC capable of multi-lineage differentiation will undoubtedly be useful for localization tracking and reporter line generation.

      Validation of clone genotypes was carefully performed and highlights the continued need for caution with regard to editing outcomes.

      Weaknesses:

      IF and flow cytometry figures lack quantification and information on replication. How consistent is the brightness and localization of the markers? How representative are the specific images? Stability is mentioned in the text but data on the stability of expression/brightness is not shown.

      To address this comment, we have quantified the mean fluorescence intensity of the tagged cell populations in Fig. S3B-T. This data correlates well with the expected expression levels of each gene relative to the others (Fig. S3A), apart from CDH1 and RACGAP1, which are described in the discussion.

      The images in Fig. 2 show tagged populations enriched by FACS so they are non-clonal and are representative of the diversity of the population of tagged cells.

      The images shown in Fig. 3 are representative of the clonal tagged populations. The stability of the tag was not quantified directly. However, the fluorescence intensity was very stable across cells in clonal populations. Since these populations were recovered from a single cell and grown for several weeks, this low variability across cells in a population suggests that these tags are stable.

      The localization of markers, while consistent with expectations, is not validated by a second technique such as antibody staining, and in many cases is not even counterstained with Hoechst to distinguish nuclear from cytoplasmic localization.

      We find that the localization of each protein is distinct and consistent with previous studies. To address this comment, we have added an overlay of the green fluorescence images with brightfield images to better show the location of the tagged protein relative to the nuclei and cytoplasm. We have also added references to other studies that showed the same localization patterns for these proteins in iPSCs and other relevant cell lines.

      For the multi-germ layer differentiation validation, NCAM is also expressed by ectoderm, so isn't a good solo marker for mesoderm as it was used. Indeed, the kit used for the differentiation suggests Brachyury combined with either NCAM or CXCR4, not NCAM alone.

      Since Brachyury is the most common mesodermal marker, we first tested differentiation using anti-Brachyury antibodies, but they did not work well for flow cytometry. We then switched to anti-NCAM antibodies. Since we used a kit for directed differentiation of iPSCs into the mesodermal lineage, NCAM staining should still report successful differentiation. In mixed differentiation experiments (embryoid body formation or teratoma assays), by contrast, NCAM would not distinguish between ectoderm and mesoderm. The parental cells (201B7) have also been edited at the AAVS1 locus in multiple other studies, with no effect on their differentiation potential.

      Only a single female parental line has been generated and characterized. It would have been useful to have several lines and both male and female to allow sex differences to be explored.

      We agree that it would be interesting (and important) to study differences in protein localization between female and male cell types, and from different individuals with different genetic backgrounds. We see our tool as opening a door for cell biology to move away from randomly collected, transformed, differentiated cell types to more directed comparative studies of distinct normal cell types. Since few studies of cell biological processes have been done in normal cells, a first step is to understand how processes compare in an isogenic background, then future studies can reveal how they compare with other individuals and sexes. We hope that either our group or others will continue to build similar lines so that these studies can be done.

      The AI-based signal-to-noise enhancement needs more details and testing. Such models can introduce strong assumptions and thus artefacts into the resolved data. Was the model trained on all markers or were multiple models trained on a single marker each? For example, if trained to enhance a single marker (or co-localized group of markers), it could introduce artefacts where it forces signal localization to those areas even for others. What happens if you feed in images with scrambled pixel locations, does it still say the structures are where the training data says they should be? What about markers with different localization from the training set? If you feed those in, does it force them to the location expected by the training data or does it retain their differential true localization and simply enhance the signal?

      We used the image restoration neural network described by Weigert et al. (2018). A separate model was trained for each marker, and each trained model was applied only to images of the corresponding marker acquired under the same imaging conditions as its training images. On visual inspection, the fluorescent signal in the restored images was consistent with that in the raw images, for both interphase and mitotic cells. We found very few restoration artefacts (small bright or dark areas), and these were discarded. We did not test restoration of scrambled images or of images of mismatched markers.

      Reviewer #2 (Public Review):

      Summary:

      The authors have generated human iPSCs constitutively expressing mNG2(1-10) and tested them by endogenously tagging multiple genes with mNG2(11) (several clones of the tagged iPSC lines were isolated). With this tool, they explored several weakly expressed cytokinesis genes and gained insights into how cytokinesis occurs.

      Strengths:

      Human iPSC cells are used.

      Weaknesses:

      i) The manuscript is extremely incremental, no improvements are present in the split-fluorescent (split-FP) protein variant used nor in the approach for endogenous tagging with split-FPs (both of them are already very well established and used in literature as well as in different cell types).

      Although split fluorescent proteins and the endogenous tagging methodology had been developed previously, their use in human stem cells has not been explored. We argue that human iPSCs are a valuable model for cell biologists to study cellular processes in differentiating cells in an isogenic context for proper comparison. Many normal human cell types have not been studied at the cellular/subcellular level, and this tool will enable those studies. Importantly, other existing cell lines required transformation to persist in culture and represent a single, differentiated cell type that is not normal. Moreover, the protocols that we developed along with this methodology (e.g. workflows for iPSC clonal isolation that include automated colony screening and Nanopore sequencing) will be useful to other groups undertaking gene editing in human cells. Therefore, we argue that our work opens new doors for future cell biology studies.

      ii) The fluorescence intensity of the split mNeonGreen appears rather low; for example, in Figure 2C the H2BC11, ANLN, SOX2, and TUBB3 signals are very noisy (the observed structures are barely distinguishable). For low-expression targets, this is an important limitation. The authors acknowledge this, but image restoration may not be the best solution, since a lot of biologically relevant information will be lost anyway.

      The split mNeonGreen tag is one of the brighter fluorescent proteins available. The low expression that the reviewer refers to for H2BC11, ANLN, TUBB3 and SOX2 is expected based on their predicted expression levels. Further, these images were taken with cells in dishes using lower-resolution imaging and were not intended to be used for quantification. As shown in Figure 3H, when a different microscope with different optical settings and higher magnification is used, the localization is very clear and quantifiable without restoration (e.g., compare H2BC11 and ANLN). Microscopes with high-NA objectives, lasers and high-sensitivity EMCCD or sCMOS cameras can detect even very weakly expressed proteins at levels that can be quantified above background and compared across cells. It is worth noting that each tag may be studied in very different contexts. For example, ANLN will be useful for studies of cytokinesis, while the loss of SOX2 expression and gain of TUBB3 expression may be used to screen for differentiation rather than for localization per se. The reason for endogenous tagging is to study proteins at their native levels rather than using over-expression or fixation with antibodies, where artefacts can be introduced. Endogenous tags will also enable studies of dynamic changes in localization during differentiation in an isogenic background, as described previously.

      Importantly, image restoration is not required to image any of these probes! We use it to demonstrate how a researcher can increase the temporal resolution of imaging weakly-expressed proteins for extended periods of time. This data can be used to compare patterns of localization and reveal how patterns change with time and during differentiation. Imaging with fewer timepoints and altered optical settings will still permit researchers to extract quantifiable information from the raw data without requiring image restoration.

      iii) There is no comparison with other existing split-FP variants, methods, or imaging and it is unclear what the advantages of the system are.

      We are not sure what the reviewer means by this comment. In the future, we plan to incorporate an additional split-FP variant (e.g., split sfCherry) in this iPSC line to enable the imaging of more than one protein in the same cell. However, the split mNeonGreen system can already be combined with dyes of different fluorescence spectra that mark other cellular components, especially for imaging over shorter timespans. In addition to tagging efficiency, the main advantage of split FPs is their scalability, as demonstrated by the OpenCell project, which tagged 1,310 proteins endogenously (Cho et al., 2022). We developed protocols that facilitate the high-throughput identification of edited cell lines. We also used multiple imaging methods throughout the study, relying on different microscopes and flow cytometry, which demonstrates the flexibility of this tagging system. Even for weakly expressed proteins, the probe could be sufficiently visualized by multiple systems. Such endogenous tags can be used for everything from simply knowing when cells have differentiated (e.g., loss of SOX2 expression, gain of differentiation markers) to studying biological processes over a range of timescales.

      Reviewer #3 (Public Review):

      The authors report on the engineering of an induced Pluripotent Stem Cell (iPSC) line that harbours a single copy of a split mNeonGreen, mNG2(1-10). This cell line is subsequently used to tag endogenous proteins with a smaller part of mNeonGreen, mNG2(11), enabling the complementation of mNG into a fluorescent protein that is then used to visualize the protein. The parental cell line is validated and used to construct several iPSC lines with endogenously tagged proteins. These are used to visualize and quantify endogenous protein localisation during mitosis.

      I see the advantage of tagging endogenous loci with small fragments, but the complementation strategy has disadvantages that deserve some attention. One potential issue is the level of the mNG2(1-10). Is it clear that the current level is saturating? Based on the data in Figure S3, the expression levels and fluorescence intensity levels show a similar dose-dependency which is reassuring, but not definitive proof that all the mNG2(11)-tagged protein is detected.

      We have not quantified the levels of mNG2(1-10) expression directly. However, the increase in fluorescence observed with highly expressed proteins (e.g., ACTB) suggests that mNG2(1-10) levels are sufficiently high to resolve differences among endogenous proteins with vastly different expression levels. To ensure high expression, we used a previously validated expression system consisting of the CAG promoter integrated at the AAVS1 locus, which has been shown to provide high and stable transgene expression (e.g. Oceguera-Yanez et al., 2016). We acknowledge that it is difficult to confirm that all of the endogenous mNG2(11)-tagged protein is ‘detectable’.

      Do the authors see a difference in fluorescence intensity for homo- and heterozygous cell lines that have the same protein tagged with mNG2(11)? One would expect two-fold differences, or not?

      To answer this question, we measured the fluorescence intensity of homozygous and heterozygous clones carrying smNG2-anillin and smNG2-RhoA. We found homozygous clones that were approximately twice as bright as the corresponding heterozygous clones (Fig. S4H and I). This suggests that the complementation between mNG2(1-10) and mNG2(11) occurs efficiently over a range of mNG2(11) expression levels, since anillin is expressed weakly and RhoA is expressed more strongly in iPSCs. However, we also observed some homozygous clones that were not brighter than the corresponding heterozygous clones, which could be due to undetected CRISPR byproducts or clonal variation in protein expression.
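      The two-fold expectation can be checked by comparing background-subtracted per-cell intensities of homozygous and heterozygous clones. This is a minimal sketch, assuming per-cell mean intensities have already been measured and that untagged parental cells provide the background estimate; the function names and values are hypothetical and do not describe the authors' actual analysis.

```python
from statistics import median


def background_subtracted_median(cell_intensities, background):
    """Median per-cell intensity after subtracting a background estimate
    (assumed here to come from untagged parental cells)."""
    return median(i - background for i in cell_intensities)


def brightness_ratio(homozygous, heterozygous, background):
    """Homozygous/heterozygous median brightness. A value near 2 is expected
    if both alleles are tagged and mNG2(1-10) is not limiting."""
    return (background_subtracted_median(homozygous, background)
            / background_subtracted_median(heterozygous, background))


# Hypothetical per-cell mean intensities (arbitrary units), not values from the study.
parental_background = 100.0
het_clone = [300.0, 310.0, 290.0, 305.0, 295.0]
hom_clone = [500.0, 520.0, 480.0, 510.0, 495.0]

ratio = brightness_ratio(hom_clone, het_clone, parental_background)
print(f"homozygous/heterozygous = {ratio:.2f}")  # homozygous/heterozygous = 2.00
```

      Subtracting background before taking the ratio matters: without it, autofluorescence inflates both values and pulls the ratio toward 1, which could make a genuinely two-fold difference look smaller.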

      Related to this, would it be favourable to have a homozygous line for expressing mNG2(1-10)?

      Our heterozygous cell line leaves the other AAVS1 allele available for integrations of other transgenes for future experiments. While a homozygous line could express more mNG2(1-10), it does not seem to be rate-limiting even with a highly-expressed protein like beta-actin, and we are not sure that it is necessary. The value gained by having the free allele could outweigh the difference in mNG2(1-10) levels.

      The complementation seems to work well for the proteins that are tested. Would this also work for secreted (or other organelle-resident) proteins, for which the mNG2(11) tag is localised in a membrane-enclosed compartment?

      The interaction between the 1-10 and 11 fragments is strong and should be retained when proteins are secreted. It was recently shown that secreted proteins tagged with GFP11 can be detected when interacting with GFP1-10 in the extracellular space, albeit using over-expression (Minegishi et al., 2023). However, in our work, the mNG2(1-10) fragment is cytosolic, and we have only explored proteins localized to the nucleus or the cytoplasm, similar to Cho et al. (2022). By GO annotation, 75% of human proteins are present in the cytoplasm and/or nucleus, which still covers a wide range of proteins of interest. Future versions of our line could incorporate organelle-targeting peptides to direct the large fragment to specific, non-cytosolic locations.

      The authors present a technological advance and it would be great if others could benefit from this as well by having access to the cell lines.

      As discussed below, some of the resources are already available, and we are working to make the mNG2(1-10) cell line available for distribution.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is methodological, the main achievement is the generation of a stable iPSC with the split Neon system available for the scientific community. Although it is technically solid, the judgement of this reviewer is that the manuscript should be considered for a more specialised/methodological/resource-based journal.

      Indeed, we have submitted this article under the “tools and resources” category of eLife, which publishes methodology-centered papers of high technical quality. We felt this was a good venue because it reaches a broader audience than more specialized journals, which may be more limited in scope. For example, cell biologists, for whom our system will be a useful resource, are more likely to see it in eLife than in a more specialized journal.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors present a technological advance and it would be great if others can benefit from this as well. Therefore access to the materials (and data) would be valuable (the authors do a great job by listing all the repair templates and primers).

      We have added several pieces of data and information to the supplementary materials, as described below.

      For instance:

      What is the (complete/plasmid) sequence of the AAVS1-mNG2(1-10) repair plasmid? Will it be deposited at Addgene?

      The plasmids used in this paper are now available on Addgene, along with their sequences [ID 206042 for pAAVS1-Puro-CAG-mNG2(1-10) and 206043 for pH2B-mNG2(11)].

      The ImageJ code for the detection of colonies is interesting and potentially valuable. Will the code be shared (e.g. at Github, or as supplemental text)?

      The ImageJ macro has been uploaded to the CMCI Github page (https://github.com/CMCI/colony_screening). The parameters are optimized to perform segmentation on images obtained using a Cytation5 microscope with our specific settings, but they can be tweaked for any other sets of images. The following text has been added to the methods section: “The code for this macro is available on Github (https://github.com/CMCI/colony_screening)”.

      The cell line with the mNG2(1-10) as well as other cell lines can be of interest to others. Will the cell lines be made available? If so, can the authors indicate how?

      We are in the process of depositing our cell line in a public repository. This process may take some time for quality control. For now, the cells can be made available by requesting them from the corresponding authors.

      (2) How well does the ImageJ macro for detection of the colonies in the well work? Is there any comparison of analysis by a human vs. the macro?

      In our most recent experiment, the colony screening macro correctly identified 99.5% of wells compared to manual annotation (83/84 positive wells and 108/108 negative wells). For each 96-well plate, imaging takes 25 minutes, and it takes 7 minutes for analysis. Despite a few false negatives, we expect this macro to be useful for large-scale experiments where multiple 96-well plates need to be screened, which would take hours manually.
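      The 99.5% figure follows directly from the counts above: 83 + 108 = 191 correctly classified wells out of 84 + 108 = 192, with one false negative and no false positives. The short calculation below makes this explicit and reports sensitivity and specificity separately. The function name is illustrative only.

```python
def screening_metrics(tp, fn, tn, fp):
    """Summary metrics for a binary well classifier (colony present / absent)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # positive wells that were detected
        "specificity": tn / (tn + fp),  # negative wells correctly rejected
    }


# Counts reported in the response: 83/84 positive wells and 108/108 negative wells.
m = screening_metrics(tp=83, fn=1, tn=108, fp=0)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# accuracy: 99.5%, sensitivity: 98.8%, specificity: 100.0%
```

      Separating the two rates shows that the only error in this run was a missed positive well. For clonal isolation, this type of error costs a candidate clone but does not send an empty well forward.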

      (3) The CDH labeling was not readily detected by FACS, but was visible by microscopy. Is the labeling potentially disturbed by the procedure (low extracellular calcium + trypsin?) to prepare the cell for FACS?

      It is not clear why the CDH labelling was not detected by FACS. As the reviewer suggests, there could be several reasons: E-cadherin could be broken down by the dissociation reagent (Accutase), or recycled into the cell following the loss of adhesion and the low extracellular calcium in PBS. However, the C-terminal intracellular tail of E-cadherin was tagged, which should not be affected by Accutase. Moreover, recycling into the cell should still result in a detectable fluorescent signal. Notably, the flow cytometry experiments were done as quickly as possible after dissociation to minimize the time during which E-cadherin could be degraded or recycled. We also resuspended the cells in mTeSR Plus medium instead of PBS, and compared cells grown on iMatrix511 with those grown on Matrigel in case differences in the extracellular matrix affected E-cadherin expression. Another possibility is that E-cadherin was detected by microscopy using a swept-field (LiveScan) confocal microscope with a high-NA objective, a 100 mW 488 nm laser and a high-sensitivity EMCCD camera, and this combination may have permitted better detection than the detector of the BD FACSMelody used for FACS.

      (4) The authors write that the "Tubulin was cytosolic during interphase" which is surprising (and see also figure 3H), as I was expecting it to be incorporated in microtubules. May this be an issue of insufficient resolution (if I'm right this was imaged with 20x, NA=0.35 and so the resolution could be improved by imaging at higher NA)?

      Indeed, as the reviewer points out, our terminology (cytosol vs. microtubule) reflects the low resolution of the imaging of cell populations in dishes and the fact that individual alpha-tubulin monomers are labelled with the mNG2(11) tag; these are present both as cytoplasmic monomers and as polymers in microtubules. However, even in this image (Fig. 2C), the mitotic spindle microtubules are visible because they are so robust compared with interphase microtubules. Notably, when we imaged cells from the cloned tagged cell line using a microscope designed for live imaging with a higher-NA objective (see above), endogenously tagged TUBA1B was even more clearly visible in spindle microtubules, and was weakly observed in some microtubules in interphase cells, although these are slightly out of focus (Fig. 3H). Had we focused on a lower focal plane, where the interphase cells are located, and altered the optical settings, more microtubules would have been visible.

      (5) It would be nice to have access to the Timelapse data as supplemental movies (.e.g from the experiments shown in Figure 4).

      We have added the movies corresponding to the time-lapse images as supplementary movies (Movies S1-6), with the raw and restored movies shown side by side.

      (6) In Figure 3B, the order of the colors in the bar is reversed relative to the order of the legend. Would it be possible to use the same order? That makes it easier for me (as a colorblind person) to match the colors in the figure with that of the legend.

      We have modified the legend in Fig 2B and 3B to be in the same order as the bars.

    1. Author Response

      We are deeply grateful for the highly professional analysis of our work by the Journal Editor and Reviewers. Here is our provisional response to some of the reviewer comments. In our response, we would like to address two comments that were common to all Reviewers' responses. We will thoroughly address all of the Reviewers' comments in the final version of the paper.

      Incomplete analysis of maturational changes of striato-nigral connections.

      In the initial study, we showed that chronic inhibition of striosomal neurons with the DREADD approach during early postnatal development leads to decreased functional innervation of dopaminergic cells by striosomes in adulthood. We showed this with two approaches: (1) analysis of miniature inhibitory post-synaptic currents (mIPSCs) and (2) analysis of GFP and gephyrin puncta densities around dopaminergic cells. The results from these experiments strongly suggest a decrease in inhibitory drive to dopaminergic neurons of the substantia nigra pars compacta, yet we agree that only GFP puncta density can be considered direct evidence of weakened striatonigral connections. The Reviewers indicated that additional direct measurements of striatonigral synaptic efficacy would be needed to strengthen our conclusions. We completely agree with this statement and will evaluate the feasibility of the suggested experiments, using optogenetic stimulation of striosomal inputs to dopaminergic neurons.

      Inconsistent description of Ca2+ imaging experiments.

      Unfortunately, there was a general misunderstanding in interpreting the description of the Ca2+ imaging methods. In all our experiments, baseline Ca2+ oscillations and oscillations in the presence of a drug were recorded in standard ACSF (containing 3 mM KCl) in the recording chamber of the patch-clamp setup, so conditions were exactly the same as for cell-attached and whole-cell recordings. At the end of each experiment, ACSF containing 8 mM KCl was applied. This high-KCl condition was used to determine the total number of viable cells responding to the elevated potassium concentration, and this number was taken as 100%. Therefore, the percentages displayed in the paper represent the cells actively oscillating in standard ACSF (3 mM KCl), expressed as a percentage of the total number of cells that responded to the subsequent high-potassium stimulation (8 mM KCl). The formula was: (number of active cells in 3 mM KCl / number of viable cells active in 8 mM KCl) * 100.
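      The normalization can be written as a short function. This is a sketch of the formula stated above applied to hypothetical cell counts for a single field of view; the function name and the zero-denominator check are illustrative additions, not part of the original analysis.

```python
def percent_active(active_in_3mM_kcl, viable_in_8mM_kcl):
    """Percentage of cells oscillating in standard ACSF (3 mM KCl), normalized to
    the cells that responded to the final 8 mM KCl challenge (taken as 100%)."""
    if viable_in_8mM_kcl == 0:
        raise ValueError("no viable cells responded to 8 mM KCl")
    # Multiplying first is algebraically identical to (active / viable) * 100.
    return 100 * active_in_3mM_kcl / viable_in_8mM_kcl


# Hypothetical counts for one field of view (not data from the study).
print(percent_active(12, 40))  # 30.0
```

      Because the denominator is counted in the same field of view as the numerator, the measure is not affected by the differences in cell density or dye loading between slices that would bias raw counts of active cells.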

    1. Author Response

      We appreciate your constructive feedback on our manuscript entitled “Deletion of sulfate transporter SUL1 extends yeast replicative lifespan via reduced PKA signaling instead of decreased sulfate uptake” (ID: eLife-RP-RA-2023-94609). Your comments/suggestions are very helpful for improving our manuscript. In particular, we feel additional experiments and analysis suggested by the reviewers will help strengthen our argument that Sul1 deletion mutant extends lifespan via decreased PKA signaling, instead of via decreased sulfate uptake. Below we outline our response to the reviewer's comments/suggestions and the plans for additional experiments and analysis.

      (1) Our current model is that lifespan extension following SUL1 knockout depends on the PKA signaling pathway but not on sulfate transport. To further substantiate this, we plan to conduct additional transcriptome sequencing and dynamic sulfate uptake experiments using WT, Sul1Δ and Sul1E427Q strains. If our model is correct, we expect the PKA signaling pathway to be more repressed in the Sul1Δ strain than in the Sul1E427Q strain, but sulfate transport to be similar in both strains. This would provide strong evidence supporting the model in addition to the lifespan data.

      (2) The reviewer mentioned the disparities observed between the lifespan of WT in Figure 1B and in other experimental assays. Although it is known that WT lifespan varies considerably from experiment to experiment (hence the need for a WT control in every lifespan measurement), we agree it is important to draw a solid conclusion that Sul1E427Q does not extend lifespan. We plan to measure the lifespan of more cells for the mutant strains illustrated in Figure 1B and update the data and charts.

      (3) Other issues, for example, the small images of Msn2/4 in the nucleus, grammar and formatting errors, and the lifespan data of double (Sul1/Msn4) mutants will be addressed in the revised version of the manuscript after we have performed the additional experiments and analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      (1) It is unclear whether the authors took into consideration the contribution of nuclear blebs for nuclear volume measurements. This would be particularly relevant in situations of very strong confinement. Blebs were previously shown to affect volume (Mistriotis et al., JCB 2019). One could argue that the decreased nuclear volume was due to the increased blebbing observed in very strong confinements.

      As stated in the main text: “[Nuclear Blebs] had a limited contribution to the increase in nuclear projected area, as the increase remained significantly different even if protrusions were dismissed to compute the projected area (Fig S3C)”. In addition, a decrease in the nuclear volume was also observed for slight and intermediate confinement (height = 7 and 9 µm), while in these two conditions, no blebs are observed.

      (2) From their experimental setup, it is unclear whether the reduced nuclear volume observed after confined cell division arises from a geometrical constraint or is due to an intrinsic nuclear feature. One could argue that cells exiting mitosis under confinement have clustered chromosomes and, therefore, will have decreased volume. This would imply that the nucleus is not "reset" but rather that a geometrical constraint is forcing nuclei to be smaller. One way to test this would be to follow individual cells under confinement, let them enter mitosis, and then release the confinement. If, under these conditions, the daughter nuclei are smaller, then it supports their model. If daughter nuclei recover to their initial value, then it's simply due to a geometrical constraint that forces the clustering of chromosomes and the reassembly of the NE in a confined space.

      We agree with the reviewer. As stated in the discussion, “For now, the mechanisms involved remain elusive”, and “Our results call for an in-depth analysis of the molecular pathways at play”. The experiments suggested by the reviewer are definitely important experiments that we plan to carry out. Indeed, it is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei.

      (3) The authors claim that the nucleus adapts to confinement based on evidence that the nucleus no longer shrinks in the second division following the first division. I would argue no further decrease is possible because the DNA is already compacted in the smallest possible volume. If indeed nuclei are in a new homeostatic state as the authors claim, then one would expect nuclei to remain smaller even after confinement is removed. This analysis is missing.

      As mentioned above, we agree that “deconfinement experiments” are indeed important. Nevertheless, we respectfully want to point out that the DNA is not compacted to its maximum level during confinement.

      First, we observed that the nuclei of the second generation of cells born in confinement no longer shrink for all investigated confinement conditions, including for slight confinement (height of 9 µm, corresponding to an initial nuclear deformation of 41%), where DNA is less confined compared to the very strong confinement condition (height of 3 µm, corresponding to an initial nuclear deformation of 70%).

      Second, the total incompressible volume fraction of a cell is smaller than 30% (Roffay et al., PMID: 34785592; Cell Biology by the Numbers, ISBN: 9780815345374). This allows a nucleus to be compressed by 70% or more, as we observed in the most extreme condition.

      (4) Also, if the authors want to claim that this is a mechanism used for cancer cells to adapt to confined situations as the title says, they need to show that normal, near-diploid cells do not behave in the same way. This analysis is missing.

      We agree with the reviewer. For the revised version, we plan to analyze the cell response to confinement using the RPE-1 cell line as a model of a diploid, untransformed cell line. These will be important experiments to determine whether the nuclear mechanism identified in the HT-29 cell line also operates in normal cells.

      (5) Authors state that "Loss of nuclear blebs is clearly linked to mitosis, suggesting that nuclear volume and nuclear envelope tension are tightly coupled, and supports the hypothesis that mitosis is a key regulator of nuclear envelope tension". I have a few issues with the way this sentence is written. Firstly, one could say that all nuclear structures (and not only blebs) are lost during mitosis because the nucleus disassembles. Hence, the new homeostatic state could be determined by envelope reassembly after mitosis and not mitosis itself. Thirdly, how can mitosis be a key regulator of nuclear envelope tension when the nucleus is disassembled during the process? These require clarification.

      We agree with the reviewer that the formulation required clarification, which will be made in the revised version: for now, we only have evidence that nuclear volume regulation takes place at mitosis. The most probable hypothesis is that confinement perturbs NE reassembly after mitosis, and that this perturbed reassembly leads to a change in nuclear volume. Complementary experiments are needed to test this hypothesis, for instance using cell lines stably expressing LAP2/LAP2b-GFP. These are, however, delicate experiments that will require a dedicated study of their own.

      Secondly, I don't understand why the loss of nuclear blebs suggests that volume and tension are tightly coupled.

      Nuclear blebs appear once nuclei have reached a critical NE tension (Srivastava et al., PMID: 33662810). The fact that cells “born” under confinement have no nuclear blebs means that their nuclei are no longer under tension. This is a direct consequence of the decrease in nuclear volume, implying a coupling between volume and tension.

      (6) The authors claim that, unlike previous studies (Lomakin et al), this work shows a "gradual nuclear adaptation". From their results, this is difficult to conclude simply because they do not analyse cPLA2 levels. This is solely based on indirect evidence obtained from cPLA2 inhibition. A gradual adaptation would mean that based on the level of confinement we would expect to have increasingly higher levels of cPLA2 (and therefore nuclear tension).

      We thank the reviewer for his/her comment. Indeed, we have no direct evidence of gradual cPLA2 recruitment in our study, as we did not analyze cPLA2 levels.

      Of note, however, in our study nuclear volume and tension adaptation occur across the entire range of confinement heights (from 3 to 9 µm), with a decrease in nuclear volume inversely correlated with the imposed initial nuclear deformation (Fig. S2C). By contrast, in Lomakin et al., a confinement threshold of 5 µm is needed in HeLa cells to trigger a cPLA2-mediated cell motility response. Such a difference suggests that cells use other parameters as a readout of confinement during NE reassembly after mitosis.

      (7) The authors should refrain from saying that the mechanism behind DNA repair is coupled to the nuclear adaptation they show. There are several points regarding this statement. Firstly, increased DNA damage could be due to nuclear ruptures imposed by confinement at 2h. In fact, the authors show leakage of NLS from the nucleus after confinement (Figure S3A). Secondly, the decrease in DNA damage at 24h could be because these nuclei did not rupture. How can they ensure that cells with low DNA damage at 24h had increased DNA damage at 2h? Finally, one needs to confirm if the nuclei they are analysing at 24h did undergo a round of cell division previously. From the evidence provided, the authors cannot conclude that DNA damage regulation is occurring in confined cells. Moreover, cell cycle arrest is a known effect of DNA damage. Cells with high damage at 2h most likely are arrested or will present with increased mitotic errors (which the authors exclude from their analyses).

      We need to clarify our analysis workflow: only in live experiments did we exclude cells with abnormal cell division, as cell division was visible in the time-lapse. For immunostaining analysis of fixed samples, all non-apoptotic cells were included in the analysis. The decrease in DNA damage observed at 24h thus applies to all cells under confinement. There is a clear difference between 2h and 24h in γH2AX immunostaining (used as a proxy for DNA damage): whereas at 2h almost all cells have several foci (10-15 foci per cell on average, Fig. 3H), the number of foci across the entire cell population decreases to 1-2 foci per cell at 24h. The population at 24h mainly includes cells that have undergone a round of cell division, with >80% of cells being normal, as quantified in Fig. 3E. In the revised version, we will include as a supplementary figure a quantification of the percentage of cells having more than 5 foci at 2h and 24h, as well as large fields of view of γH2AX immunostaining to illustrate the distribution.
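      The planned quantification (percentage of cells with more than 5 foci at 2h and 24h) reduces to a simple per-cell threshold count. A minimal sketch, assuming per-cell foci counts have already been extracted from the images; the counts below are hypothetical placeholders, not study data, and "more than 5" is read as strictly greater than 5.

```python
from typing import Sequence


def fraction_above(foci_per_cell: Sequence[int], threshold: int = 5) -> float:
    """Return the fraction of cells with strictly more than `threshold` foci."""
    if not foci_per_cell:
        raise ValueError("no cells to score")
    n_above = sum(1 for n in foci_per_cell if n > threshold)
    return n_above / len(foci_per_cell)


# Hypothetical per-cell foci counts (illustrative only, not measured data).
counts_2h = [12, 10, 15, 14, 9, 3, 11]
counts_24h = [1, 2, 0, 1, 7, 2, 1]

for label, counts in (("2h", counts_2h), ("24h", counts_24h)):
    print(f"{label}: {100 * fraction_above(counts):.1f}% of cells with >5 foci")
```

      Reporting this fraction alongside the mean foci count shows whether a lower average reflects fewer damaged cells or fewer foci per damaged cell.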

      Reviewer #2 (Public Review)

      One major limitation is that all experiments are performed in a single cell line, HT-29 human colorectal cancer cells, which has an unusual nuclear envelope composition as it has no lamin B2, low lamin B1 levels, and contains a p53 mutation. Because lamins B1 and B2 play important functions in protecting the nuclear envelope from blebs and confinement-induced rupture, and p53 is crucial in the cellular DNA damage response, it remains unclear whether other cell lines exhibit similar adaptation behavior.

      We agree that including other cell lines would help generalize our findings. It would be interesting in the future to analyze if a similar regulation exists for other cell types. In particular, as stated in the discussion, it would be very interesting to investigate whether this nuclear adaptation is universal, or if it is a consequence of a dysregulation in a specific cancer pathway. Our current manuscript is relevant as it uncovers the existence of this highly interesting phenomenon.

      Investigating whether other cell types have the same capacity to adapt would provide insights into the molecular mechanisms involved. In the revised version, we specifically plan to analyze the nuclear response under prolonged confinement in 2 types of cells: (1) normal cells with near-diploid characteristics (the RPE-1 cell line, as a model of a diploid, untransformed cell line); (2) other colorectal cancer cell lines presenting higher levels of lamin B2 and B1, and no p53 mutation (HCT-116).

      Furthermore, although the time-lapse experiments suggest that reduction in nuclear volume occurs primarily during mitosis, the authors do not address whether prolonged confinement, even in the absence of apoptosis, could also result in cells adjusting their nuclear volume, or alternatively normalizing nuclear envelope tension by recruiting additional membrane from the endoplasmic reticulum, which is continuous with the nuclear membranes.

      Even if we cannot completely rule out the hypothesis raised by the reviewer, we respectfully stress that if additional membrane from the endoplasmic reticulum were recruited, we should observe an increase in nuclear volume at S/G2, which is the case only for the strongest imposed confinement (h = 3 µm, corresponding to an initial nuclear deformation of 70%, Fig. S2E). It would, however, be very interesting in the future to directly assess nuclear envelope tension and to follow the possible recruitment of additional membrane with high-resolution live experiments.

      Regarding the proposed role of cPLA2, previous studies have shown that cPLA2 recruitment to the nuclear membrane, which is essential to mediate its nuclear mechanotransduction function, requires both an increase in nuclear membrane tension and intracellular calcium. However, the current study does not include any data showing the recruitment of cPLA2 to the nuclear membrane upon confinement, or the disappearance of nuclear membrane-associated cPLA2 during prolonged confinement, leaving unclear the precise function and dynamics of cPLA2 in the process.

      We agree with the reviewer that it would be very informative to analyze the recruitment of cPLA2 in live experiments. We plan to do this in future experiments using cPLA2 immunostaining at different time points or the cPLA2-mKate construct. This will be the subject of a dedicated study, together with possible changes in nuclear pores size and organization, as well as nuclear tension analysis. For this article, we plan to add the analysis of the effect of cPLA2 inhibition in live experiments.

      Lastly, it remains unclear (1) whether the reduction in nuclear volume is caused by a reduction in nuclear water content, by chromatin compaction, e.g. associated with an increase in heterochromatin, or through other mechanisms, (2) whether the change in nuclear volume is reversible, and if so, how quickly,

      We thank the reviewer for his/her comment. This point was also mentioned by Reviewer #1. It is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei. We plan to perform such “deconfinement” experiments and add the results in the revised version. In addition, we also plan to investigate in more detail the DNA compaction state during confinement.

      and (3) what functional consequences the substantial reduction in nuclear volume has on nuclear function, as one would expect that this reduction would be associated with a substantial increase in nuclear crowding, affecting numerous nuclear processes.

      We agree with the reviewer that such a reduction in nuclear volume would most probably affect numerous nuclear processes that would be highly interesting to decipher in the future. Especially, as pointed out in the discussion, “the regulation of nuclear size identified in this study could have important consequences on resistance to classical chemotherapeutic treatments that target proliferation”. This question merits an entire study and is outside the scope of our current manuscript.

      Reviewer #3 (Public Review)

      (1) One essential consideration that goes unaddressed is whether the nuclear volume alone is changing under compression (resulting in a higher nuclear to cytoplasmic ratio) or if the cell volume is changing and the nuclear volume is following suit (no change in the N:C ratio). Depending on which of these is the case, the overall model would likely shift. In particular, interpreting the effect of disrupting myosin II activity given its different distribution at the cortex in response to the higher confinement would be influenced by which of these conditions are at play.

      We agree with the reviewer. As stated in the discussion, “the nuclear to cytoplasmic volume ratio, which is constant within a given population, is most likely to be impacted by confinement and changes in nuclear envelope tension (24, 45, 46), and might be at play in the regulation we describe herein”.

      As mentioned in the results section, “the distance between the cell membrane and the nuclear envelope was significantly reduced with confinement (Fig. 1D, Fig. S1B) and accompanied by the relocalization of the contractility machinery (Phosphorylated Myosin Light Chain (p-MLC) staining) from above the nucleus to the side, indicating a cortex rearrangement (Fig. S1C)”. For the revised version, we plan to investigate if such relocalization is accompanied by a change in the nuclear to cytoplasmic ratio using the p-MLC and nuclei immunostaining performed at 2h and 24h under the entire range of confinement investigated.

      (2) A key approach used and interpreted by the investigators is an assessment of the folding of the "inner lamin envelope", which they derive from an image analysis routine of lamin staining that they developed and argue reflects "nuclear envelope tension". I am not convinced of the robustness of this approach or what it mechanistically reveals. It may or may not reflect the contour of the inner nuclear membrane, which (perhaps) is the most relevant to the authors' interpretation of nuclear envelope tension. Given the major contribution of this data to the model, which is based on the "unfolding" of the nuclear envelope, an orthogonal approach (e.g. electron microscopy - which one needs to truly address the high-frequency undulations of the nuclear envelope) is needed to support the larger conclusions.

      We agree with the reviewer that the precise measurement of NE surface area is challenging because of the NE folds, and that our approach provides semi-quantitative information. Higher-resolution approaches, such as 3D super-resolution microscopy, would be necessary to investigate that point in more detail. However, we want to point out that even with our limited resolution, the differences observed in lamin A/C staining are striking (Fig. 3A): while lamin folds are completely absent at 2h under strong confinement, inner lamin folds are abundant at 24h, showing a pattern very similar to the control condition. In the revised version, we will add more representative images to show that our analysis reflects our observations.

      (3) The authors argue that nuclear tension is lost after mitosis in the confined devices because nuclear volume has decreased. While a smaller nuclear volume might indeed translate to less compressive force from the device on the nucleus, one would imagine that the chromosomes still have to be accommodated and that confining them in a smaller volume could increase the tension. Although arguable, the potential alternative possibilities suggest that actual measurements of nuclear envelope tension are needed to robustly test the model. The authors cite the observation that blebs are less prevalent after mitosis as additional support for this model, but this is expected as nuclear envelope breakdown and reformation will "reset" the nuclear contour while the appearance of blebs at mitotic entry is essentially a "memory" of all blebs and ruptures over the entire preceding cell cycle.

      We agree with the reviewer that assessing the nuclear envelope tension would enable a better description of the underlying process. It will be the subject of a dedicated study, together with possible changes in nuclear pore size and organization, as well as the analysis of cPLA2 recruitment.

      The proposed model in the current study is for the moment simply a geometrical model. Given the simplicity of the model, the fit with our experimental points is striking.

      (4) Representative images for the pharmacological perturbations other than blebbistatin are notably absent - only the analyzed data are presented in the manuscript or the supplemental material. How these perturbations (e.g. to cPLA2) also affect the cortex is important to interpret the data given the point raised above. Orthogonal approaches would also strengthen the conclusions (for example, the statement that "nuclear adaptation observed during mitosis requires nuclear tension sensing through cPLA2" requires more evidence to be convincing - it is not sufficiently supported by the data presented). Even if this is the case, the authors acknowledge that cPLA2 is likely not the answer to the adaption observed under the lower degrees of confinement. Thus, the mechanisms underlying the adaptive changes to nuclear volume remain enigmatic.

      We thank the reviewer for this insightful comment, and we plan to add representative images for the pharmacological perturbation in the revised version of the manuscript.

      (5) One more consideration that seems to go without comment is that the cells under confinement do not appear to successfully complete cytokinesis (Fig. 5b). At a minimum this seems like a major perturbation to cell physiology and needs to be more fully discussed by the authors as playing a role in the observed changes in nuclear volume.

      We agree that in the image chosen for Fig. 5b, cytokinesis does not seem to be complete. This is not representative of the entire cell population as 80% of the cell population showed a normal phenotype under very strong confinement with no drug (Fig. 5C and 3E, as well as fig S3D for a representative large field of view). Live experiments using the FUCCI cell lines also show that cells are capable of making several complete divisions under confinement (Fig. 2). Complementary experiments under pharmacological treatments and confinement are planned to extend our analysis of such processes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a valuable finding on the possible use of vilazodone in the management of thrombocytopenia through regulating 5-HT1A receptor signaling. The evidence supporting the claims of the authors is solid, with the combined use of computational methods and biochemical assays. The work will be of broad interest to scientists working in the field of thrombocytopenia.

      Public Review:

      Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough controls. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. The paper emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using zebrafish and irradiated Kunming mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, which is inhibited by specific 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning of the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) Which database are the drug test sets and training sets for the creation of drug screening models obtained from? What criteria are used to grade the results?

      Response: Thank you for your thoughtful comment. The database was built by our laboratory. We first collected 39 small-molecule compounds that promote MK differentiation or platelet formation and 691 small-molecule compounds with no obvious effect on MK differentiation or platelet formation (730 compounds in total). The molecular descriptors of 2 active and 15 inactive compounds were randomly picked as the validation set, and the data of the remaining 713 compounds were used as the training set. Regarding the activity evaluation criteria, the prediction score for each molecule lies between 0 and 1, and the model decision was made with a threshold of 0.5: a molecule scoring above 0.5 was identified as a megakaryopoiesis inducer (1).

      Reference:

      (1) Mo Q, Zhang T, Wu J, et al. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res. 2023;226:36-50. doi:10.1016/j.thromres.2023.04.011
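      The data-set bookkeeping and decision rule described in this response can be sketched as follows. This is a minimal illustration only: the compound IDs are hypothetical placeholders, the hybrid deep neural network itself (1) is not reproduced, and a score of exactly 0.5 is assumed to fall below the threshold, since the response only says "above 0.5".

```python
import random

# Counts stated in the response; compound IDs below are hypothetical placeholders.
N_ACTIVE, N_INACTIVE = 39, 691
N_VAL_ACTIVE, N_VAL_INACTIVE = 2, 15
THRESHOLD = 0.5


def split_compounds(actives, inactives, seed=0):
    """Randomly pick the validation set; all remaining compounds form the training set."""
    rng = random.Random(seed)
    validation = rng.sample(actives, N_VAL_ACTIVE) + rng.sample(inactives, N_VAL_INACTIVE)
    held_out = set(validation)
    training = [c for c in actives + inactives if c not in held_out]
    return training, validation


def classify(score):
    """Map a model score in [0, 1] to a label; 'above 0.5' is read as strictly greater."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "inducer" if score > THRESHOLD else "non-inducer"


actives = [f"active_{i}" for i in range(N_ACTIVE)]
inactives = [f"inactive_{i}" for i in range(N_INACTIVE)]
training, validation = split_compounds(actives, inactives)
print(len(training), len(validation))  # 713 17
print(classify(0.82), classify(0.31))  # inducer non-inducer
```

      The split confirms the arithmetic in the response: 39 + 691 = 730 compounds, minus 17 held out for validation, leaves 713 for training.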

      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?

      Response: We are deeply grateful for your insightful feedback regarding Figure 3 and the assessment of the zebrafish model. We used 50 zebrafish embryos per group to evaluate VLZ toxicity, which we consider a suitable and fair baseline. Our gating procedure is depicted in the corresponding diagram. Since our goal was to quantify fluorescence intensity, we analyzed cells isolated from whole zebrafish. The proportion of eGFP+ cells reported in various zebrafish tissues in other studies is likewise quite low, and we are unsure of a typical eGFP+ threshold for zebrafish (1, 2). Because all groups in the experiment were processed in parallel, we consider this result a fair comparison.

      Reference:

      (1) Yang L, Wu L, Meng P, et al. Generation of a thrombopoietin-deficient thrombocytopenia model in zebrafish. J Thromb Haemost. 2022; 20(8): 1900-1909. doi:10.1111/jth.15772

      (2) Fallatah W, De Silva IW, Verbeck GF, Jagadeeswaran P. Generation of transgenic zebrafish with 2 populations of RFP- and GFP-labeled thrombocytes: analysis of their lipids. Blood Adv. 2019;3(9):1406-1415. doi:10.1182/bloodadvances.2018023960

      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. The possible reasons for this should be explained.

      Response: Thank you for your thoughtful comment. Megakaryocytes extend pseudopodia that release proplatelets into the bone marrow sinusoids. Proplatelets convert into barbell-shaped proplatelets to form platelets in an integrin αIIbβ3-mediated process (1-2). Platelet size is determined by microtubule and actin-myosin-spectrin cortical forces during the vascular formation of barbell proplatelets (3). Conversion is regulated by the diameter and thickness of the peripheral microtubule coil. Platelets can also be formed from proplatelets in the circulation (4). Megakaryocyte ploidy correlates with mean platelet volume in a direct, nonlinear relationship (5). Normally, platelet generation and clearance from the circulation (normal turnover) are in equilibrium, controlled by thrombopoietin. When healthy humans receive thrombopoietin, their platelet size decreases (6). Proplatelet formation is dynamic and influenced by platelet turnover (7), which increases upon increased platelet consumption and/or sequestration. In our study, the MPV values of each group of mice did not show significant downregulation or upregulation. From our point of view, there are several possible reasons for these results.

      (1) Radiation damage in mice may decrease the platelet count while at the same time stimulating the bone marrow to release young, larger platelets, thus keeping the MPV relatively stable.

      (2) After radiation injury, bone marrow cells were suppressed, reducing the number of platelets produced while MPV remained unchanged, possibly because the direct effects of radiation on the bone marrow cause thrombocytopenia without necessarily altering the average platelet size.

      Reference:

      (1) Thon JN, Italiano JE. Platelet formation. Semin Hematol. 2010(3):220-226. doi: 10.1053/j.seminhematol.2010.03.005.

      (2) Larson MK, Watson SP. Regulation of proplatelet formation and platelet release by integrin alpha IIb beta3. Blood. 2006(5):1509-1514. doi: 10.1182/blood-2005-11-011957.

      (3) Thon JN, Macleod H, Begonja AJ, et al. Microtubule and cortical forces determine platelet size during vascular platelet production. Nat. Commun. 2012(3):852. doi: 10.1038/ncomms1838.

      (4) Machlus KR, Thon JN, Italiano JE Jr. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br. J. Haematol. 2014(2):227-36. doi: 10.1111/bjh.12758.

      (5) Bessman JD. The relation of megakaryocyte ploidy to platelet volume. Am. J. Hematol. 1984(2):161-170. doi: 10.1002/ajh.2830160208.

      (6) Harker LA, Roskos LK, Marzec UM, et al. Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000(8):2514-2522. doi: 10.1182/blood.V95.8.2514.

      (7) Kowata S, Isogai S, Murai K, et al. Platelet demand modulates the type of intravascular protrusion of megakaryocytes in bone marrow. Thromb. Haemost. 2014(4):743-756. doi: 10.1160/TH14-02-0123.

      (4) The PPI diagram and the KEGG diagram in Figure 6 both provide a possible mechanism pathway for the anti-thrombocytopenia effect of vilazodone. How can the authors analyze the differences in their results?

      Response: We appreciate your valuable comments. PPI (protein-protein interaction) refers to the interaction between proteins. Inside cells, proteins interact with each other to perform various biological functions, influencing cell signaling, metabolic pathways, the cell cycle, and more. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that integrates information on genomes, chemicals, and biological systems. In pharmacoinformatics, KEGG pathways are often used to understand the molecular mechanisms of specific diseases or biological processes. KEGG contains the interrelationships between genes, proteins, and metabolites, helping to reveal key nodes in biological processes. PPI information can be integrated with data from KEGG pathways, such as metabolic and signaling pathways, to gain a more comprehensive understanding of the role of protein-protein interactions in cellular processes and biological functions. For example, by analyzing nodes in the PPI network, proteins associated with a specific disease can be identified, and further examination of these proteins' locations in KEGG pathways can reveal the molecular mechanisms underlying the onset and development of the disease. However, this method also has some limitations:

      Uncertainty (1): The construction of protein-protein interaction networks and drug interaction networks involves many assumptions and speculations. The edges of these networks may be based on experimental data but can also rely on bioinformatics predictions. Therefore, the accuracy of predictions is limited by the quality and reliability of the data used during network construction.

      Insufficient data (2): Despite the availability of a large amount of bioinformatics data for network construction, interactions between some proteins and drugs may still lack sufficient experimental data. This data insufficiency can result in inaccuracies in network predictions.

      Dynamics and temporal-spatial changes (3): The dynamics and temporal-spatial changes in biological systems are crucial for drug effects. Pharmacoinformatics may struggle to capture these changes as it often relies on static network representations, overlooking the temporal and dynamic nature of biological systems.

      Reference:

      (1) Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics. 2020(1):442. doi: 10.1186/s12859-020-03773-2.

      (2) Zhang S, Zhao H, Ng MK. Functional module analysis for gene coexpression networks with network integration. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015(5):1146-1160. doi: 10.1109/TCBB.2015.2396073.

      (3) Cinaglia P, Cannataro M. A method based on temporal embedding for the pairwise alignment of dynamic networks. Entropy (Basel). 2023(4):665. doi: 10.3390/e25040665.
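      The integration step described in this response (identify highly connected nodes in the PPI network, then check where they sit in pathway annotations) can be sketched with degree centrality and a set lookup. This is a toy illustration only: the edge list and the pathway gene set are hypothetical placeholders, not the study's network or an actual KEGG annotation, and degree is just one of several possible hub criteria.

```python
from collections import Counter

# Hypothetical toy PPI edge list (illustrative only, not the study's network).
edges = [
    ("HTR1A", "SRC"), ("SRC", "MAPK1"), ("SRC", "MAPK3"),
    ("MAPK1", "MAPK3"), ("SRC", "PTK2"), ("MAPK1", "ELK1"),
]

# Hypothetical gene set standing in for one pathway's annotated members.
pathway_genes = {"SRC", "MAPK1", "MAPK3", "ELK1"}


def hub_genes(edge_list, top_n=3):
    """Rank nodes by degree (number of interaction partners) and return the top_n."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree (descending), then by name so ties resolve deterministically.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]


hubs = hub_genes(edges)
in_pathway = [g for g in hubs if g in pathway_genes]
print(hubs, in_pathway)
```

      The limitations listed above apply directly: the hub ranking is only as reliable as the edges supplied, and a static edge list cannot capture temporal changes in the network.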

      (5) 5-HTR1A protein expression is measured only in the Meg-01 cell assay. Similar quantitation through western blot is not shown in other cell models.

      Response: Your insightful criticism and recommendation to use different cell models in order to obtain a more accurate depiction of 5-HTR1A protein expression are greatly appreciated. We completely concur that using this strategy would greatly increase the validity of our research. However, establishing a primary megakaryocyte model requires specialized expertise and technical resources, which unfortunately are not readily available to us within the given timeframe. Nevertheless, we acknowledge the limitations of Meg-01 cells, which may exhibit distinct properties compared to true megakaryocytes. To mitigate this concern, we have ensured robust experimental design and rigorous data analysis to interpret our findings within the context of these model cell lines. We believe our results still provide valuable insights into megakaryocyte differentiation and address an important biological question.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to understand the mechanism of how a drug candidate, VLZ, works on a receptor, 5-HTR1A, by activating the SRC/MAPK pathway to promote the formation of platelets.

      Strengths:

      The authors used both computational and experimental methods. This definitely saves time and funds in finding a useful drug candidate and its therapeutic target in the subfield of platelet reduction in cancer patients. The authors achieved the aim of explaining the mechanism of VLZ in improving thrombocytopenia by using two cell lines and two animal models.

      Weaknesses:

      Only two cell lines, HEL and Meg-01 cells, were evaluated in this study. However, whether more cell lines can be used really depends on the workflow and the funding situation of the current research team.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. We fully agree that CD34+ hematopoietic stem/progenitor cells or primary megakaryocytes would provide a more accurate representation of in vitro megakaryopoiesis compared to HEL and Meg-01 cells, which possess limited potential for this process. We acknowledge that our current study did not include experiments with these preferred cell models. This is because our laboratory is still actively developing the technical expertise and resources required for establishing and maintaining primary megakaryocyte and CD34+ cell cultures. Despite the limitations of the current study, we believe the results using HEL and Meg-01 cells provide valuable preliminary insights into the potential effects of VLZ on megakaryocyte differentiation. We are actively working to overcome these limitations and plan to incorporate these more advanced models in our future investigations.

      Reviewer #1 (Recommendations For The Authors):

      I think the authors can enhance the mechanism study by developing more reliable models and methodologies. The connection to clinical research should be strengthened at the same time.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. Despite the limitations, we are committed to expanding our research in the future by incorporating your suggestion and establishing a primary megakaryocyte model to further validate our findings and strengthen our conclusions. At the same time, we wholeheartedly concur with your suggestion to combine clinical research. Unfortunately, VLZ is not a first-line treatment for depression in China, and getting blood samples from the matching number of patients for analysis is a challenge. To give additional experimental support for the medication, we have attempted to improve the data in vivo as much as feasible, including by implementing the intervention in normal mice. Our findings should also contribute to the theoretical underpinnings of this medication and aid in its practical application.

      Reviewer #2 (Recommendations For The Authors):

      Issues the authors need to address:

      Figure 7: Why is the band intensity of GAPDH in b or e much greater than in f, g, or h?

      Response: Thank you for your careful observation and insightful comment regarding Figure 7. Because the protein concentration differs between sample batches, the GAPDH band is sometimes stronger when a larger volume is loaded. Other factors that may influence GAPDH band intensity include the instrument's contrast adjustment during exposure and the use of gels with different numbers of wells. Meanwhile, the original results of all three replicates of every western blot will be provided in the supplementary materials.

      Finally, we sincerely thank you for providing us with this opportunity to make a further revision and modification of our manuscript, and your valuable and scientific comments are useful for the great improvement of our manuscript!

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We wish to thank the reviewers for the time taken to appraise the manuscript and the helpful feedback to improve it. We have taken the suggested feedback on board and incorporated it into the revision. The findings of the revised manuscript are unchanged. Below is a point-by-point response to specific comments.

      Public reviews

      Reviewer 1

      Thank you to reviewer 1 for the thorough and insightful review of our manuscript. We are pleased that the strengths of our research, particularly the use of whole-genome bisulfite sequencing, the combination of animal and human data, and the investigation of a potential dietary intervention were recognized. We are confident that these aspects contribute significantly to the value and originality of our work.

      We acknowledge the concerns regarding the statistical rigor of the study, particularly the sample size and data analysis methods. We would like to address these points in more detail:

      Sample size: While we agree that a larger sample size would be ideal, the chosen sample size (n=4 per group) is consistent with other murine whole-genome bisulfite sequencing experiments in the field. We have carefully considered the cost-benefit trade-off in selecting this approach. In the revision we discuss the potential limitations of this sample size.

      Data analysis: We acknowledge the inconsistencies in the study reporting and have committed to improving clarity in the revision. We carefully reviewed the concerns regarding the use of causal language and the interpretation of differences in our results. In some cases, the use of causal language is justified by the intervention study design. We also believe that other explanations, such as stochastic variation affecting the same genomic regions in different tissues, are exceedingly unlikely from a statistical viewpoint. In the revision we have adopted a balanced approach to the language.

      Confounders: We acknowledge the importance of accounting for potential confounders such as birthweight, alcohol exposure and sex. The pups selected for genome analysis were matched for sex and for litter size as a proxy for in utero alcohol exposure. This selection was intentionally designed to mitigate potential confounding.

      Statistical rigour: We acknowledge the importance of multiple testing correction in the genome-wide analysis. We used the DSS method of Feng et al (PMID: 2456180), which employs a two-step procedure for assessing the significance of a region. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and controls the false discovery rate through shrinkage estimation methods. This approach reduces the risk of reporting false positives due to multiple testing across numerous CpG sites. It is similar in some respects to employing local FDR correction at the 0.05 level with an additional minimum effect size threshold, and it is particularly suited to experiments where the number of replicates is low. In the revision we have committed to improving the clarity of the reporting of statistical methods.

      Reviewer 2

      Thank you to reviewer 2 for the comprehensive and valuable feedback on our manuscript. We take your concerns about the generalizability of our findings and the interpretation of certain results seriously. We would like to address your specific criticisms in detail:

      Generalizability and Human Data: We agree that the generalizability of mouse models to human conditions has limitations. However, our study focused on understanding the early molecular alterations caused by moderate PAE, which can be more effectively modelled in a controlled environment like mice. To clarify this, we have strengthened the manuscript by emphasizing the focus on moderate PAE in the title and throughout the paper.

      Transcriptome Analysis: We recognize the importance of investigating the functional consequences of PAE-induced DMRs and agree that transcriptome analysis would be highly valuable. We are planning future transcriptomic studies to understand the link between DMRs and gene expression.

      Species-Specificity and DMR Enrichment: We acknowledge the likelihood of species-specific PAE effects. Our finding of enrichment of DMRs in non-coding regions was consistent with observations from the Lussier study of FASD. We agree there is further work to do and now highlight this in the discussion.

      Tissue Sample Locations: Due to technical restrictions in processing newborn mouse tissue, we are unable to report the specific tissue regions sampled.

      Interpretation of Shared Genomic Regions: We appreciate your point about the alternative explanation for the shared genomic regions between brain and liver. Our interpretation is that regions affected equally in both tissues, and only in the alcohol group, are likely established stochastically (as a result of the exposure) in the early embryo and then maintained in the germ layers. We have revised the text to suggest this is the most likely explanation, and we acknowledge that a more detailed examination in more tissues would be needed for proof.

      Additional Feedback

      Reviewer 1

      Introduction

      • Line 65 - alcohol consumption is not always preventable and these statements further increase the stigma associated with FASD. A better way to say this would be "a leading cause of neurodevelopmental impairments".

      We have implemented this suggestion in the revised manuscript.

      • The studies cited in lines 87-89 are somewhat outdated, as several more recent studies with better sample sizes have been published in recent years. I would recommend citing more recent publications in addition to these studies. Similarly, the authors should also cite Portales-Casamar et al., 2016 (Epigenetics & Chromatin) for the validation in humans, as it was the original study for those data.

      We have added a citation to Portales-Casamar et al. (2016) in the revised manuscript.

      • Lines 95-95 - the authors should elaborate further on the "encouraging results" from choline supplementation studies, as these details may help interpret the findings from their own study.

      In the revised manuscript, we replaced “encouraging results” with “results suggesting a high methyl donor diet (HMD) could at least partially mitigate the adverse effects of PAE on various behavioural outcomes”.

      • Minor point: DNA methylation is preferable to "methylation" alone when not referring to specific CpGs or sites, as methylation can also refer to protein or RNA methylation.

      “Methylation” has been replaced with “DNA methylation” in the revised manuscript.

      Results

      • Line 118 - HMD should be defined here.

      HMD is now defined in the revised manuscript.

      • The figures in the main manuscript and supplemental materials are not in the same order as they are presented in the text.

      We apologise for this and thank the reviewer for their attention to detail. In the revision we have corrected the order of the figures to match the text.

      • It is concerning that the H20-HMD group had lower baseline weights, which could impact the findings from these analyses. Please discuss how these differences were accounted for in the study design and analyses.

      We appreciate the reviewer's concern about the lower baseline weight in the H20-HMD group. We agree that this difference could potentially affect our findings. However, we want to emphasize that total weight gain during pregnancy was statistically similar across all groups by a linear mixed-effects model. Additionally, all dams were within the healthy weight range for their strain. While we cannot completely rule out any potential influence of baseline weight, we believe the similarity in weight gain and the healthy range of all dams suggest that the in-utero experience of pups regarding weight-related factors was likely comparable across groups.

      • I have some concerns regarding the cutoffs used to identify the DMRs, particularly given the small N and number of tests. The authors should report the number of DMRs that meet a multiple testing threshold; if none, they should use a more stringent threshold than p<0.05, as one would expect 950,000 CpGs to meet that threshold by chance (19,000,000 CpGs x 0.05). The authors should also report the number of DMRs tested, as this will be a more appropriate benchmark for their analyses than the number of CpGs (they should also report the specific number here).

      We appreciate the reviewer's concerns regarding the DMR cut-offs and agree that clarifying the methods and justifying our choices is crucial. Our implementation of the DSS method for defining DMRs employs a local FDR p < 0.05 cut-off, with an additional delta beta threshold of 5%. We have clarified this in the methods section of the revised manuscript. We want to emphasize that the local FDR approach effectively mitigates the concern of chance findings by adjusting for multiple comparisons across the genome. Lines 414-420 of the revised methods contain the following amended text:

      “Differentially methylated regions (DMRs) were identified within each tissue using a Bayesian hierarchical model comparing average DNA methylation ratios in each CpG site between PAE and non-PAE mice using the Wald test with smoothing, implemented in the R package DSS (46). False-discovery rate control was achieved through shrinkage estimation methods. We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”
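      The reviewer's expected-false-positive figure is simple arithmetic, and it shows the scale of correction any genome-wide procedure has to handle. The minimal sketch below assumes independent per-CpG tests at an unadjusted alpha. That is not how DSS calls DMRs, because DSS smooths across neighbouring CpGs and ranks regions. The sketch computes the reviewer's figure alongside the Bonferroni per-test threshold for the same number of tests.

```python
# Hypothetical per-CpG testing scenario (not the DSS region-level procedure).
N_CPGS = 19_000_000  # approximate number of CpGs analysed after coverage filtering
ALPHA = 0.05         # unadjusted per-test significance threshold

# Under the global null each test passes with probability ALPHA,
# so the expected number of chance "hits" is N_CPGS * ALPHA.
expected_false_positives = N_CPGS * ALPHA

# Bonferroni: the per-test threshold that keeps the family-wise error rate at ALPHA.
bonferroni_alpha = ALPHA / N_CPGS

if __name__ == "__main__":
    print(f"expected chance hits: {expected_false_positives:,.0f}")
    print(f"Bonferroni per-test threshold: {bonferroni_alpha:.2e}")
```

      Running it prints 950,000 expected chance hits and a Bonferroni threshold of about 2.63e-09.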

      • I also have concerns about the delta cutoff for their DMRs. First, it is not clear if this cutoff is set for a single CpG or across the DMR (even then, it is not clear if this is a mean, median, max, min, etc.) Second, since the authors analyzed CpGs with 10X coverage, they can only reliably detect a delta of 0.1 (1/10 reads).

      Thank you for raising this important point. In the revision we have clarified that the effect size cutoff reflects the mean effect across CpGs within the DMR, as follows (line 418):

      “We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”

      We chose the mean because it provides a comprehensive representation of the overall methylation change within the region, while ensuring that all individual CpGs used in the analysis had at least 10x coverage. It is not the case that we can only detect a delta of 0.1 (1/10 reads). The mean effect is the difference in means between groups and is not dependent on the underlying sequencing depth.
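      The disagreement about detectable effect sizes comes down to how the DMR delta is computed. The sketch below is minimal and unsmoothed. DSS itself estimates group means with smoothing under a beta-binomial model, so this version is only illustrative. It takes per-sample methylation ratios at each CpG, takes the difference in group means CpG by CpG, and averages that difference across the DMR. The sample values are hypothetical ratios from exactly 10 reads, so every individual ratio is a multiple of 0.1, yet the mean delta is not.

```python
from statistics import mean

def dmr_mean_delta(pae, control):
    """Mean effect size (delta) for one DMR, without smoothing.

    pae, control: one list per sample; each inner list holds the methylation
    ratio (methylated reads / total reads) at each CpG in the DMR.
    Returns the mean across CpGs of (mean PAE ratio - mean control ratio).
    """
    n_cpg = len(pae[0])
    per_cpg_deltas = []
    for j in range(n_cpg):
        pae_mean = mean(sample[j] for sample in pae)
        ctrl_mean = mean(sample[j] for sample in control)
        per_cpg_deltas.append(pae_mean - ctrl_mean)
    return mean(per_cpg_deltas)

# Hypothetical data: 4 mice per group, 3 CpGs, each ratio from 10 reads.
pae = [[0.6, 0.7, 0.5], [0.7, 0.7, 0.6], [0.6, 0.8, 0.6], [0.7, 0.6, 0.5]]
control = [[0.7, 0.7, 0.6], [0.7, 0.8, 0.6], [0.7, 0.7, 0.6], [0.7, 0.7, 0.6]]

delta = dmr_mean_delta(pae, control)
print(f"mean delta across the DMR: {delta:.4f}")
```

      Averaging over samples and CpGs is what allows the delta to fall between multiples of 1/10. Higher per-CpG coverage would make the attainable values finer still.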

      • Prenatal alcohol exposure is known to impact cell type proportions in the brain, which could lead to differences in DNAm patterns. The authors should address this possibility in the discussion, as well as examine their list of DMRs to determine if they are associated with specific brain cell types. The possibility of cell type differences in the liver should also be discussed.

      We agree with the reviewer that PAE-induced alterations in cell type proportions can influence DNA methylation patterns. Isolating specific cell types in our current study's brain and liver samples was not achievable due to tissue limitations. We now acknowledge this limitation in the discussion and note the need for further investigations incorporating single-cell or cell type-specific approaches.

      • It is interesting, but maybe not surprising, that more DMRs were identified in the liver compared to the brain. This finding would warrant some additional interpretation in the discussion.

      We agree that this finding warrants further interpretation. We have added the following sentence to the discussion section of the revised manuscript, which outlines some potential factors behind this observation.

      Line 263: “Indeed, most of the observed effects were tissue-specific, with more perturbations to the epigenome observable in liver tissue, which may reflect the liver’s specific role in metabolic detoxification of alcohol. Alternatively, cell type composition differences between brain and liver might explain differential sensitivity to alcohol’s effects.”

      • Lines 148-149 - I disagree about the enrichment of decreased DNAm in brain DMRs, as 52.6% is essentially random chance. The authors should also include a statistical test here, such as a chi-squared test, to support this statement.

      We agree that a revised interpretation is warranted. The updated manuscript has been amended as follows: “Lower DNA methylation with early moderate PAE in NC mice was more frequently observed in liver DMRs (93.5% of liver DMRs), while brain DMRs were almost equally divided between lower and higher DNA methylation with early moderate PAE (52.6% of brain DMRs had lower DNA methylation with early moderate PAE).”
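      The reviewer's request for a formal test of the 52.6% split can be met with a one-sample test of the proportion of DMRs with lower DNA methylation against 50%. The sketch below swaps in an exact two-sided binomial test, which is a small-sample alternative to the chi-squared goodness-of-fit test the reviewer mentions, and uses only the standard library. The counts are hypothetical. We use 20 of 38 only because it gives 52.6%; the actual number of brain DMRs is not restated here.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps outcomes tied with k despite float rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: 20 of 38 DMRs with lower DNA methylation (52.6%).
k_lower, n_dmrs = 20, 38
p_value = binom_two_sided_p(k_lower, n_dmrs)
print(f"{k_lower}/{n_dmrs} = {k_lower / n_dmrs:.1%}, two-sided p = {p_value:.3f}")
```

      With counts like these the split is consistent with chance (p ≈ 0.87), in line with the revised wording. By contrast, a split as skewed as the liver's 93.5% gives a very small p-value even for a few dozen DMRs.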

      • Similarly, I would recommend the authors use increased/decreased DNAm, rather than hypermethylated/hypomethylation, as the latter terms are better suited to DNAm values near 100% or 0%.

      The terms hyper-/hypomethylation are still common and well understood, even for moderate changes. We agree that increased/decreased is more accessible to a broader audience, so we have amended all references accordingly in the main text.

      • Lines 153-155 - please report the statistics to support these enrichment results. A permutation test would be well suited to this analysis.

      The reporting of statistics related to the enrichment test has now been amended to read “Overlap permutation tests showed liver DMRs were enriched in inter-CpG regions and non-coding intergenic regions (p < 0.05), while being depleted in all CpG regions and genic regions except 1to5kb, 3UTR and 5UTR regions, where there was no significant difference (Figure 2f).”
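      An overlap permutation test of this kind counts how many DMRs overlap an annotation. It then compares that count with counts obtained after placing regions of the same lengths at random positions. The sketch below is a simplified, hypothetical stand-in for the regioneR workflow and not the exact procedure used in the study. It uses a single linear "genome", uniform placement, and no masking of gaps or repeats.

```python
import random

def overlaps(a, b):
    """True if half-open intervals a = [start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlapping(regions, annotation):
    """Number of regions that overlap at least one annotated interval."""
    return sum(any(overlaps(r, f) for f in annotation) for r in regions)

def overlap_permutation_test(regions, annotation, genome_len, n_perm=1000, seed=0):
    """Return (observed count, p for enrichment, p for depletion)."""
    rng = random.Random(seed)
    observed = count_overlapping(regions, annotation)
    n_ge = n_le = 0
    for _ in range(n_perm):
        # Keep each region's length and draw a new uniform start position.
        shuffled = []
        for start, end in regions:
            length = end - start
            new_start = rng.randrange(0, genome_len - length + 1)
            shuffled.append((new_start, new_start + length))
        count = count_overlapping(shuffled, annotation)
        n_ge += int(count >= observed)
        n_le += int(count <= observed)
    # Add-one correction so an empirical p-value is never exactly zero.
    return observed, (n_ge + 1) / (n_perm + 1), (n_le + 1) / (n_perm + 1)

# Hypothetical toy data: 10 annotated 100-bp features on a 100-kb genome,
# with every "DMR" placed inside one of them.
annotation = [(i * 10_000, i * 10_000 + 100) for i in range(10)]
dmrs = [(i * 10_000 + 10, i * 10_000 + 60) for i in range(10)]
obs, p_enriched, p_depleted = overlap_permutation_test(dmrs, annotation, genome_len=100_000)
print(obs, round(p_enriched, 4), round(p_depleted, 4))
```

      Enrichment and depletion are read from opposite tails of the same null distribution. This mirrors the enriched/depleted/no-difference categories reported for Figure 2f.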

      • Line 156 - "overwhelming enrichment" is a very strong statement considering the numbers themselves.

      We have removed “overwhelming” from the revised manuscript, which now states: “Using open chromatin assay and histone modification datasets from the ENCODE project, we found enrichment (p < 0.05) of DMRs in open chromatin regions (ATAC-seq), enhancer regions (H3K4me1), and active gene promoter regions (H3K27ac), in mouse fetal forebrain tissue and fetal liver (Table 2).”

      • Lines 165-167 - Please describe the analyses and metrics used to determine if the DNAm differences were mitigated in the HMD groups. As it stands, it is not clear if they are simply not significant, or if the delta was decreased. In terms of a figure, a scatter plot of the deltas for these DMRs would be better suited to visualizing these changes.

      To determine whether DMRs were mitigated, we applied the same statistical testing procedure to the subset of PAE DMRs in the group of mice exposed to the HM diet. The sample size is the same, and the multiple-testing burden is reduced because we did not test the entire genome. We believe our interpretation stands, although we have urged caution in the discussion as follows (line 319):

      “Another key finding from this study was that HMD mitigated some of the effects of PAE on DNA methylation. Although a plausible alternative explanation is that some of the PAE regions were not reproduced in the set of mice given the folate diet, our data are consistent with preclinical studies of choline supplementation in rodent models (34-36). Moreover, a subset of PAE regions were statistically replicated in subjects with FASD, suggestive of robust associations. Although our findings should be interpreted with caution, they collectively support the notion that alcohol-induced perturbation of epigenetic regulation may occur, at least in part, through disruption of one-carbon metabolism.”

      • Given the lenient threshold to identify DMRs, it is possible that PAE-associated DMRs are simply false positives and do not "replicate" in a different subset of animals. One way to check this would be to determine whether there are any differences between mitigated/unmitigated DMRs and the strength of their initial associations. Should the mitigated DMRs skew towards higher p-values and lower deltas, one might consider that these findings could be false positives.

      We appreciate the reviewer's concern about potential false positives due to the chosen DMR identification threshold. We reiterate the DMR calling thresholds were adjusted for local FDR; however, we acknowledge the need for further validation. We haven't observed this trend of mitigated DMRs having higher p-values and lower deltas, but we have replicated some PAE DMRs in independent human datasets and found support for their biological plausibility in the context of PAE.

      • Related to the HMD analyses, I am concerned that the EtOH-HMD group consumed less alcohol, which could manifest in the PAE-induced DMRs disappearing, unrelated to the HMD exposure. The authors should comment on whether the pups were matched for ethanol exposure and include sensitivity analyses that include ethanol level as a covariate to confirm that their results are not simply due to decreased alcohol exposure.

      We appreciate the reviewer's concern regarding the lower alcohol consumption by dams in the EtOH-HMD group and its potential impact on DMRs. We agree that consistent in utero exposure is crucial for reliable results. Although average alcohol consumption was lower in the EtOH-HMD group, when selecting pups for genomic analysis we matched them across treatment groups on litter size as a proxy for alcohol intake, excluding pups with significantly different exposure levels. We agree that more robust methods, including direct measurement of blood alcohol content, would improve the study. We have now incorporated this into the discussion of the revised manuscript (line 351): “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Lines 172 - please be more specific about the neurocognitive domains tested.

      In the revision we have included more detail about the neurocognitive domains tested (originally mentioned in the results) in the methods as follows:

      “These tests included the open field test (locomotor activity, anxiety) (38), object recognition test (locomotor activity, spatial recognition) (39), object in place test (locomotor activity, spatial recognition) (40), elevated plus maze test (locomotor activity, anxiety) (41), and two trials of the rotarod test (motor coordination, balance) (42)”

      • Line 191 - please report the tissue type used in the human study, as well as the method used to estimate cell type proportions.

      We stated in the results section that buccal swabs were used in both human cohorts.

      We added to the revised manuscript that cell type proportions were estimated using the EpiDISH R package.

      • Related to validation, it is unclear whether the human-identified DMRs were also validated in mice, or if the authors are showing their own DMRs. Please also discuss why DMRs might not have been replicated in AQUA.

      We used human data sets to validate observations from our murine model, focusing on regions identified in our early moderate PAE model. This is now explicitly stated on line 209 of the revision:

      “We undertook validation studies by examining PAE sensitive regions identified in our murine model using existing DNA methylation data from human cohorts to address the generalizability of our findings.”

      In the section entitled ‘Candidate Gene Analysis..’ we used our murine data sets to reproduce previously published associations that included regions identified in both animal and human studies. We posit that the lack of replication of our early moderate PAE regions in AQUA is explained in part by species-specific differences. In addition, given the striking differences in effect size seen in regions that did replicate in FASD subjects, the exposure may need to be of sufficient magnitude and duration for the effects seen in brain and liver to survive reprogramming in the blood. The AQUA cohort is largely enriched for low to moderate patterns of alcohol consumption.

      • Line 197 - please provide a citation for the ethanol-sensitive regions. There are also several existing DNAm analyses in brain tissues from animal models that should be included as part of these analyses, as several have shown brain-region and sex-specific DMRs related to prenatal alcohol exposure. These contrasts might help the authors further delineate the effects of prenatal alcohol in their model and expand on current literature to explain the deficits caused by alcohol exposure.

      Our candidate gene/region selection was informed by a systematic review of previously published human and animal studies reporting associations between in utero exposure to PAE and offspring DNA methylation. We synthesized evidence across several models, tissues and methylation platforms to arrive at a core set of reproducible associations. Line 481 of the methods now includes a citation to our systematic review which details our selection criteria.

      Discussion

      • Line 211 - This is a strong statement for one hypothesis. It is also possible that different cell types have similar responses to prenatal alcohol exposure. In this scenario, perturbations need not arise before germ layer separation. The authors should soften this causal statement.

      We appreciate this point. However, given the genome size relative to the size of the DMRs we detected, the chance that different cell types would respond similarly in exactly the same regions seems exceedingly small. We posit that a more likely explanation is that early perturbations in the embryo are established stochastically as a result of the exposure (supported by the interventional design) and maintained in the differentiating tissues. We agree further work is needed to prove this, specifically in a wider set of tissues from multiple germ layers, so we have amended the discussion as follows:

      “These perturbations may have been established stochastically because of alcohol exposure in the early embryo and maintained in the differentiating tissue. Further analysis in different germ layer tissues is required to formally establish this.”

      • Lines 222-224 - I completely agree with this statement. However, the authors had the opportunity to examine dosage effects in their model as they measured alcohol-levels from the dams. At the very least, I would recommend sensitivity analyses in their DMRs to assess whether alcohol level/dosage influences their results.

      Although this is a great suggestion, by design we did not have the opportunity to examine dosage, as we selected mice for genome analysis with matched exposure patterns. It would be fascinating to conduct such a sensitivity analysis in future work.

      Methods:

      • Please include the lysis protocol.

      Thank you for picking up this error in our reporting. We have now included the following details in the methods which improve the reproducibility of this study: “Ten milligrams of tissue were collected from each liver and brain and lysed in Chemagic RNA Tissue10 Kit special H96 extraction buffer”.

      • Please include the total reads for each sample and details of the QC pipeline, including filtering flags, quality metrics, and genome build.

      Thank you for suggesting improvements to our reporting, which improve the reproducibility of this study. We have included a new supplementary table of sequencing statistics and details of the quality metrics. Please note that the genome build is already explicitly stated in the methods.

      • Please make your code publicly available to ensure that these analyses can be replicated.

      Thank you for this suggestion. A data availability statement has now been included in the revision, and code will be made available upon request.

      • Why were Y chromosome reads included in the dataset?

      Y chromosomal reads were not included in the DMR analysis. We have amended “We filtered the X chromosomal reads” to “We filtered the sex chromosomal reads” in the revised manuscript.

      • Please provide the number of total CpGs available for analysis.

      We have added the following sentence to the results section of the revised manuscript: “A total of 21,842,961 CpG sites were initially available for analysis.” We also clarified that the ~19,000,000 CpGs were analysed following coverage filtering.

      • Please provide the parameters for the DMR analysis and report how the p-values and deltas were calculated.

      We have addressed this in our responses to the previous comments.

      • The supplemental materials for the human data are missing.

      Thank you for picking up this oversight. The revision now includes an additional data supplement which details the analysis of the human data sets for interested readers.

      Tables and figures

      • Table 1. It is not clear how the DMRs for this table were selected. The exact p-values and FDR should also be reported in this table. The number of CpGs in these DMRS should also be reported.

      Table 1 includes select DMRs that were consistently detected in both brain and liver tissue. These are particularly of interest as they represent regions highly sensitive to alcohol exposure. We agree that exact reporting of p-values would be ideal. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and control the false discovery rate (FDR) through shrinkage estimation methods. In the revision we have now included region size and number of CpGs in table 1.

      • Table 3. Please include p-values for the DMR analyses.

      As above, we report the area statistic, which is an equivalent measure for assessing evidence of differential methylation.

      • Figure 2 (Figure 4 in revised manuscript). Please report the N for these analyses. It also seems that the pairwise t-tests were only compared to the H20-NC, which does not provide much insight into the PAE group. The relevance of the sexP analysis to the present manuscript is also unclear.

      Figure 2 is now Figure 4 in the revision and the sample size has been included in figure legend. We compared all groups to the control group (H20-NC) as we aimed to determine any differences in intervention groups from the control.

      We apologise for the lack of clarity around the ‘sex P’ terminology. This refers to the p-value for the main effect of sex on the behavioural outcome. We agree it lacks relevance, since the regression models were adjusted for sex. In the revision we have updated the methods as follows (line 426) and removed references to sex P:

      “To examine the effect of alcohol exposure on behavioural outcomes we used linear regression with alcohol group (binary) as the main predictor adjusted for diet and sex.”
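      The revised behavioural model is an ordinary linear regression with a binary alcohol indicator plus diet and sex covariates. The minimal least-squares sketch below uses NumPy. The data are hypothetical and noise-free, with one mouse per alcohol × diet × sex cell, so the coefficients used to generate the outcome are recovered exactly.

```python
import numpy as np

def fit_alcohol_model(outcome, alcohol, diet, sex):
    """OLS fit of outcome ~ alcohol + diet + sex (each covariate coded 0/1)."""
    X = np.column_stack([np.ones(len(outcome)), alcohol, diet, sex])
    coef, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return dict(zip(["intercept", "alcohol", "diet", "sex"], coef))

# Hypothetical design: every combination of alcohol, diet and sex once.
alcohol = np.array([0, 0, 0, 0, 1, 1, 1, 1])
diet = np.array([0, 0, 1, 1, 0, 0, 1, 1])
sex = np.array([0, 1, 0, 1, 0, 1, 0, 1])
# Outcome generated from hypothetical effects, without noise.
outcome = 10.0 - 2.0 * alcohol + 0.5 * diet + 1.0 * sex

fit = fit_alcohol_model(outcome, alcohol, diet, sex)
print({name: round(float(value), 3) for name, value in fit.items()})
```

      The alcohol coefficient is the adjusted group difference. A package such as statsmodels would also report its standard error and p-value; this sketch only shows the model structure.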

      • Figure 3ef (Figure 2ef in revised manuscript). It is unclear how the regions random regions were generated. A permutation test would be relevant to determine whether there are any actual enrichment differences.

      As stated in methods section: “DMRs were then tested for enrichment within specific genic and CpG regions of the mouse genome, compared to a randomly generated set of regions in the mouse genome generated with resampleRegions in regioneR, with equivalent means and standard deviations.”

      • Figure 5. Please include the gene names for these DMRs, as well as their genomic locations. It would also be relevant to annotate these plots with the max, min, and mean delta between groups.

      Thank you; we considered this. However, the DMRs are not located within genes, so we cannot apply gene labels. The locations are reported on the x-axis and the statistics are shown in Table 3.

      • Figure S1b and S2c- It is quite worrisome that the PAE-HMD group drank less throughout pregnancy than their PAE counterparts. Please discuss how this was addressed in the analyses.

      We appreciate the reviewer's concern regarding the lower alcohol consumption in the PAE-HMD group and its potential impact on DMRs. We agree that consistent in-utero exposure is crucial for reliable results. Although the total amount of liquid consumed over pregnancy was lower in this group, they started with a lower baseline and the trajectory was not statistically different compared to other groups.

      We have now incorporated this into the discussion section of the revised manuscript (line 336): “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Figure S1cd. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S2d. it is not clear to what the statistics presented in this panel refer. Please clarify and discuss the implications of dietary intake differences on your findings.

      We have added the following sentence to the caption in the revised manuscript: “Statistical analysis involved linear mixed-effects regression comparing trajectories of treatment groups to the H2O-NC baseline control group.”

      • Figure S3. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S4. I am confused by the color legend, as it seems both colors are PAE. I also do not see how any regions show increased or decreased DNAm in PAE based on this plot (also no statistics are presented to support these conclusions).

      The plot is intended to show that there are no gross changes in methylation when averaged across all CpGs within different regulatory genomic contexts. Statistics are not included, as it is evident from the plot that the means are the same. We have updated the figure legend, which now reads:

      “Figure S4. No evidence for global disruption of methylation by PAE. The figure shows methylation levels averaged across CpGs in different regulatory genomic contexts. Neither brain tissue (A & B), nor liver tissue (C & D) were grossly affected by PAE exposure (blue bars). Bars represent means and standard deviation.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the editor and reviewers’ careful and professional assessment of this manuscript. We are delighted with the reviewers’ instructive comments and suggestions. We have tried to address the raised points comprehensively. The reviewers’ scrutiny has helped us immensely to discuss and present our work extensively and properly. We are grateful for the reviewers’ efforts and insights. The detailed responses are listed here.

      Recommendations for the authors

      (1) The intuition behind the model is not properly explained, i.e., the derivation of Eqs. 1-2 and the biological meaning of the AA/OO logic modes. A different notation could be helpful.

      We thank the reviewers for this comment and agree that the interpretation of our model in the manuscript needed improvement. We have incorporated this suggestion into the manuscript. For clarity, we have replaced the original notation AA/OO with AND-AND/OR-OR, and we hope the new notation helps readers interpret our work.

      In general, given the diverse audience, including readers with an experimental background, we feel it is essential to present this manuscript in a more digestible manner. We therefore retain the full derivation of Eqs. 1-2 in the supplementary methods. In the revised manuscript we have added a qualitative introduction to the model derivation and to the molecular biological significance of the different logic motifs (AND-AND/OR-OR). Please refer to Page 5 of the revised manuscript, lines 161-167 (see below).

      “X and Y are TFs in the CIS network. n1 and n2 are the coefficients of molecular cooperation. k1-k3 in Eq1 and k4-k6 in Eq2 represent the relative probabilities of the possible configurations of TF binding to CREs (Fig2.A). d1 and d2 are the degradation rates of X and Y, respectively. Here, we considered a total of four CRE configurations, as shown in Figure 2A (i.e., each TF either binds its corresponding CRE or not, 2² = 4). Accordingly, depending on the transcription rate of each configuration (i.e., r0x, r1, r2, r3 in Eq1, and similarly in Eq2), we can model the dynamics of the TFs in the Shea-Ackers formalism[1, 2].

      Thus, the distinct logic operations (AND/OR) of two inputs (e.g., activation by X itself and inhibition by Y) can be implemented by assigning the corresponding profile of transcription rates to the four configurations (Fig2.A). From the perspective of molecular biology, the regulatory logics embody the complex nature of TF regulation, in which TFs function in a context-dependent manner. In the CIS network, when X and Y bind their respective CREs concurrently, whether expression of the target gene is turned on or off depends on the regulatory logic (specifically, off in the AND logic and on in the OR logic; Fig2.A). Notably, instead of exploring the different logics of a single gene[3, 4], we focus on different combinations of regulatory logics, because the dynamics of cell fate decisions are generally orchestrated by GRNs with multiple TFs.”
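      The quoted description can be made concrete with a short sketch. It assumes one common Shea-Ackers form for the production of X; the exact Eq. 1 is given in the supplementary methods and may differ. Under this assumption, the four CRE configurations have statistical weights 1, k1*x^n1, k2*y^n2 and k3*x^n1*y^n2, and production is the weight-averaged transcription rate. The normalised rate profiles labelled AND and OR are hypothetical choices that encode "activation by X and inhibition by Y" under each logic.

```python
def production_rate(x, y, rates, k=(1.0, 1.0, 1.0), n=(2, 2)):
    """Shea-Ackers-style production rate of X (assumed form).

    Configurations, in order: no TF bound, only X bound, only Y bound,
    both bound. rates = (r0, r1, r2, r3) gives the transcription rate of each.
    """
    k1, k2, k3 = k
    n1, n2 = n
    weights = (1.0, k1 * x**n1, k2 * y**n2, k3 * x**n1 * y**n2)
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

def dx_dt(x, y, rates, d1=1.0, **kwargs):
    """Right-hand side for X: production minus first-order degradation."""
    return production_rate(x, y, rates, **kwargs) - d1 * x

# Hypothetical normalised rate profiles (r0, r1, r2, r3).
# AND: X is transcribed only when X is bound AND Y is not bound.
AND_RATES = (0.0, 1.0, 0.0, 0.0)
# OR: X is transcribed when X is bound OR Y is not bound,
# i.e. off only in the "only Y bound" configuration.
OR_RATES = (1.0, 1.0, 0.0, 1.0)

for x, y in [(0, 0), (10, 0), (0, 10), (10, 10)]:
    print(x, y, round(production_rate(x, y, AND_RATES), 3),
          round(production_rate(x, y, OR_RATES), 3))
```

      Under these assumed profiles the two logics disagree in the configurations where exactly one input condition holds. This includes the co-bound configuration highlighted in the quote, where production is near 0 under AND and near 1 under OR.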

      (2) More clearly specify the used parameters and how these are chosen. This would be helpful to get a more quantitative grasp of the conditions that they compare.

      We appreciate the reviewers pointing out unspecified parts in the main text. We have now included related discussion in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”).

      We would like to highlight that the Boolean models with different logic motifs (Fig. 2B) explicitly display the differences between their state spaces (i.e., attractor basins). Moreover, as the focus of this work is the role of regulatory logics in cell fate decisions, we consider it reasonable to specify the geometry of the landscape based on the Boolean models. We therefore assigned parameter values by qualitatively mapping our ODE models (Eqs. 1-2) onto the corresponding Boolean models, which we find intuitive and reliable (refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”). In producing Figures 2-5, parameters were set heuristically, without a systematic search. However, to draw general conclusions, such as the "trade-offs between progression and accuracy" and the presence of the fully-connected stage, we sampled a substantial number of parameter sets to ensure statistically robust findings.

      (3) Include the explanation of how the nullclines and basins shown in the figures (e.g., Fig. 2C, Fig. 4C, Fig. 4F, etc.) are calculated.

      We thank the reviewers for this suggestion. We have incorporated this into the legend of corresponding figures when first mentioned in the main text. Please refer to Page 7 of the revised manuscript, lines 217-223 (see below).

      “Fig2.C:

      (C) State spaces of the AND-AND (top panel) and OR-OR (bottom panel) motifs in the ODE models. Dark and red lines represent the nullclines of X and Y, respectively. Stable steady states (SSSs) are denoted as orange dots, and unstable steady states (USSs) as white dots. Each axis represents the concentration of one transcription factor (arbitrary units). Blue, green and purple areas in the state spaces indicate the attractor basins of LX, S and LY, respectively. Each point in the state space is colored according to the attractor it eventually enters under the deterministic models (Eq1, Eq2). These annotations apply to Figures 3-7.”
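      The basin coloring described in this legend (each point labeled by the attractor it eventually reaches under the deterministic model) can be sketched as follows. This uses a hypothetical toy AND-AND CIS model (maximal production 4, degradation 1, unit binding constants, Hill coefficient 2), chosen because its three attractors have closed-form locations: S at (0, 0), and LX and LY at 2 + √3 on the respective axes. It relies on plain forward-Euler integration and nearest-attractor assignment, and it is not the code used to produce Fig2.C.

```python
import numpy as np

A, D = 4.0, 1.0  # hypothetical maximal production rate and degradation rate

def rhs(x, y):
    """Toy AND-AND CIS model: each gene is produced only when it is bound and its rival is not."""
    denom = (1.0 + x**2) * (1.0 + y**2)  # equals 1 + x^2 + y^2 + x^2 y^2
    return A * x**2 / denom - D * x, A * y**2 / denom - D * y

HIGH = 2.0 + np.sqrt(3.0)  # nonzero stable root of A*x/(1 + x^2) = D for A = 4, D = 1
ATTRACTORS = {"S": (0.0, 0.0), "LX": (HIGH, 0.0), "LY": (0.0, HIGH)}

def basin_labels(x0, y0, dt=0.01, steps=5000):
    """Integrate every initial point forward and label it by the nearest attractor at the end."""
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(steps):  # forward Euler up to t = dt * steps
        dx, dy = rhs(x, y)
        x, y = x + dt * dx, y + dt * dy
    names = list(ATTRACTORS)
    pts = np.array([ATTRACTORS[n] for n in names])  # shape (3, 2)
    dist = (x[..., None] - pts[:, 0]) ** 2 + (y[..., None] - pts[:, 1]) ** 2
    return np.array(names)[dist.argmin(axis=-1)]
```

      For a full state-space map, a grid can be passed directly, e.g. `basin_labels(*np.meshgrid(np.linspace(0, 5, 101), np.linspace(0, 5, 101)))`, and the resulting labels plotted as colors.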

      (4) Clarity on the decisions in the work is needed. For example, the "introduction" of asymmetry of the noise levels (as stated in line 215) appears completely arbitrary. The reason behind it can be guessed in the following paragraph, but the reader shouldn't have to guess.

      We agree entirely with the reviewers’ comment. Indeed, this should have been stated more explicitly. The motivation for incorporating asymmetry in the noise levels stems from our endeavor to mimic the inherent biological variability in gene expression within a cell population. We have adjusted the manuscript to better convey the motivation for investigating asymmetric noise level. Please refer to Page 8 of the revised manuscript, lines 237-238 (“In biological systems, it is unlikely that the noise level of different genes is kept perfectly the same.”).

      (5) Arbitrary and/or out-of-context jargon is used throughout the manuscript, making it hard to read and follow what the authors mean in some cases. For example, "temporal fully-connected stage" is used for the first time in line 290, and the term is not explained either in the main text or in the manuscript. Similarly, the reference to a Boolean-like and Boolean model (line 163 and Figure 1) without clarifying if this is just an analogy or if a formal model is built, nor the utility and implications of this comparison. Another problem related to jargon occurs on line 291, where the authors talk about "parameter sensibility", but such analysis (as it is normally understood in the field) is never performed; the authors perform a parameter exploration and make some general conclusions about the parameter space, but that is different than a parameter sensitivity analysis.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      Regarding the term "temporal fully-connected stage", we realized that it was vague. We now use “transitory fully-connected stage” in the revised manuscript to emphasize that this stage appears only briefly. Please refer to Page 11 of the revised manuscript, line 323.

      We thank the reviewers for pointing out the lack of clarity concerning the Boolean models. We have now amended the manuscript to make this point explicit. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). Specifically, we employed the Boolean models (Fig.2B) as a reference to help us heuristically evaluate the suitability of the parameters used in the ODE models. The Boolean models are therefore formally constructed, and the corresponding update rules are listed in Fig.2A (refer to the middle row of the table, “Logic Function”, now also noted in the legend of Fig.2B, Page 7, lines 213-214). We do, however, draw on the analogy between the attractor basins of the Boolean models and those of the ODE models (refer to Fig.2B-C). Accordingly, we used the term “Boolean-like” to describe the landscape presented by the continuous models (Eqs. 1-2; refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”).
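      To make the Boolean side of this comparison concrete, the sketch below enumerates the attractors and basins of a two-node CIS Boolean network under synchronous updating. The update rules (X' = X AND NOT Y, Y' = Y AND NOT X for AND-AND; X' = X OR NOT Y, Y' = Y OR NOT X for OR-OR) are our reading of the logic functions and are stated here as an assumption; the authoritative rules are those listed in Fig.2A.

```python
from itertools import product

# Assumed synchronous update rules for the two-node CIS network
# (each gene activates itself and inhibits the other).
RULES = {
    "AND-AND": lambda x, y: (int(x and not y), int(y and not x)),
    "OR-OR": lambda x, y: (int(x or not y), int(y or not x)),
}

def basins(rule):
    """Map each attractor (tuple of the states in its cycle) to the initial states that reach it."""
    result = {}
    for start in product((0, 1), repeat=2):
        seen, state = [], start
        while state not in seen:  # iterate until some state repeats
            seen.append(state)
            state = rule(*state)
        cycle = tuple(sorted(seen[seen.index(state):]))  # recurrent part of the trajectory
        result.setdefault(cycle, []).append(start)
    return result
```

      Under these assumed rules both motifs have three fixed points, one per fate: in AND-AND the low/low state (0, 0) also attracts (1, 1) and plays the role of S, whereas in OR-OR the high/high state (1, 1) also attracts (0, 0).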

      We appreciate the reviewers for this valuable comment, and agree that the usage of “parameter sensibility” was in need of adjustment. We have now amended the manuscript. Please refer to Page 10 of the revised manuscript, lines 318-321 (see below).

      “To demonstrate generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediate stage can be observed for 82.7% of them (1.8% under the OR-OR motif; see Methods; Table S1), indicating little dependence on the particular parameter setting.”

      (6) Probably related just to the language clarity (i.e., the abuse of jargon), but we don't understand the conclusion on lines 296-298.

      We thank the reviewers for this comment. We have adjusted the manuscript accordingly. Please refer to Page 11 of the revised manuscript, lines 323-327 (see below). We hope that the reviewers agree with our attempt to map this feature of the landscape onto a particular stage of cell fate decisions.

      “Furthermore, this transitory fully-connected stage lies between the fate-undetermined stage (Fig4.C top panel) and the fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before lineage commitment in experimental observations [5-7]. Therefore, we suspect that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      (7) The so-called "solution landscape" in Figure 4E needs to be better explained.

      We thank the reviewers for this comment. We have introduced the concept of solution landscape, which is a pathway map consisting of all stationary points and their connections, in lines 196-198 of the revised manuscript (see below).

      “Furthermore, we introduced the solution landscape method. A solution landscape is a pathway map consisting of all stationary points and their connections, which can describe different cell states and the transition paths between them [82-84].”

      We have added a detailed explanation of the solution landscape for the AND-AND motif to Figure 4E. Specifically, it describes a hierarchical structure including one 2-saddle (yellow triangle), three 1-saddles (crimson X-cross sign), and three attractors (green dot). The layer of 1-saddles is represented by a blue translucent plane, and the bottom layer is the flow field diagram. The connections from the 2-saddle to the 1-saddles and from the 1-saddles to the attractors are represented by red and blue lines, respectively. The arrows and the heatmap color correspond to the flow direction and the magnitude of the acceleration at each point in the state space.
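      The hierarchy in Figure 4E is organized by the index of each stationary point, i.e., the number of unstable directions of the Jacobian there (0 for attractors, 1 for 1-saddles, 2 for a 2-saddle). The sketch below computes this index with a central-difference Jacobian. It is illustrated on a hypothetical toy system, dx/dt = x - x^3, dy/dt = y - y^3, rather than on the CIS model, and it does not reproduce the solution landscape method itself, which additionally traces the connections between saddles and attractors.

```python
import numpy as np

def jacobian(f, p, h=1e-6):
    """Central-difference Jacobian of a vector field f at point p."""
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        cols.append((f(p + e) - f(p - e)) / (2.0 * h))
    return np.column_stack(cols)

def saddle_index(f, p):
    """Number of eigenvalues with positive real part: 0 = attractor, 1 = 1-saddle, 2 = 2-saddle."""
    eig = np.linalg.eigvals(jacobian(f, p))
    return int(np.sum(eig.real > 0))

def toy(p):
    """Hypothetical toy field with stationary points at all combinations of {-1, 0, 1}."""
    return p - p**3
```

      In this toy field the origin is the single 2-saddle, the four points with exactly one zero coordinate are 1-saddles, and the four corners (±1, ±1) are attractors.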

      (8) Table S1 is not properly annotated, and then it is impossible to interpret how it supports the observations in the paragraph in lines 342-342.

      We appreciate the reviewers’ useful feedback. We have refined the annotations of all tables in our manuscript (Table S1-3). Please refer to “Supplementary Table” in resubmitted files.

      Specifically, we randomly collected 6,231 sets of parameters for the AND-AND motif and 6,682 sets for the OR-OR motif (k1-k6 in Eq1 and Eq2; refer to Page 6 of the revised supplementary method, see below).

      “First, to collect parameter sets with 3 SSSs, we used Latin hypercube sampling (LHS) to screen k-series parameters symmetrically (i.e., k1 = k4, k2 = k5, k3 = k6) ranging from 0.001 to 5 both in the AND-AND and OR-OR motifs. We ultimately collected 6,231 sets for the AND-AND motif and 6,682 sets for the OR-OR motifs (Table S1).”
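      The Latin hypercube sampling step quoted above can be sketched in a few lines of NumPy. This is a minimal hand-rolled version (SciPy also provides `scipy.stats.qmc.LatinHypercube`); the sample size of 1000 and the seed are arbitrary, and the subsequent filtering for parameter sets with 3 SSSs is not shown.

```python
import numpy as np

def lhs_unit(n_samples, n_dims, rng):
    """Latin hypercube sample on [0, 1)^n_dims: one point per stratum [i/n, (i+1)/n) in each dimension."""
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for j in range(n_dims):
        u[:, j] = u[rng.permutation(n_samples), j]  # pair strata across dimensions at random
    return u

rng = np.random.default_rng(0)
unit = lhs_unit(1000, 3, rng)
k123 = 0.001 + (5.0 - 0.001) * unit  # k1, k2, k3 in [0.001, 5)
params = np.hstack([k123, k123])     # symmetric sampling: k4 = k1, k5 = k2, k6 = k3
```

      Each row of `params` would then be screened by counting the stable steady states of Eqs. 1-2, keeping only the sets with three.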

      To analyze the sequence of vanishing SSSs, we further filtered for parameter sets that retain 2 SSSs as ux increases (corresponding to Eq3 in the revised manuscript, Page 10, line 293). This yielded 6,207 sets for the AND-AND motif and 6,634 sets for the OR-OR motif. Based on these parameter settings, we checked whether the observations (refer to Page 13, lines 377-378, “The distinct sequences of attractor basin disappearance as ux increases can be viewed as a trade-off between progression and accuracy.”) are artifacts of a particular parameter choice.

      (9) The flow in Section 5 needs to be reorganised. For instance, it is not clear which question the authors are addressing in line 395, or how the proposed approach answers the question stated in lines 381-382.

      We greatly thank the reviewers for pointing this out, and acknowledge that Section 5 was definitely in need of improvement. We have now amended the manuscript to make this reasoning explicit. Please refer to Page 15 of the revised manuscript, lines 426-430 (see below).

      “In prior sections, we systematically investigated two logic motifs under the noise- and signal-driven modes in silico. With various combinations of logic motifs and driving forces, features of fate-decision behavior were characterized by computational models. Next, we asked whether the computational observations can be mapped onto real biological systems. Discerning different logic motifs and driving modes is a prerequisite for answering this question.

      To this end, we first evaluated the performance of different models, specifically in simulating the process of stem cells differentiating towards LX (Fig6.A).”

      (10) There are two important weak points for the successful classification of the regulatory logic of real gene expression data as presented in the manuscript: (1) the small number of time-points in the datasets and clear peaks in gene expression heterogeneity cannot be identified, and (2) it is not always clear whether cell differentiation really exclusively relies on a CIS network, and which genes constitute it. These limitations should be solved or at least discussed in the manuscript.

      We thank the reviewer for this comment. First, we agree entirely that analysis of datasets with more time points would make it easier to identify trends in gene expression variation. We have made a concerted effort to search for such datasets, but unfortunately, few are publicly available. Specifically, to apply our computational framework, the datasets of interest need to fulfill the following three criteria: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, the cell fate decisions in the biological system should follow the classical binary tree-like pattern, i.e., one stem (or progenitor) cell fate and two downstream cell fates; (iii) the core GRN circuits orchestrating the fate-decision processes have been experimentally confirmed (or at least clearly supported). We have also extended the discussion to include the above points, explicitly noting the limitations of the datasets used. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      Regarding the second point, we acknowledge that the CIS network may not always be the core module in every fate-decision case (although, to our knowledge, this can be assumed in many cases, especially those following a binary tree-like pattern). For applicability and potential relevance to our intended readership, we developed the models and drew our conclusions primarily based on the CIS topology because of its representativeness. We intend to incorporate diverse topologies (like mutual activation with self-activation, feed-forward loops, etc.) into the computational framework presented here in the near future. Additionally, we have incorporated this point into the discussion in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 766-769 (see below).

      “Notwithstanding the fact that the CIS network is prevalent in fate-decision programs, there are other topologies of networks that serve important roles in the cell-state transitions, like feed-forward loop, etc. The framework presented in this work should further incorporate diverse network motifs in the future.”

      As noted by the reviewers, even given the CIS network, we may not be sure which genes constitute it in some cases. We agree that extending our framework to mining key regulators is an interesting question. We also note that we have become very enthusiastic about recent work showing how to nominate core factors from high-throughput data[8, 9]. Of note, in the last section of our manuscript, titled “The chemical-induced reprogramming of human erythroblasts (EBs) to induced megakaryocytes (iMKs) is the signal-driven fate decisions with an OR-OR-like motif”, we leveraged patterns of temporal expression variance to identify key regulators (Fig7.F and H). We thus underline the potential of mining the genes comprising core GRN circuits through expression variance. Nevertheless, as the focus of the present paper is on the role of regulatory logic in cell fate decisions, we feel it is beyond the scope of the present article to develop our results further on this point. Instead, we have included a discussion of the case in which the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (11) The models used in Figure S5 are never clearly described.

      We thank the reviewers for pointing this out. We have now introduced the settings of the models used in Figure S5 more clearly in the legend (see below).

      Two logic motifs with the noise-driven mode (FigS5.A, see below):

      Author response image 1.

      “Initial values were set to the attractor of the S fate in Figure 2C (SSSs in green attractor basins). Simulation was performed 1000 times for each pseudo-time point, with each temporal state (from left to right) recorded as a dot on the plot. Top panel: the noise level of X (σx) is set to 0.21, and σy to 0.09. Bottom panel: the noise level of Y (σy) is set to 0.21, and σx to 0.09. The red arrow represents the direction of the S-to-LX fate transition. Apart from the added white noise, parameters were identical to those in Figure 2C.”

      Two logic motifs with the signal-driven mode (FigS5.B, see below):

      Author response image 2.

      “Initial values were set to the attractor of the S fate in Figure 2C (SSSs in green attractor basins). Top panel: the noise levels of X (σx) and Y (σy) are both set to 0.06. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux was switched from 0 to 0.09 (0, 0.045, 0.09, from left to right). Bottom panel: the noise levels of X (σx) and Y (σy) are both set to 0.05. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux was switched from 0 to 0.24 (0, 0.12, 0.24, from left to right). The red arrow represents the direction of the S-to-LX fate transition. All other model parameters were identical to those in Figure 2C.”
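      The noise-driven and signal-driven simulations described in these legends can be sketched with an Euler-Maruyama scheme. The drift below is a hypothetical toy AND-AND CIS model (maximal production 4, degradation 1, unit binding constants, Hill coefficient 2) in which S sits at (0, 0); the signal ux enters as an additive term in dX/dt, as in the manuscript. The noise values are borrowed from the Fig. S5 legends purely for illustration, since this toy model has a different scale; the step size, run length, seed and clipping at zero are likewise assumptions.

```python
import numpy as np

A, D = 4.0, 1.0  # hypothetical maximal production and degradation rates

def drift(z, ux=0.0):
    """Toy AND-AND CIS drift for an (n_cells, 2) array, with an additive signal ux on X."""
    x, y = z[:, 0], z[:, 1]
    denom = (1.0 + x**2) * (1.0 + y**2)
    dx = A * x**2 / denom - D * x + ux
    dy = A * y**2 / denom - D * y
    return np.column_stack([dx, dy])

def simulate(x0, sigma, ux=0.0, dt=0.01, steps=2000, n_cells=1000, seed=0):
    """Euler-Maruyama for dZ = drift(Z) dt + diag(sigma) dW, run for n_cells independent cells."""
    rng = np.random.default_rng(seed)
    z = np.tile(np.asarray(x0, dtype=float), (n_cells, 1))
    sigma = np.asarray(sigma, dtype=float)  # per-gene noise level, may be asymmetric
    for _ in range(steps):
        noise = rng.standard_normal(z.shape)
        z = z + drift(z, ux) * dt + sigma * np.sqrt(dt) * noise
        z = np.maximum(z, 0.0)  # assumption: clip at zero to keep concentrations non-negative
    return z

# Noise-driven mode: asymmetric noise (sigma_x > sigma_y) and no signal.
noise_driven = simulate((0.0, 0.0), sigma=(0.21, 0.09))
# Signal-driven mode: weak, equal noise plus a constant signal on X.
signal_driven = simulate((0.0, 0.0), sigma=(0.06, 0.06), ux=0.09)
```

      Fate fractions can then be read off each final population, e.g. `np.mean(signal_driven[:, 0] > signal_driven[:, 1])` for the share of cells with more X than Y.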

      (12) Up until Section 5, "noise levels" have been used to refer to an input/parameter in the model. Here it is assumed as an emergent property. Are the authors talking about the variance in expression (e.g., see line 398)? Is it defined as the coefficient of variation? Clarity is essential to interpret the observations in this section, e.g., "different driving modes change in the patterns of noise rather than expression levels" (lines 399-400).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, to be useful. For clarity, we have replaced “noise level” with “expression variance” in Sections 5-6. We have amended the manuscript accordingly, and hope this revision will help in interpreting our results. Please refer to Page 15 of the revised manuscript.
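      For concreteness, the coefficient of variation used as “expression variance” can be computed per gene and per time point as below. The response does not state whether the population (ddof=0) or sample standard deviation is used, so ddof=0 is an assumption here; genes with zero mean expression would need to be excluded beforehand.

```python
import numpy as np

def coefficient_of_variation(expr, ddof=0):
    """CV = std / mean of each gene across cells; expr has shape (n_cells, n_genes)."""
    expr = np.asarray(expr, dtype=float)
    return expr.std(axis=0, ddof=ddof) / expr.mean(axis=0)

def cv_time_course(snapshots):
    """Map {time point: (n_cells, n_genes) matrix} to {time point: CV per gene}, ordered by time."""
    return {t: coefficient_of_variation(m) for t, m in sorted(snapshots.items())}
```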

      (13) "Pulse-like behaviour" is used in an arbitrary way, not as it is normally used in the field. Moreover, we consider this jargon expression does not contribute to the understanding of the paper. (The authors probably meant "discrete transitions" vs "gradual transitions".)

      We appreciate the reviewers’ valuable feedback regarding our use of the term “Pulse-like behavior”. We agree with the reviewers, and acknowledge that the terminology describing the noise-level patterns of the two driving modes (noise-driven vs signal-driven; refer to Section 5 of our manuscript) needed improvement.

      After careful consideration, we decided to adopt the terms “monotonic transitions” and “nonmonotonic transitions” to summarize the trends of the noise level, which contrasts more clearly the distinct temporal noise patterns that the two driving forces produce in cell fate decisions. We anticipate that the revised terminology will make our work easier to interpret. Please refer to Page 15 of the revised manuscript.
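      One simple way to operationalize the monotonic/nonmonotonic distinction on a time course of the coefficient of variation is to check the signs of successive differences, as sketched below. The function name and the tolerance parameter, which lets small fluctuations be ignored, are hypothetical additions; with only a few time points such a classification remains coarse.

```python
import numpy as np

def classify_trend(cv_series, tol=0.0):
    """Return 'monotonic' if the series never reverses direction (beyond tol), else 'nonmonotonic'."""
    steps = np.diff(np.asarray(cv_series, dtype=float))
    if np.all(steps >= -tol) or np.all(steps <= tol):
        return "monotonic"
    return "nonmonotonic"
```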

      (14) The temporal resolution of the scRNAseq datasets that the authors used is too low to unambiguously distinguish a discrete pattern of gene expression heterogeneity from a rising profile. This limitation needs to be at least acknowledged in the text. Alternatively, the authors might want to identify more recent datasets with higher time resolution.

      We appreciate the reviewers’ insightful suggestions. We agree that analysis of datasets with higher time resolution would identify the trends of gene expression variation less ambiguously. We have made a concerted effort to search for such datasets, but unfortunately, few are publicly available. Specifically, to apply our computational framework, the datasets of interest need to fulfill the following three criteria: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, the cell fate decisions in the biological system should follow the classical binary tree-like pattern, i.e., one stem (or progenitor) cell fate and two downstream cell fates; (iii) the core GRN circuits orchestrating the fate-decision processes have been experimentally confirmed (or at least clearly supported). Nevertheless, we recognize that this limitation should be mentioned in the paper. We have therefore extended the discussion to include the above points. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      (15) In the case of embryonic stem cell differentiation, an additional complication is that this protocol yields heterogeneous cell type mixtures, whereas the authors' simulations usually are designed to give differentiation towards a single cell type. This difference makes it difficult to compare measures of gene expression heterogeneity between simulations and the experimental system to infer regulatory logic questionable.

      We thank the reviewers for this valuable comment and realize that we were not clear enough in the manuscript regarding the case of embryogenesis. In the biological system devised by Semrau et al[10], mouse embryonic stem cells (mESCs) differentiate into two lineages simultaneously, as the reviewers note. To account for this additional complication, we performed further simulations of the two logic motifs with noise on both genes X and Y, as presented in Fig.S6E (see below).

      Author response image 3.

      “(E) Time courses of the coefficient of variation in the expression levels of genes X and Y in silico during differentiation under the noise-driven mode. Initial values were set to the attractors of the S fate in Figure 2C (SSSs in green attractor basins). Top panel: the noise levels of X (σx) and Y (σy) are both set to 0.14. Bottom panel: the noise levels of X (σx) and Y (σy) are both set to 0.1. Stochastic simulation was performed 1000 times for each pseudo-time point.”

      Given the noise-driven mode, we further employed the expression pattern of Gbx2-Tbx3 circuit to heuristically infer the logic motif.

      (16) In contrast to the hematopoiesis example, the authors do not focus on a specific gene regulatory circuit with the ESC dataset. How their approach is possible on genome-wide data needs to be discussed.

      We thank the reviewers for this comment. Indeed, the core GRN orchestrating the fate-decision process reported by Semrau et al[10] has not been fully elucidated. We here focus on the Gbx2-Tbx3 circuit (Fig.6H, Fig.S6D). These two TFs were selected from 22 candidate TFs and suggested as potential key regulators in the original paper[10]. On this point, we therefore followed the original paper.

      In regards to extension into biological systems without specific gene regulatory circuits, we have included discussions about the possibility that genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (17) [In supplemental material, pp.1] Possible typo: "In our word, we considered a GRN comprised...".

      Thanks for spotting this typo. We have amended it in the revised supplemental method (refer to Page 1 of the revised supplementary method).

      (18) [In supplemental material, pp.1] In Eqs. (1), the notation for the function HX([X]) implies that HX only depends on X, leaving the combinatorial regulation out. HX([X],[Y]) would be more general and accurate.

      Thanks for pointing this out. We have incorporated this suggestion into the manuscript. Please refer to Page 1 of the revised supplementary method.

      (19) [In supplemental material, pp.1] There are several works that have shown that the Hill coefficient is rarely representative of the number of binding elements. The model can be more general. See, for example, «Santillán, Moisés. "On the Use of the Hill Functions in Mathematical Models of Gene Regulatory Networks." Mathematical Modelling of Natural Phenomena 3, no. 2 (October 22, 2008): 85-97. https://doi.org/10.1051/mmnp:2008056.» and «Nam, Kee-Myoung, Rosa Martinez-Corral, and Jeremy Gunawardena. "The Linear Framework: Using Graph Theory to Reveal the Algebra and Thermodynamics of Biomolecular Systems." Interface Focus 12, no. 4 (June 10, 2022): 20220013. https://doi.org/10.1098/rsfs.2022.0013.»;

      We thank the reviewer for drawing our attention to this and highlighting the above works. Indeed, this is important information to include in the manuscript. We have incorporated this suggestion into the revised supplemental method (refer to Page 1 of the revised supplementary method). These references have now been included in the revised supplemental method (refer to references [2]-[3]).

      (20) [Minor] The configuration labels can be confusing, especially the AA, which is rather an AND NOT gate.

      We thank the reviewers for this comment. For clarity, we have replaced the original labels AA/OO with AND-AND/OR-OR, and hope that the new notation makes our work easier to interpret.

      (21) [Minor] Very low printing quality in Figure 1.

      Thanks for the feedback regarding the printing quality of Figure 1. We have made the necessary adjustments to improve its quality. We have also ensured that all other figures in the manuscript meet the required standards.

      (22) [Minor] We suggest including a quantitative scale for the bias in Fig. 3E.

      Thanks, we have incorporated this suggestion into the manuscript.

      (23) [Recommendation] Authors could also evaluate the cell fate decision processes as mutations or other perturbations affect a regulatory network.

      We thank the reviewers for this valuable recommendation. We agree that including further cases would be helpful, especially mutation-driven, disease-related fate-decision processes such as neutropenia in chemotherapy. However, given the considerable effort required to find appropriate datasets, we have decided not to make this change in the present work.

      (24) [Recommendation] The authors could include some discussion of the likely impact of the work on the field and the utility of the methods and data to the community. For example, understanding the fluidity of the epigenetic landscape and the regulatory forces behind cell fate decisions can be of great importance in designing synthetic gene regulatory circuits.

      We greatly appreciate the reviewers pointing this out. In the original manuscript, we intentionally limited the length of the discussion to keep the story focused. We thank the reviewers for their insightful suggestions regarding the content of the discussion, and have incorporated this suggestion into the revised manuscript. Please refer to Page 25, lines 751-757 (see below).

      “Recently, synthetic biology has achieved the insertion of the CIS network into mammalian cells. One of the prerequisites for recapitulating the complex dynamics of fate transitions in synthetic biology is a systematic understanding of the role of GRNs and driving forces in differentiation, and logic motifs are essential and indispensable elements of GRNs. Our work also provides a blueprint for designing logic motifs with particular functions. We are also interested in validating the conclusions drawn from our models in a synthetic biology system.”

      In addition, a longstanding question of interest to us in cell fate decisions is what accounts for the distinct development of different species, such as humans and mice. Beyond protein-coding sequences, regulatory interactions between genes (i.e., activation and inhibition) also exhibit conservation, as reported in recent work on a multi-species cell atlas [11], and it is generally acknowledged that gene regulatory networks (GRNs) orchestrate fate-decision procedures. In other words, conserved regulatory programs imply a conserved topology of core GRNs. Regulatory logic, as another vital element of GRNs, therefore naturally comes under the spotlight (related to the introduction, lines 99-120 of the revised manuscript). Nevertheless, to our knowledge, regulatory logic in cell fate decisions has received only scant attention. We hope that our elucidation of the role of logic motifs in cell fate decisions will prompt more inquiry from the community into the regulatory logic of GRNs.

      Public reviews

      In this manuscript, Xue and colleagues investigate the fundamental aspects of cellular fate decisions and differentiation, focusing on the dynamic behaviour of gene regulatory networks. It explores the debate between static (noise-driven) and dynamic (signal-driven) perspectives within Waddington's epigenetic landscape, highlighting the essential role of gene regulatory networks in this process. The authors propose an integrated analysis of fate-decision modes and gene regulatory networks, using the Cross-Inhibition with Self-activation (CIS) network as a model. Through mathematical modelling, they differentiate two logic modes and their effects on cell fate decisions: one that requires both the presence of an activator and the absence of a repressor (AA configuration), and one where transcription occurs as long as the repressor is not the only species on the promoter (OO configuration).

      The authors establish a relationship between noise profiles, logic-motifs, and fate-decision modes, showing that defining any two of these properties allows the inference of the third. They also identify, under the signal-driven mode, two fundamental patterns of cell fate decisions: either prioritising progression or accuracy in the differentiation process. The authors apply this analysis to available high-throughput datasets of cell fate decisions in hematopoiesis and embryogenesis, proposing the underlying driving force in each case and utilising the observed noise patterns to nominate key regulators.

      The paper makes a substantial contribution by rigorously evaluating assumptions in gene regulatory network modelling. Notably, it extensively compares two model configurations based on different integration logic, illuminating the consequences of these assumptions in a clear, understandable manner. The practical simulation results effectively bridge theoretical models with real biological systems, adding relevance to the study's insights. With its potential to enhance our understanding of gene regulatory networks across biological processes, the paper holds promise. Its implications extend practically to synthetic circuit design, impacting biotechnology. The conclusions stand out, addressing cell fate decisions and noise's role in gene networks, contributing significantly to our understanding. Moreover, the adaptable approach proposed offers versatility for broader applications in diverse scenarios, solidifying its relevance beyond its current scope.

      We thank the reviewers for their enthusiasm for our work, and appreciate the professional, insightful and encouraging assessment.

      However, the manuscript in its current form also has some important weaknesses, including the lack of clarity in the text and the questionable generality of specific observations.

      We thank the reviewers for this comment. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We hope that this revision meets the reviewers’ expectations regarding the clarity and comprehensiveness of our analysis.

      For instance, even when focusing on the CIS network, the effect of alternative model implementations is not discussed. Notably, the input signals are only considered as an additive effect over the differential equations, while signals can potentially affect each of the individual processes.

      We agree with the reviewers’ comment that signals may act at each level of the central dogma, including transcription and translation. We have also added a section titled “limitation of this study” to the revised manuscript, which explicitly points out the potential limitations of our models, including this one. Please refer to Page 25 of the revised manuscript, lines 769-771 (see below).

      “In addition, for simplicity and intuition, we here considered signals as uncoupled and additive effects in ODE models, due to feasible mapping in real biological systems, such as ectopic overexpression.”
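      To illustrate what "uncoupled and additive" signals mean in an ODE model, the sketch below adds constant input terms s_x and s_y to the right-hand sides of a toy CIS circuit, rather than letting the signals modulate individual processes. This is a hypothetical illustration, not the authors' equations 1-2: the occupancy function, the way occupancies are combined under AND or OR logic, and all parameter values (K, n, k, d) are assumptions chosen for readability.

```python
def occupancy(c: float, K: float = 0.5, n: int = 2) -> float:
    """Assumed fraction of time a binding site is occupied (Hill-type form)."""
    return c**n / (K**n + c**n)


def promoter_activity(activator: float, repressor: float, logic: str) -> float:
    """Combine independent site occupancies according to the chosen logic (assumption)."""
    p_a = occupancy(activator)
    p_r = occupancy(repressor)
    if logic == "AND":
        # Transcribe only when the activator is bound and the repressor is not.
        return p_a * (1.0 - p_r)
    if logic == "OR":
        # Transcribe unless the repressor is the only bound species.
        return 1.0 - (1.0 - p_a) * p_r
    raise ValueError(f"unknown logic: {logic}")


def simulate_cis(logic: str, s_x: float = 0.0, s_y: float = 0.0,
                 k: float = 1.0, d: float = 1.0, x0: float = 0.1, y0: float = 0.1,
                 t_end: float = 50.0, dt: float = 0.01) -> tuple[float, float]:
    """Forward-Euler integration of a toy CIS circuit with additive signals s_x, s_y."""
    x, y = x0, y0
    for _ in range(round(t_end / dt)):
        # Each gene activates itself and is repressed by the other gene;
        # the signal enters only as an additive constant production term.
        dx = k * promoter_activity(x, y, logic) - d * x + s_x
        dy = k * promoter_activity(y, x, logic) - d * y + s_y
        x, y = x + dt * dx, y + dt * dy
    return x, y


for logic in ("AND", "OR"):
    print(logic, simulate_cis(logic, s_x=0.5))
```

      In this toy setting, a signal on X biases the circuit towards the X fate under both logics. Coupling the signal to a specific process (for example, scaling k for one gene) would be the alternative implementation the reviewers raised.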

      The proposed model allows for a continuum of interactions/competition between transcription factors, yet only very restrictive scenarios are explored (strict AND/OR logic operations).

      We thank the reviewers for this comment, and appreciate them highlighting the potential to generalize our framework further. Indeed, beyond logic operations, our framework can be applied to all two-node circuits (3^4 = 81 in total), including mutual activation with self-activation. As the focus of this work is to illustrate the role of logic motifs in cell fate decisions, we concentrated on two classical, intuitive and (at least to us) representative logic operations, AND and OR, in the context of the CIS network. Even so, this already yields four combinations to consider (two logic motifs and two driving forces), and we feel that these scenarios are sufficient to demonstrate the role of logic motifs. Hence, we decided not to explore further logic operations in this work. Instead, we have included an additional section titled “limitation of this study” in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 760-762.

      “Although our framework enables the investigation of more logic motifs, we chose two classical and symmetrical logic combinations for our analysis. Future work should involve more logic gates like XOR and explore asymmetrical logic motifs like AND-OR.”
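      The count of 81 two-node circuits can be reproduced by assuming that a two-node circuit has four possible regulatory edges (two self-regulations and two cross-regulations), each of which is absent, activating or inhibiting. The response does not spell out this counting, so the sketch below reflects our reading; the edge labels are hypothetical.

```python
from itertools import product

# Four possible regulatory edges in a two-node circuit (assumed labelling).
EDGES = ("X->X", "Y->Y", "X->Y", "Y->X")
# Each edge is either absent, activating, or inhibiting.
STATES = ("none", "activation", "inhibition")


def enumerate_two_node_circuits() -> list[dict[str, str]]:
    """Return every assignment of one edge state to each of the four edges."""
    return [dict(zip(EDGES, combo)) for combo in product(STATES, repeat=len(EDGES))]


circuits = enumerate_two_node_circuits()
print(len(circuits), 3 ** len(EDGES))

# The CIS network: self-activation on both nodes plus mutual inhibition.
cis = {"X->X": "activation", "Y->Y": "activation",
       "X->Y": "inhibition", "Y->X": "inhibition"}
print(cis in circuits)
```

      Note that this enumeration counts topologies only; each topology could additionally be paired with different promoter logic (AND, OR, XOR, or asymmetric combinations), which is why the logic motifs studied here are a further, independent dimension.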

      Moreover, how the model parameters are chosen throughout the paper is not clear. Similarly, the concentration and times are not clearly specified, making their comparison to experimental data troublesome.

      We thank the reviewers for this comment. We have revised the manuscript to explain how the model parameters were specified. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). Regarding concentration and time, we acknowledge that their units are arbitrary relative to a real experimental system. We have now noted this in the legends of the corresponding figures (Fig2.C, Fig3.B&D, Fig6.B-C, Fig7.E).

      We would like to highlight that our entire work is organized in a model-driven (top-down) fashion. We did not fine-tune the parameter sets used in our model to specifically match the experimental data. Indeed, parameter inference is a longstanding challenge in computational biology, since experimental datasets are usually insufficient to constrain the parameters of a dynamical model. In general, doing so inevitably requires additional assumptions, such as non-Markov processes [12, 13], which may introduce artifacts. We therefore chose to draw qualitative conclusions (e.g., trends over time) from a quantitative model by sampling parameter sets. Hence, we did not tailor our models to fit individual datasets (i.e., all models used in our work share the same basic parameter settings), and instead mapped them onto real biological systems in a top-down manner.

      Regarding clarity, how the general model (equations 1-2) transforms into the specific cases evaluated in the paper is not clearly stated in the main text, nor are the positive and negative effects of individual transcription factors adequately explained. Similarly, in the main text and Figure 2, the authors refer to a Boolean model. However, they do not clearly explain how this relates to the differential equation model, nor its relevance to understanding the paper.

      We thank the reviewers for this comment, as it has prompted us to clarify our manuscript. We have revised the text accordingly to improve its clarity.

      Additionally, the term "noise levels" is generally used to refer to noise introduced in the "noise-driven" analysis (i.e., as an input or parameter in the models). Nonetheless, it is later claimed to be evaluated as an intrinsic property of the network (likely referring to expression level variability measured by the coefficient of variation).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify the different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, to be useful.

      For clarity, we have replaced “noise level” with “expression variance” in Sections 5-6 and amended the manuscript accordingly.
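      As a concrete illustration of the metric, the coefficient of variation of one gene across a population of cells is its standard deviation divided by its mean. The sketch below assumes the population (rather than sample) standard deviation; the response does not state which estimator was used, and the expression values are made up for illustration.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Population CV: population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Two hypothetical genes with the same mean expression but different spread.
gene_a = [10, 10, 11, 9, 10]
gene_b = [2, 18, 10, 5, 15]
print(coefficient_of_variation(gene_a), coefficient_of_variation(gene_b))
```

      Because the CV normalizes by the mean, it allows expression variability to be compared between genes expressed at different levels, which a raw variance would not.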

      Finally, some jargon is introduced without sufficient context about its meaning (e.g., "temporal fully-connected stage").

      Regarding the term "temporal fully-connected stage", we realized that it was somewhat vague. We now use “transitory fully-connected stage” in the revised manuscript to emphasize the brief emergence of this particular stage. Please refer to Pages 10-11 of the revised manuscript, lines 316-327 (see below).

      “Notably, in the AND-AND motif we observed a brief intermediated stage before S attractor disappears, where all three fates are directly interconnected (Fig4.C 2nd panel and D 2nd panel, Fig.4E). To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediated stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter setting (1.8% in the OR-OR motif). Unlike the indirect attractor adjacency structure mediated by S attractor (Fig2.D), the solution landscape with fully-connected structure facilitates transitions between any two pairs of fates. Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      Additionally, proper discussion of previous work is also missing. For instance, the dynamics of the CIS network investigated by the authors have been extensively characterised (see e.g., Huang et al., Dev Biol, 2007), and how the author's results compare to this previous work should be discussed. In particular, the central assumptions behind the derivation of the model proposed in the manuscript must be assessed in the context of previous work.

      Thank you for pointing this out. We have extended the discussion to include the above points, and we now discuss and cite the work of Huang et al. mentioned above. Please refer to Page 22, lines 644-647 of the revised manuscript (see below).

      “One of the most representative work is that Huang et al. [14] modeled the bifurcation in hematopoiesis to reveal the lineage commitment quantitatively. Compared to simply modularizing activation or inhibition effect by employing Hill function in previous work, our models reconsidered the multiple regulations from the level of TF-CRE binding.”

      References

      (1) Ackers, G.K., A.D. Johnson, and M.A. Shea, Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci U S A, 1982. 79(4): p. 1129.

      (2) Shea, M.A. and G.K. Ackers, The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. J Mol Biol, 1985. 181(2): p. 211-230.

      (3) Hunziker, A., et al., Genetic flexibility of regulatory networks. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12998-3003.

      (4) Kittisopikul, M. and G.M. Suel, Biological role of noise encoded in a genetic network motif. Proc Natl Acad Sci U S A, 2010. 107(30): p. 13300-5.

      (5) Brand, M. and E. Morrissey, Single-cell fate decisions of bipotential hematopoietic progenitors. Curr Opin Hematol, 2020. 27(4): p. 232-240.

      (6) Zhang, Y., et al., Hematopoietic Hierarchy - An Updated Roadmap. Trends Cell Biol, 2018. 28(12): p. 976-986.

      (7) Arinobu, Y., et al., Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell, 2007. 1(4): p. 416-27.

      (8) Kamimoto, K., et al., Dissecting cell identity via network inference and in silico gene perturbation. Nature, 2023. 614(7949): p. 742-751.

      (9) Hammelman, J., et al., Ranking reprogramming factors for cell differentiation. Nat Methods, 2022. 19(7): p. 812-822.

      (10) Semrau, S., et al., Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat Commun, 2017. 8(1): p. 1096.

      (11) Li, J., et al., Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nat Genet, 2022. 54(11): p. 1711-1720.

      (12) Stumpf, P.S., F. Arai, and B.D. MacArthur, Modeling Stem Cell Fates using Non-Markov Processes. Cell Stem Cell, 2021. 28(2): p. 187-190.

      (13) Stumpf, P.S., et al., Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Syst, 2017. 5(3): p. 268-282 e7.

      (14) Huang, S., et al., Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol, 2007. 305(2): p. 695-713.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents some valuable information regarding the molecular mechanisms controlling the regeneration of pancreatic beta cells following induced cell ablation. However, the study lacks the critical lineage tracing result needed to support the conclusion about the origin of the regenerated beta cells. The results of the pharmacological manipulation of CaN signaling are also incomplete. In particular, these manipulations are not cell-specific, which makes them difficult to interpret; a genetic approach is therefore recommended.

      Public Reviews:

      Reviewer #1 (Public Review):

      Induction of beta cell regeneration is a promising approach for the treatment of diabetes. In this study, Massoz et al. identified calcineurin (CaN) as a new potential modulator of beta cell regeneration using zebrafish as a model. They also showed that CaN works together with Notch signaling to promote beta cell regeneration. Overall, the paper is well organized and technically sound. However, some of the evidence seems too weak to support the conclusions.

      Reviewer #2 (Public Review):

      This work started with transcriptomic profiling of ductal cells to identify the upregulation of calcineurin in the zebrafish after beta-cell ablation. By suppressing calcineurin with its chemical inhibitor cyclosporin A and expressing a constitutively active form of calcineurin ubiquitously or specifically in ductal cells, the authors found that inhibited calcineurin activity promoted beta-cell regeneration transiently while ectopic calcineurin activity hindered beta-cell regeneration in the pancreatic tail. They also showed similar effects in the basal state but only when it was within a particular permissive window of Notch activity. To further investigate the roles of calcineurin in the ductal cells, the authors demonstrated that calcineurin inhibition additionally induced the proliferation of the ductal cells in the regenerative context or under a limited level of Notch activity. Interestingly, the enhanced proliferation was followed by a depletion of ductal cells, suggesting that calcineurin inhibition would exhaust the ductal cells. Based on the data, the authors proposed a very attractive and intriguing model of the role of calcineurin in maintaining the balance of the progenitor proliferation and the endocrine differentiation. However, the conclusions of this paper are only partially supported by the data as some evidence from the data remains suggestive.

      (1) In the transcriptomic profiling, genes differentially regulated in the ablated adults could be solely due to the chemical effects of metronidazole instead of the beta-cell ablation. A control group without ins:NTR-mCherry but treated with metronidazole is necessary to exclude the side effects of metronidazole.

      We believe that it is unlikely that the differential regulation observed is due to metronidazole rather than to the beta cell loss. This experimental strategy has proven successful in published studies to identify regulators of beta cell regeneration in zebrafish larvae. Importantly, the candidates identified in these studies were subsequently functionally validated in mammalian models (Lu et al. 2016, Karampelias 2021). Moreover, in our study, we also used another chemical compound, nifurpirinol (Bergemann et al., 2018), to ablate the beta cells. Regardless of whether we employed metronidazole or nifurpirinol for beta cell ablation, our results consistently indicate a notable involvement of calcineurin. Of note, nifurpirinol is commonly used in fishkeeping, and no toxicity on the global health of the fish has been reported.

      (2) Although it has been shown that the pancreatic duct is a major source of the secondary islets in the pancreatic tail in previous studies, there is no direct evidence showing the cyclosporin A-induced cells share the source in this manuscript. Without any proper lineage tracing work, the origin of those cyclosporin A-induced cells cannot be concluded.

      Our experimental setting is similar to the one described in Ninov et al. 2013, where lineage tracing experiments demonstrated an increase of beta cell formation in the pancreatic tail from cells originating in the pancreatic ducts. In our study, we performed the same experiment with the addition of CsA and showed more ductal cell proliferation (Figure 5G), followed by a 19% increase of beta cell regeneration compared to nonregenerative conditions (Figure 2B). It is unlikely that the additional 19% of regenerated beta cells under CaN inhibition come from a different source than the first 68%.

      On the other hand, acinar cells cannot be considered as another source of regenerated beta cells, as they are not able to form beta cells unless they are artificially reprogrammed (Maddison et al., 2012). Therefore, the only other potential source of regenerated beta cells is the endocrine compartment. However, at the stage at which we performed beta cell ablation, there are no endocrine cells in the pancreatic tail. Moreover, there is no evidence that secondary islets could come from the principal islet; they are tightly associated with the ducts and differentiate from ductal cells (Mi et al., 2023).

      Importantly, we demonstrated that overexpression of CaN specifically in the pancreatic ducts prevents beta cell regeneration. The CaN effect is therefore intrinsic to the ducts. Moreover, we showed that CsA increases beta cell formation when Notch signalling is repressed. Given that Notch signalling is known to act on the ductal cell population, this again strongly suggests that CsA enhances beta cell formation from the ducts.

      Together, these lines of evidence strongly support the notion that the cyclosporin-induced beta cells originate from the ductal cells.

      (3) It is interesting to see an increase of beta cells in the primary islet after cyclosporin A treatment (Supplemental Fig 2B). However, it remains unclear if their formation shares the same mechanism with the newly formed beta cells in the pancreatic tail.

      There are indeed several sources of beta cell regeneration in the primary islet. However, a recent study showed that the contribution of alpha cells to regeneration is minor and that the main contributors are ductal and sst1.1 cells (Mi et al., 2023). In our previous publication, we showed that a major source of beta cells in the principal islet is the delta 1.1 cell population. These sst1.1 cells begin to express insulin and are therefore named ‘bihormonal’ (Carril et al., 2022). We tested whether this population is affected by CsA treatment and found that CsA does not affect bihormonal cell formation (Figure 2D supplemental). These new results suggest that the CsA-mediated increase of beta cells in the principal islet arises from the ductal cells, as observed in the tail. These results were added to the manuscript as Figure 2D supplemental.

      Author response image 1.

      Tg (sst1.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Larvae were then treated with CsA 1µM (or DMSO as control) from 4 to 6 dpf, prior to fixation and analysis of bihormonal cells in the principal islet at 6dpf.

      (4) The conclusion of the effect of cyclosporin A on the endocrine progenitors (Line 175) is not convincing because the data cannot distinguish the endocrine progenitors from the insulin-expressing cells. Indeed, Figure 2E shows that neurod1+ cells are fewer than ins+ cells (Figure 2D) in the pancreatic tail at 10 dpt, suggesting that all or at least the majority of neurod1+ cells are already ins+.

      The neurod1+ cell population indeed includes both endocrine progenitor cells and differentiated endocrine cells. However, we would like to point out that the timing of the analysis is essential to our conclusion. When we treat with CsA, we observe an increase of neurod1+ cells already at 4dpt. At this time point, no hormone-producing cells can yet be detected (Figure 2E). These additional neurod1+ cells are therefore endocrine progenitors and not beta cells. This result shows that CaN inhibition induces pro-endocrine cell formation in regenerative conditions.

      At 10dpt, the neurod1+ cell population includes beta cells as well as endocrine progenitor cells. We agree that the way the data are presented in Figures 2D and 2E can be confusing. These two figures come from two separate experiments, so the number of beta cells in Figure 2D cannot be compared to the number of neurod1+ cells in Figure 2E. Indeed, the efficiency and rate of regeneration can vary from one experiment to another, independently of calcineurin. To clarify, we added the number of beta cells regenerated in the experiment of Figure 2E (see Author response image 2, in red). As can be seen, regeneration in this experiment was a bit slower than usual.

      Author response image 2.

      Tg (neurod1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Larvae were then treated with CsA 1µM (or DMSO as control) from 4 to 6 dpf, prior to fixation and analysis of GFP+ cells (in grey, pink, dark grey and green) and of mCherry+ cells for the ablated + CsA condition (in red), from 2 to 10 dpf.

      (5) Figure 5D shows a significant loss of nkx6.1+ cells in the combined treatment group, but there is no direct evidence showing that this was a result of differentiation, as the authors suggested. This cell loss also outnumbered the increase in ins+ cells (Figure 4D). The fates of these lost cells are still undetermined, and the authors did not examine whether apoptosis could account for the cell loss.

      Firstly, as can be seen in the graphs, we encountered very high variability between individuals within the same condition. We decided to show this variability by presenting the raw data, and it could partially explain the differences that you highlight. Moreover, we would like to point out that, independently of CaN inhibition, the progenitor loss (nkx6.1+ cells) outnumbers the gain of beta cells. Indeed, on average there is a loss of 29% (41 GFP+ cells) of the nkx6.1+ cells but a gain of only 6 beta cells after Notch inhibitory treatment, the remaining progenitor cells differentiating into other endocrine cell types (pro-endocrine, alpha, delta). In the combined treatment (Notch and CaN inhibitors), the number of progenitor cells decreased by 50%, i.e., 21% (20 cells) more than without the CaN inhibitor, whereas the number of regenerated beta cells doubled (from 6 to 12 cells). In brief, the substantial progenitor cell loss could be explained by precocious differentiation into pro-endocrine and endocrine cell types. It is therefore expected that the number of regenerated beta cells does not match the loss of progenitor cells, in both the presence and the absence of CaN inhibition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The evidence that the proliferating ductal cells differentiate into beta cells is weak. The authors should use lineage tracing, or immunostaining for other marker genes, to confirm this.

      The experiment in Figure 5A-D is a short-term tracing experiment and should have been presented as such in the manuscript. After LY411575 (Notch inhibitor) and CsA treatments at 3dpf, we exposed the larvae to EdU at 4dpf for 8 hours (Figure 5A). We showed that EdU is incorporated into dividing ductal cells at 4dpf (Figure 5C) and that, 2 days later, there are newly formed beta cells that are EdU+ (see Author response image 3). To reinforce our conclusion, this image will be added to the manuscript.

      Author response image 3.

      Tg (nkx6.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with both CsA 1µM and LY411575 5µM. At 4dpf, the larvae were exposed to EdU 4mM for 8 hours, before analysis at 6 dpf.

      (2) To inhibit the CaN and Notch pathways, the authors used only pharmacological approaches; genetic approaches should be used to obtain stronger evidence.

      We employed two distinct inhibitors specifically targeting calcineurin (CsA and FK506) for CaN inhibition. While these inhibitors have distinct chemical structures and potential non-specific effects, they both yield the same result of increased beta cell formation under Notch repression (see Figure 4D and Figure 4B in the supplementary data). This convergence of outcomes strongly suggests that the observed effect is primarily attributable to the specific inhibition of calcineurin.

      Furthermore, we complemented our inhibitor-based approach with a genetic strategy involving CaN overexpression (see Figure 3). Notably, the overactivation of CaN resulted in a reduction of beta cell regeneration. Given that this genetic approach generated an effect contrary to that achieved with the inhibitors, it provides robust support for our model, which postulates that calcineurin plays a critical role in the regulation of beta cell regeneration (see Figure 3, panels C-E).

      As for Notch inhibition, previously published data from our laboratory compared the effects of a Notch inhibitor (LY411575) and of genetic approaches (mib mutant and transgenic line) on pro-endocrine cell (ascl1b+) and ductal cell (nkx6.1+) formation. This study showed that both the Notch inhibitor (LY411575) and genetic Notch repression recapitulate the same effect: an induction of pro-endocrine cell formation. Because the specificity of this inhibitor has been validated (Ghaye et al., 2015), we did not consider a genetic approach necessary.

      (3) The most enriched pathways among the up-regulated genes were DNA replication and cell cycle, which suggested that these genes are more important for the duct cell proliferation, how is Calcineurin related to these pathways, such as regulating the genes important for proliferation?

      The transcriptomic data presented in this manuscript suggest that the ductal cells undergo a strong proliferative response after beta cell ablation. This is in accordance with our experimental data showing activation of ductal proliferation after beta cell ablation (Ghaye et al., 2015) and with data from this manuscript (Figure 1 I-J).

      Calcineurin is a well-known regulator of the cell cycle, and can either promote or repress it depending on the cell type. For example, cellular stress provokes an entry of calcium and subsequent CaN activation, which results in cell cycle arrest (Leech et al. 2020). Depending on the cell type, CaN can be either necessary or deleterious to cell proliferation (Goshima et al. 2019; Masaki and Shimada 2022). This intriguing dual role of CaN in the cell cycle is well illustrated in β cell regeneration. While CaN should be repressed to enable ductal progenitor amplification and subsequent endocrine differentiation, CaN is then necessary for β cell function and replication (Dai et al. 2017; Heit et al. 2006). Moreover, CaN is related to cellular senescence, and CaN function is important for proper fin regeneration in zebrafish.

      (4) It is hard to understand why the authors picked the cellular senescence signature pathway for ductal progenitor cell neogenesis. Moreover, many of these senescence genes are cell cycle regulators.

      In response to beta cell ablation, the ductal cells undergo a strong proliferative response, as shown in our previous data (Ghaye et al., 2015). It was therefore not surprising that many differentially expressed genes are cell cycle regulators. On the other hand, the cellular senescence signature was surprising. Indeed, senescence is usually associated with cell cycle arrest and aging. However, recent studies showed that cellular senescence is required for proper development and regeneration. We therefore wanted to investigate this pathway, and more particularly the function of calcineurin, which can either promote or repress the cell cycle in different cell types (see comment above).

      (5) The RNA-seq data were obtained from adult fish, whereas the authors used larvae to explore CaN function; the conclusions may differ in adult fish. Moreover, it is unclear whether CaN expression increases when beta cells are ablated in young larvae.

      We decided to first perform functional experiments in larvae, as this model enables the quantification of beta cell regeneration from the ducts in the pancreatic tail. However, to validate our results at non-developmental stages, we performed experiments in juveniles (2 months old) and adults. CsA treatments in juvenile zebrafish recapitulated the same results as in larvae (Figure 2B and Figure 6A-C). Moreover, we showed that CaN overactivation delayed glycemia recovery in ablated adults (Figure 6D-E), which is consistent with impaired regeneration. Altogether, these results strongly suggest that CaN acts as a regulator of beta cell regeneration at both the juvenile/adult and larval stages.

      Concerning the expression of CaN in zebrafish larvae, we tried to detect the level of CaN in the different experimental conditions by in situ hybridization. However, we were not able to detect it using this technique. We also tried immunostaining with an anti-phospho-NFATc3 (Ser165) polyclonal antibody (Invitrogen), but this antibody does not seem to work in zebrafish. Finally, we tried to sort ductal cells at the larval stage to perform a transcriptomic analysis, but we were unable to collect enough ductal cells to proceed further. Indeed, our staining experiments showed that there are only around 150 ductal cells (nkx6.1+, Figure 5D) at this stage.

      (6) The beta cell regeneration in the young larvae usually recovers within ~ 5 days in principle islet. Please also show the beta cell number (PI) during the beta cell recovery after ablation.

      We did show beta cell regeneration in the principal islet in Figure 2A-B supplemental. While new beta cells appear quickly in this islet (Carril, Massoz, Dupont et al., 2023), the principal islet has not yet fully recovered at 5dpt.

      (7) Since the authors did not show the CaN level in Fig. 3, it is hard to know whether CaN is actually expressed.

      In Figure 3B, using Tg(hsp70:GFP-CaNCA), it is indeed not possible to see CaN expression at 10 dpt, as the heat shocks induce only transient CaNCA overexpression. However, the transient expression was detected in live larvae shortly after the heat shocks. In contrast, in the transgenic line Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4), GFP-CaNCA is continuously expressed, which allowed us to show CaNCA expression in the pancreatic ducts (Figure 3).

      (8) In Fig.6 D and 6E, did these drug treatments change the glucose level in nonablated fish?

      As you can see below, the CaN inhibitor, CsA does not affect the glycemia of the fish in non-regenerative conditions.

      Author response image 4.

      Glycemia of non-ablated fish, 3 days after drug treatment.

      (9) The logic of writing in Results is very hard to understand.

      We proofread the paper in an effort to clarify it.

      Minor concerns:

      (1) Make a scheme for ablation and RNA-seq, and indicate the age of the fish used in Fig. 1.

      We added the scheme in Figure 1 supplemental.

      (2) In Fig. 1G, two arrows indicated mCherry+ cells is hard to see in the non-ablated fish.

      One arrow was indeed misplaced; we moved it and tried to improve the intensity of the red channel. However, these cells are small and can be difficult to see.

      (3) In Fig. 6, it is hard to tell that the arrows indicate small islets (up to 5 cells), how they compare with big islets, and how they were defined as small islets. Moreover, some of these islets are almost invisible.

      We now show a close up of a portion of the pancreatic tail and show the beta cells with arrows only in this picture, to enhance clarity.

      Reviewer #2 (Recommendations For The Authors):

      (1) This manuscript needs more proofreading and polishing to increase its readability.

      We proofread the manuscript and change some paragraph for more clarity.

      (2) The extensive use of words like "modulate" or "regulate" sometimes makes the text ambiguous as the effect is not stated directly and clearly.

      We re-wrote some parts of the text and try to avoid using “regulate” as often.

      However, as we used both repression and over-activation of CaN, we still use words as regulate to stipulate general conclusions on the function of CaN.

      (3) The list of individual differentially regulated genes after the beta-cell ablation in the RNAseq seems missing. This list could be interesting and helpful for other researchers.

      We added it.

      (4) In Figure 1D, "modulated" genes are shown but were they all upregulated like those in Figure 1A? The modulation should be indicated more clearly (e.g. up- or down-regulated) in the figure. The authors can use different colours to illustrate that.

      Done.

      (5) Is Figure 2D showing the same data extracted from Figure 2B? Does Figure 2D add any information to the data?

      No, it does not add new data. We added Figure 2D to better visualise the increase at 10 dpt.

      (6) In the y-axis of Figure 3E, it should be "mCherry".

      It already is. We checked all the axes again to make sure they are correct.

      (7) Line 219, "Figure 4E supplemental" instead of "Figure 4D supplemental"

      Done.

      (8) Line 266, "ablated juveniles" instead of "ablated larvae"

      Done. Thank you for noticing these mistakes.

      (9) In Figure 6A, many mCherry+ cells are hardly visible and there are some greyish white signals in the images that are supposed to show the mCherry channel only. What are those grey signals?

      No channel in the picture is displayed in grey. We improved the overall quality of these pictures and now show close-ups to improve the figure.

      (10) In Figure 6D and 6E, CaNCA overexpression had a significant effect on the glycemia. But did the overexpression affect beta cell formation or regeneration?

      We showed that CaNCA overexpression did not affect beta cell formation in the absence of regeneration in the larvae (Figure 3E). Moreover, it does not affect the glycemia of the fish under non-regenerative conditions (Author response image 5). Under regenerative conditions, CaN overexpression decreased regeneration in the larvae (Figure 3E).

      Author response image 5.

      Glycemia of Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4) fish overexpressing CaNCA, compared to control fish, under non-regenerative conditions.

      (11) The role of calcineurin seems transient (e.g. Figure 2B and 4E) and does not play a significant role in the long term. It would be interesting to see whether long-term/repeated treatments with calcineurin inhibitors and overexpression/knockout of important members of calcineurin signaling would affect the pool of progenitors in the long term.

      We were also interested in the long-term consequences of CaN overexpression. Our overexpression tool, Tg(UAS:CaNCA), allows us to address this question, as CaN is permanently overexpressed. We assessed the structure of the ducts and the number of beta cells in transgenic larvae and did not see any duct defects, whether in a regenerative context or not. On the other hand, we showed in this manuscript that the CaN effect is specific to regenerative conditions. Consequently, it is unlikely that repeated treatments long after the ablation would continue to affect beta cell formation and the progenitor pool.

    1. Author Response

      eLife assessment

      We appreciate the assessment carried out by the editorial team at eLife. In response, we plan to revise the methods section to make the statistical analysis for each displayed figure easier to follow.

      Public reviews

      Reviewer 1

      We would like to express our gratitude to Reviewer 1 for providing a thorough summary of our work and highlighting its strengths. With regard to the weaknesses, we are committed to improving the manuscript by making the necessary changes. First, we will specify the exact p-value in all cases.

      We acknowledge the feedback that the discussion section may be confusing. In line with the reviewer's suggestion, we will shorten the literature review and highlight our findings.

      Finally, in the preprint we did not include confounders such as HIV infection and ethnicity, because our study population did not exhibit viral infections and comprised only Hispanic individuals. We will provide a more thorough description of the study population and address these characteristics explicitly in both the methods section and the initial part of the results.

      Reviewer 2

      We thank Reviewer 2 for their comments. Although it is true that several papers have described the role of the microbiome in COVID-19 severity, we firmly believe that our current work stands out.

      There is little information on this association in Mediterranean countries, especially in southern Spain. In addition, most studies describe microbiota composition in stool or nasopharyngeal samples separately, without investigating potential relationships between them as we do.

      (1) We agree with the reviewer that the sample size is limited. We faced the challenge of collecting samples during the peak of the COVID-19 pandemic, when doctors and nurses were overwhelmed and not always available to carry out patient recruitment according to the inclusion criteria. Despite these constraints, we ensured that all included samples met our specified inclusion criteria and came from subjects with confirmed symptomatology.

      In addition, our main goal was to determine whether disease severity could be assessed through microbiota composition; therefore, we did not include a healthy group. Despite the modest N, our results should be reproducible, as they are supported by statistical analysis.

      (2) We thank the reviewer for this comment. Since our original sentence may have lacked clarity, we intend to modify it so that it conveys the intended meaning more effectively.

      Nonetheless, we remain confident in the significance of our findings. Not only have we found a correlation between microbiota and COVID-19 severity, but we have also described how specific bacteria from each condition are associated with key biochemical parameters of clinical COVID-19 infection.

      (3) We appreciate the feedback provided by the reviewer. We performed 16S analysis because of its cost-effectiveness compared to metagenomic approaches. Furthermore, 16S analysis has undergone refinements that ensure comprehensive coverage and depth, along with standardized analysis protocols. Unlike 16S, metagenomic approaches lack software tools such as QIIME that facilitate standardized analysis, which reduces the reproducibility of results.

      (4) We sincerely appreciate this insightful suggestion. Because simply listing associations between both microbiomes and COVID-19 severity may not be enough, we intend to discuss how microbiota composition may be linked to the mechanisms underlying COVID-19 pathogenesis.

      (5) We are grateful for the constructive criticism and intend to rewrite our abstract to enhance clarity. Additionally, we will thoroughly review all figures and their descriptions to ensure accuracy and comprehensibility.

      Reviewer 3

      We acknowledge the comments made by Reviewer 3 and are committed to addressing all identified weaknesses to enhance the quality of our work. We intend to modify the methods section and figures to make them easier to understand.

      Specifically, in the case of Figure 1, we recognize an error in the description of the Bray-Curtis test. We appreciate the comment and will make the necessary changes. We will also revise the Figure 1 description in response to the reviewer's other observation to improve its accuracy.

      For Figure 2, we plan to add a supplementary table showing the abundance of the detected genera. In addition, we will update the manuscript text to clarify how we obtained this result. Regarding the "1% abundance," we want to emphasize that we are referring to relative abundance, where 1 represents 100%. To avoid confusion, we will state this explicitly in both the methods section and the figure descriptions. We also acknowledge that the statistical test used for the analysis is not mentioned in the figure description and that the image may be difficult to interpret. Therefore, we will modify the text and add a supplementary table displaying the abundances and p values.

      Furthermore, we agree with the reviewer's suggestion to investigate whether the bacteria identified as potential biomarkers for each condition are specific to their respective severity index or if there is a threshold. Thus, we will reanalyze the data and include a supplementary table with the abundance of each biomarker for each condition. We will also place greater emphasis on these results in our discussion.

      Finally, in response to the reviewer's suggestion, we will revise the part of the discussion dealing with the nasopharyngeal-fecal axis. It is well described that COVID-19 induces dysbiosis in both microbiomes.

      Consequently, we believe that the ratio we have described could be an interesting tool for assessing the development of COVID-19 severity, as it considers alterations in both environments. However, we acknowledge that there is room for improvement in clarifying the significance of this intriguing finding and its implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This comprehensive study provides valuable information on the cooperation of Ikaros with Foxp3 to establish and regulate a major portion of the epigenome and transcriptome of T-regulatory cells. However, the characterization is incomplete in that incontrovertible evidence that these are intrinsic features regulating biological function and not outcomes of the inflammatory micro-environment of the genetically manipulated mice is missing.

      Public Reviews:

      This study investigates the role of Ikaros, a zinc finger family transcription factor related to Helios and Eos, in T-regulatory (Treg) cell functionality in mice. Through genome-wide association studies and chromatin accessibility studies, the authors find that Ikaros shares similar binding sites to Foxp3. Ikaros cooperates with Foxp3 to establish a major portion of the Treg epigenome and transcriptome. Ikaros-deficient Treg exhibits Th1-like gene expression with abnormal expression of IL-2, IFNg, TNFa, and factors involved in Wnt and Notch signaling. Further, two models of inflammatory/autoimmune diseases - Inflammatory Bowel Disease (IBD) and organ transplantation - are employed to examine the functional role of Ikaros in Treg-mediated immune suppression. The authors provide a detailed analysis of the epigenome and transcriptome of Ikaros-deficient Treg cells.

      These studies establish Ikaros as a factor required in Treg for tolerance and the control of inflammatory immune responses. The data are of high quality. Overall, the study is well organized, and reports new data consolidating mechanistic aspects of Foxp3 mediated gene expression program in Treg cells.

      Strengths:

      The authors have performed biochemical studies focusing on mechanistic aspects of molecular functions of the Foxp3-mediated gene expression program and complemented these with functional experiments using two models of autoimmune diseases, thereby strengthening the study. The studies are comprehensive at both the cellular and molecular levels. The manuscript is well organized and presents a plethora of data regarding the transcriptomic landscape of these cells.

      Response: We thank the reviewers for their careful review and feedback on our manuscript. We appreciate that the reviewers and editors recognize the strength and comprehensive nature of our in vivo, cellular, biochemical, and genome-wide molecular studies, which are well-organized in the manuscript. The acknowledgment of the complementary functional experiments in two models of inflammatory disease is also encouraging.

      Weakness:

      The authors claim that the mice have no pathologic signs of autoimmune disease even at a relatively old age, yet mice have an increased number of activated CD4+ T cells and T-follicular helper cells (even at the age of 6 weeks) as well as reduced naïve T-cells. Thus, immune homeostasis is perturbed in these mice even at a young age and the effect of inflammatory microenvironments on cellular functions cannot be ruled out. Further, clear conclusions from the genome-wide studies are lacking.

      Response: We agree with the reviewers' comment regarding the absence of overt autoimmune pathologies in Ikzf1-fl/fl-Foxp3-Cre+ mice, despite the increased frequency of activated CD4+ T cells, TFH cells, and apparent perturbation of lymphocyte homeostasis, even at a young age. It is noteworthy that while Ikaros is implicated in various autoimmune diseases, our specific mouse model, in which Ikaros expression is lost only in Tregs, may not lead to a strong autoimmune phenotype in part due to the controlled environment of an extra-clean, pathogen-free animal facility. This aligns with a related study by Ana et al (2019, J. Immunol: doi:10.4049/jimmunol.1801270) in Ikzf1-fl/fl-dLck-Cre+ mice with loss of Ikaros expression in all mature CD4+ T cells, including Tregs, which exhibit no overt signs of autoimmune disease. Moreover, our transcriptomic studies reveal that increased expression of inflammatory genes in Ikzf1-deficient Treg is coupled with the simultaneous upregulation of genes with positive roles in Treg function. This balance suggests a compensatory mechanism within Ikaros-deficient Tregs that maintains their suppressive function until they encounter an inflammatory immune challenge, which eventually leads to loss of Treg suppressive function in Treg-specific Ikaros-deficient mice. Our studies clearly show that Ikaros has cell-intrinsic effects in Treg that also lead to cell-extrinsic effects mediated by secreted factors that are likewise regulated by Ikaros. This can be said about the function of any transcription factor in any cell type. Our data clearly support the conclusion from the genome-wide studies that Ikaros plays a major role in establishing the active chromatin landscape, gene expression profile, and function of regulatory T cells in mice.

      The following recommendations consolidate the views of the three reviewers of the manuscript.

      The experiments suggested and, in some instances, fresh analysis, are thought necessary, so that the evidence of Ikaros-Foxp3 interactions regulating T-regulatory cell biology is comprehensive and solid. We hope the comments are useful to strengthen the comprehensive analysis reported in this submission.

      The primary concern is that the indications of inflammation in the mice (see points 1 & 2 below) do not reflect in the experiments or consequent conclusions. The gap in the data should be addressed by testing these interactions in an appropriate context for which suggestions are included.

      Please note that the title of the manuscript may be modified to reflect the use of mice as the system of study for this work.

      (1) The evidence of inflammation (increased CD4 and T follicular cells) reported in the work requires new experiments to rigorously examine the relationship between Ikaros and Foxp3 to rule out the possible impact of the (inflammatory) microenvironment of the mice (Please see: Zemmour et al., Nat. Immunology 22, 607, 2021). Two possible experimental systems in mice are suggested.

      a) The use of heterozygous female mice, which should be phenotypically normal due to the presence of 50% normal Treg. Or,

      b) The generation of bone marrow chimeras between wild-type and deficient mice using congenic markers.

      Response: We agree that immune dysregulation that develops in the mice with age or during an inflammatory insult due to loss of Ikaros function in the Treg lineage is an important part of the phenotype of the animals. Our studies show that loss of Ikaros function in Treg influences the gene expression program such that Treg now produce inflammatory cytokines and ligands capable of engaging receptors expressed on Treg and other cells. This likely results in autocrine and paracrine signaling that induces further metabolic and gene expression differences not observed in wild-type mice. Indeed, we report in the manuscript that a sizable fraction of the differentially expressed genes do not appear to be direct Ikaros targets, but rather are downstream of Ikaros target genes such as Il2, Ifng, Notch, and Wnt. The mosaic experiments suggested will be a useful topic of future studies. Importantly, we argue that no gene expression study involving modulation of transcription factor activity in an organism- or cell-based system can be designed to measure only the direct effects of that transcription factor in a manner isolated from any indirect, downstream effects on the expression of other genes. We suggest that our current data remain highly valuable, as they reveal real and relevant biology in physiologic in vivo systems that do not depend upon the use of heterologous models. The fact that loss of Ikaros has an effect not only on its direct targets, but on gene programs driven in turn by the indirect effects of Ikaros-regulated factors, has been acknowledged in the manuscript.

      (2) Figs. 7 and S5 show accumulation of CD4 cells (activated, memory, Tfh, Tfr) in LNs and spleens of the Ikaros KO over time. This is accompanied by elevated Igs but without overt autoimmune disease. KO Tregs had equivalent suppressive activity as WT Tregs against WT Teff in vitro. However, Teff from KO mice were resistant to the suppressive effects of WT or KO Tregs. The authors interpret this as due to the increased percentage of memory cells within the KO Teffs, although they did not formally prove this point. Figs. 9 and S6 show that Ikaros KO mice are unable to be tolerized for cardiac allograft survival using two different standard tolerogenic regimens. The rejecting allografts are accompanied by increased T-cell infiltration and upregulation of inflammatory genes. The authors suggest there is increased alloantibody, but alloantibody does not seem to have been measured.

      Response: We are currently exploring in more detail the dysregulation of humoral immunity in the Ikzf1-deficient Treg model and plan to report these results in a future study.

      (3) Linked to the above, a comparison of the chromatin occupancy of Ikaros in resting and activated Tregs would inform on whether and how Ikaros occupancy changes with the activation status of Tregs. Since the authors use in vitro stimulation for RNAseq and ATAC seq, ChIP seq analyses under these matching conditions will greatly add to the quality of the study. Since "Foxp3-dependent" gene expression, i.e. differential gene expression in the Foxp3GFPKO cells (PMID: 17220874), has been shown not to be entirely the same as the Treg signature (i.e. gene expression of Tregs compared to Tnv), it will be worth correlating Ikaros and Foxp3 co-occupied genes and the corresponding fate of their expression with Foxp3-dependent and -independent Treg signature gene sets.

      Response: The prior study by Gavin et al. referred to above used duplicate samples instead of the standard three or more replicates required for a robust differential analysis of gene expression. The two samples in this study are variable, and no statistically significant differential gene expression was found between the experimental groups when we subjected these data to current analysis methods. For this reason, we have elected not to compare these prior data with our current data, which are robust, reproducible, and analyzed using current statistical methods. Furthermore, the mice used for the prior study develop a fatal inflammatory disease (scurfy) and therefore the Treg examined in this study would be subject to a much stronger extrinsic inflammatory environment than the Treg in our study, as our mice show no overt disease even with age.

      Further, the consequence of the cooperation between the two transcription factors that can be inferred from the experiments in the study remains unclear. It is suggested that the authors could first consider the ChIP seq data from Foxp3, Ikaros co- and differentially occupied genes, and then correlate with the ATAC seq and gene expression data to comment on the consequence of this cooperation.

      Response: We find that Ikaros binding at a given region has a strong effect on accessibility, as reported in the manuscript, but that Foxp3 occupancy has less consequence, consistent with a prior study suggesting that Foxp3 largely utilizes the open chromatin landscape already present in the conventional CD4 T cell lineage (PMID:23021222). Our data suggest that the dominant effect of Ikaros on Foxp3 is at the level of chromatin occupancy.

      (4) In the comparative analyses of Ikaros and Foxp3 co-occupied regions and gene expression outcome, the authors mention "A total of 4423 Foxp3 binding sites were detected in the open chromatin landscape of wild-type Treg (Supplementary Table 9), and this ChIP-seq signal was enriched at accessible Foxp3 motifs." It is unclear whether the authors focused on the ATAC seq data and only examined the open chromatin regions for this analysis. In that case, it is unclear why. More so because the Ikaros footprint is more apparent in regions where accessibility is reduced upon deletion of Ikaros.

      Response: Foxp3 has been shown to bind primarily at open chromatin shared between Tconv and Treg, unlike the pioneer activity of other Fox family members (PMID: 23021222, bioRxiv https://www.biorxiv.org/content/10.1101/2023.10.06.561228v2.full.pdf). Consistent with this, we found the majority of peaks were in open chromatin. The motif analysis is quantitative, not binary, and takes into account Foxp3 binding sites at regions considered open in either condition, which is why we can see enrichment of Foxp3 motifs at sites going from more open to less open in the absence of Ikaros.

      (5) Comments on figures:

      The authors use MFI repeatedly in many of the figures for quantitation of antigen expression. This is misleading as several of the target antigens are normally expressed on a subpopulation of cells, e.g., Eos. Percent positive and MFI would be more relevant. Cytokine production should be presented by intracellular staining (e.g., IL-2, IFNg) as ELISA data does not allow one to determine the percentage of abnormally producing cells.

      Response: We show both ICS and ELISA in this paper, preferring ELISA because it is much more quantitative than ICS.

      Suppl. Fig. 1c - the panels do not correspond precisely to the legend or the text. At least one panel is missing. In Supp fig 1c, the authors plotted effector Tregs, which are by definition CD62LloCD44hi, but the Y axis says CD44hiCD62Lhi. Is this a typo? Also on page 4, describing this data the authors mentioned Tfr, but the data is not shown in the Supp fig 1c.

      Response: We thank the reviewer for catching these mistakes. We have corrected the typo in the figure panel for Supplementary Figure 1c. Follicular Treg data are indeed presented in Figure 7h, not Supplementary Figure 1, and we have corrected the text.

      For Fig. 2, which lists the different categories of differentially expressed genes, it will be helpful if the authors add two columns indicating fold change and FDR values.

      Response: These values are included in Table S1

      Fig. 3c, the resolution of the histograms in the inset should be enhanced.

      Fig. 3d, a histogram of representative CTV dilution plots, and an explanation of how the quantifications were done may be included.

      Fig. 3e - not well labeled. Are these fold changes? Enrichments? Number of gene elements within the GO term that are affected? Something else?

      Fig. 3f - presented out of sequence. The data are a little hard to understand as the color scale is so subtle and the colors so close to one another that it is not entirely clear which gene expressions are increased vs decreased. Other than the simple statement that the Ikaros KO causes numerous changes, there does not seem to be a more consistent message from this data panel.

      Fig. 4a, in addition to the bar graphs, it will be better to show the plots in a histogram, gated on Foxp3+ Tregs in WT and KO groups, with representative MFI indicated on top. The resolution of the scatter plots in this figure, as well as some others throughout the manuscript, may be improved. Please increase the resolution wherever necessary.

      Fig. 4b should include representative plots for cytokine production gated in Tconv (CD4+Foxp3-) cells.

      Figs. 5a-h, S2-3a-d, and Suppl. Tables S4-8 show a comprehensive ATAC-seq and ChIP-seq analysis of genes and chromatin occupied or regulated by Ikaros, comparing Tconv vs Treg, stimulated vs naïve, and WT vs KO cells. It is a comprehensive tour-de-force analysis, again showing the major effects of Ikaros on the entire Treg landscape of gene regulation.

      Fig. S5h-j should be explained or labeled in more detail. The fonts are too small to read, even at 200% magnification; and the cell and gene comparisons are not entirely clear.

      Supp. Fig. S3e is not referred to in the text.

      Fig. S4a is very difficult to read; the font and plotted points are too small.

      Response: We have improved the clarity of the figures where necessary. We also indicate in the figure legends that full gene lists are to be found in the supplementary tables.

      Page 8, "Regions that exhibit reduced accessibility in Ikzf1 cko compared to wild-type Treg are enriched for the binding motif for Ikaros and the motif for TCF1 (Figure 5g).... ". Is this Fig. 5i or 5g?

      Response: This statement is correct and is referring to data depicted in Figure 5g.

      In Fig 6e, Flag-Ik7 is not visible in any of the inputs. The co-IP between Foxp3 and Runx1 (presumably a positive control) is not efficient in this experimental condition. Co-IP experiments performed in primary cells upon retroviral transduction of the tagged proteins to confirm observations in cell lines are suggested.

      Response: Runx1 is shown to co-precipitate with Foxp3 as expected, although the band is not intense, and the data depicted are representative of 3 experiments. Ik7 was included in this transient transfection experiment as a redundant control, and the referee is correct that Ik7 did not express well in this experiment and cannot be seen in this exposure. We showed these blots intact in the spirit of not digitally altering the data, and because the low Ik7 expression did not impact our ability to demonstrate specific co-precipitation of Foxp3 with full length Ikaros (Ik1). The images include nearly the entire mini-blots, and we have added molecular weight markers for clarity. As indicated in the legend, the cytokine and ChIP data in 6f are from a separate model of retrovirally Foxp3/Ik7-transduced T cells that we and others have used in multiple prior studies (e.g. Thomas JI 2007, Thomas JI 2010). The interpretability of these experiments is not impacted by the transient transfection data from figure 6e. It should be noted that a prior study by Rudra et al. that is cited and referred to in the manuscript used a similar approach to also establish that Foxp3 and Ikaros form a complex in cells.

      In Fig 6f, the authors state that Foxp3 overexpression in CD4 cells results in promoter occupancy of both IL2 and IFNg, however, data shows only IL2. Also in 6f, Foxp3 overexpression reduces IL2 and IFNg secretion, measured by ELISA, which is recovered by IkDN. However, the effect of Foxp3 along with WT Ikaros (which should not modulate, and if anything, further repress IL2, IFNg production) is not shown.

      Response: The reviewer is correct that ectopic expression of Ikaros leads to repression of cytokine gene expression, which we and others have shown in prior studies. Because the focus of this study was on loss of Ikaros function in Treg, we elected not to overexpress full-length Ikaros. However, we completely agree that Ikaros GOF in Treg is an important topic for future studies.

      Fig. 7e-g, how is %suppression calculated? Can representative CTV dilution plots for the suppression assays be shown?

      Response: Cell division was quantified as described previously (see ref 50), and percent suppression represents the reduction in cell division measured by Tconv in the presence of Treg compared to in the absence of Treg. This has been clarified in the methods section.

      In Fig 8 and the supplementary figures the representative colon pictures (Fig. S6a-c) do not show convincing differences in colon morphology even though all the other histology and clinical parameters are clear. Are the figures mislabeled?

      In Fig 8c-e and other histology figures scale bars should be shown.

      Fig. 8c-e, the Alcian blue staining among the groups appears similar; perhaps this is due to the low power magnification.

      Response: We have edited this figure for clarity

      Additional comments:

      Fig 10 is explained in the discussion section for the first time. The authors may want to consider including this when introducing Ikzf1 ChIPseq data for the first time in the study.

      Response: The reviewer raises a valid point but we have elected to retain the current organizational structure of the manuscript.

      A more complete characterization of the activated conventional cells including both CD4+ and CD8+ T cells for cytokine production during aging may be considered, as it is highly likely that abnormalities in cytokine production will be observed.

      Response: We agree and are planning additional such experiments in future studies focusing on in vivo models of tolerance.

      The failure of suppression of T cell proliferation which the authors claim is due to the presence of activated memory T cells can be better documented by using naive responder cells from the cKO mice.

      Response: We agree and are planning additional such experiments in a future study focusing on further aspects of cellular immunobiology impacted by Ikaros, but we will give preference to in vivo models of tolerance in such studies.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (recommendations for the authors):

      Additional suggestions for improvement are noted below:

      (1) Additional 1. Lns 261-262, as well as abstract: The term 'aerobic fermentation' is not accurate in the context of this manuscript. This terminology should be reserved for conditions where lactate production is observed under optimal aerobic conditions. This is not the case in this study. More lactate was observed in the agr mutant only when cells were grown under microaerobic conditions, where some level of fermentation would be expected to be active (esp. if nitrate is not provided in media).

      As suggested by the reviewer, we modified the text by deleting the references to “aerobic” fermentation in the following passages:

      Line 93 (abstract): “Deletion of agr increased both respiration and aerobic fermentation but decreased ATP levels and growth, suggesting that Δagr cells assume a hyperactive metabolic state in response to reduced metabolic efficiency.”

      Line 184: “Collectively, these data suggest that Δagr increases respiration and aerobic fermentation to compensate for low metabolic efficiency.”

      (2) Additionally, the authors' statement, 'The tendency of Δagr cells to forgo the additional ATP yield from acetate production in favor of NAD+-generating lactate (23, 24) underscores the importance of redox balance in Δagr cells,' appears contradictory to the data presented in Fig 5, where the Δagr mutant demonstrates an approximately threefold increase in acetate production during exponential growth compared to the wild-type strain. A clarification or adjustment in the manuscript may be necessary to ensure consistency and accurate interpretation.

      In glucose-fermenting S. aureus, pyruvate can serve as an electron acceptor, generating lactate via lactate dehydrogenases. Acetyl-CoA production proceeds via the pyruvate formate-lyase reaction, which converts pyruvate to formate rather than CO2 and thus does not consume NAD+. Thus, at a general level, the tendency of fermenting cells to forgo the additional ATP yield from acetate production in favor of NAD+-generating ethanol synthesis underscores the importance of redox balance when respiration is suboptimal. This is especially true for fermenting Δagr strains, as evidenced by their increased lactate production compared to their relatively ATP-replete wild-type parental strains. However, in the interest of clarity, we removed the sentence in question, because it is not necessary and is potentially confusing, and because the additional context it requires would detract from the manuscript by disrupting its sense of narrative and brevity.

      (3) Ln 277-285: There still are errors in how this paragraph is worded. What the authors stated in the 'response to the reviewers' (question 13) and the changes they made in the text are different. Here again, the response to question 13 suggested the following, "Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation in the presence of oxygen." That bit of it where the authors suggest that fermentation was activated because NADH was saturating is only true under microaerobic conditions and not under oxygen rich conditions.

      Reviewer #1 (comment under Review): Data presented in Figure 5 suggest the opposite - a surge in NADH accumulation leading to a decrease in the NAD/NADH ratio, rather than a surge in the 'consumption' of NADH. Clarifying this point in the manuscript would ensure accurate representation of the findings.

      Responses to Comment 3 and to the comment under Review have been combined.

      Line 280: We thank the Reviewer for their attention to detail in picking up the discrepancy between the revised text and our response to question 13 in the “response to reviewers”. We modified the text accordingly.

      Microaerobic conditions and “consumption”: We have modified the wording and fixed the error with respect to “consumption”, as pointed out by the reviewer (deleted and added text marked in brackets):

      Line 285: “Collectively, these observations suggest that a surge in NADH [deleted: consumption] [added: accumulation] and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation [added: under microaerobic conditions] [deleted: in the presence of oxygen].”

      Reviewer #2 (recommendations for the authors):

      (1) The authors are requested to revise 'we expected a lower NAD+/NADH' in line 280 to 'we expected a higher NAD+/NADH.' Additionally, what was the glucose concentration in TSB media?

      NAD+/NADH: We thank the Reviewer for their attention to detail in picking up our error. Our response to Reviewer 1, Comment 3 above addresses this issue.

      Glucose: We modified the Methods as suggested.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (recommendations for the authors):

      The following are comments that the authors may wish to address or clarify:

      (1) The claim that respiration and fermentation occur concurrently in the agr mutant during aerobic growth is not strongly supported by the evidence presented…. However, since neither lactate production nor a difference in the NAD+/NADH ratio between the wild type and agr mutant was observed, it is challenging to assert that fermentation is occurring. Relying solely on a gene expression signature indicative of fermentation is, in my view, inadequate to conclusively establish that aerobic fermentation is taking place.

      Lactate production. The data we provide in Figure 5E of the original manuscript (Figure 5C in the revised manuscript) indicate that lactate production is lower in the wild-type strain than in the Δagr mutant.

      The exact focus of Reviewer 1’s concern is not clearly specified, but the Reviewer may have been referring to how the result was described in the text:

      “Although the stimulatory effect of the agr deletion on production of the fermentation product lactate was not observed in optimally aerated broth cultures after growth to late exponential growth phase, it was confirmed for organisms grown in broth under more metabolically demanding, suboptimal aeration conditions (Figure 5E). Overall, these results are consistent with transcription-level up-regulation of respiratory and fermentative pathways in agr-deficient strains.”

      The greater sensitivity of suboptimal aeration conditions is unsurprising and relates to a low rate of fermentation during the vigorous aeration (shaking at 250 rpm) conditions commonly used to grow S. aureus. To clarify the point, we modified the text to provide additional context as follows:

      Line 271: “Although the stimulatory effect of the agr deletion on production of the fermentation product lactate was not observed in optimally aerated broth cultures after growth to late exponential growth phase, it was confirmed for organisms grown in broth under more metabolically demanding, suboptimal aeration conditions (limitations in the rate of respiration when oxygen is limiting are expected to increase overall levels of fermentation) (Figure 5C). Overall, these results are consistent with transcription-level up-regulation of respiratory and fermentative pathways in agr-deficient strains.”

      NAD+/NADH ratio. Extended studies of the NAD+/NADH ratio, requested by Reviewer 1 under Comments 12 and 13, document an effect of the Δagr mutation not seen in Figure 5F of the original submission. Our responses to Comments 12 and 13 below address this issue.

      (2) The mechanisms through which the ΔagrΔrot double mutant resists H2O2 are not clearly elucidated. While the authors suggest that the ΔagrΔrot double mutant expresses several genes involved in combating oxidative stress, essential genetic studies that would validate this hypothesis have not been conducted.

      The data we provide indicate 1) that wild-type strains are tolerant to peroxide and 2) that wild-type strains are able to render inducible several known reactive oxygen species (ROS)-protective genes in the presence of peroxide in a rot-dependent manner. Δagr strains, which do not demonstrate this response, are more readily killed by peroxide. Additional data indicate that increased respiration caused by deletion of agr is associated with increased endogenous ROS. Higher levels of endogenous ROS can modulate tolerance to subsequent challenge by ROS (1). Collectively, these observations support a model of Δagr-induced hyper-susceptibility in which elevation of endogenous ROS results in a suboptimal ROS-defense response that plays a role in increased peroxide lethality.

      We prefer to test this model in future studies directed at understanding the complexities of the interaction among agr-mediated tolerance, endogenous ROS levels, and induction of protective responses in S. aureus. Culprit protective genes, alone and in various combinations, will be inactivated in Δagr mutant and wild-type strains, tested in killing assays with and without agents that mitigate endogenous ROS, and subjected to RNAseq, proteomic, and metabolomic analyses, as part of a larger program to identify factors involved in S. aureus tolerance to lethal stress.

      To clarify the issue raised by the reviewer we altered the wording in the following sentences as follows:

      Line 335: “Elevated expression of protective genes suggests that the double mutant survives damage from H2O2 better because protective genes are rendered inducible (loss of Rot-mediated repression).”

      Line 440: “Details of agr-mediated protection are sketched in Figure 10. At low levels of ROS, agr is activated by a redox sensor in AgrA, and RNAIII is expressed and represses the Rot repressor, thereby rendering protective genes (e.g., clpB/C, dps) inducible via an unknown mechanism (induction, candidate protective gene(s), and their connection to endogenous ROS levels are being pursued independently of the current report).”

      (3) The reason behind the agr mutant's low metabolic efficiency, as evidenced by low ATP levels (Fig 5A) despite enhanced respiration and acetate production, is not clearly explained. Could insights from the modeling shed light on why the ATP levels are low in the agr mutant?

      Comparative modeling of central metabolic pathways, in combination with in vitro metabolic analyses of Δagr and wild-type strains, revealed the metabolic inefficiency but cannot explain it. The basis for the metabolic inefficiency conferred by agr inactivation is unknown. The possibility that aberrant sorting of cell wall surface proteins could lead to metabolic inefficiency was raised in the Discussion where we wrote:

      “Our work supports this idea by showing that increased respiration caused by deletion of agr is associated with increased ROS-mediated lethality. The basis for the metabolic inefficiency conferred by agr inactivation is unknown. Given that Δagr mutants are unable to downregulate surface proteins during stationary phase (2, 3), it is possible that deletion of agr perturbs the cytoplasmic membrane or the machinery that sorts proteins across the cell wall. In support of this notion, jamming SecY translocation machinery of E. coli results in downstream events shared with antibiotic lethality, including accelerated respiration and accumulation of ROS (4). In this scenario, the formation of a futile macromolecular cycle may accelerate cellular respiration to meet the metabolic demand of unresolvable problems caused by elevated surface sorting.”

      For clarification, we modified the text as follows:

      Line 461: “Our work supports this idea by showing that increased respiration caused by deletion of agr is associated with increased ROS-mediated lethality. How agr deficiency is connected to the corruption of downstream processes that result in metabolic inefficiency and increased endogenous ROS levels is unknown. Given that Δagr mutants are unable to downregulate surface proteins during stationary phase (2, 3), it is possible that deletion of agr perturbs the cytoplasmic membrane or the machinery that sorts proteins across the cell wall.”

      agr has been linked to defects in peptidoglycan autolysis (5). Cho et al. (2014) found that β-lactam treatment can induce a futile cycle of peptidoglycan synthesis and degradation that has been linked to increased production of endogenous ROS (6). Thus, an alternative, non-mutually exclusive route to a futile cycle and elevated endogenous ROS levels in agr-deficient cells, other than surface protein dysregulation, may be decreased cell wall cross-linking. We prefer not to include this and other speculations, because they are not necessary or revealing and because they would detract from the manuscript by disrupting its sense of narrative and brevity.

      (4) The observation that menadione can protect the agr mutant from H2O2 is perplexing. The authors propose that even though menadione generates superoxide through redox cycling, this superoxide might inhibit the TCA cycle, thereby restricting respiration, which could be advantageous for the agr mutant. To substantiate this hypothesis, it would be imperative to demonstrate that a double mutant ΔagrΔacnA exhibits long-lived protection against H2O2.

      Rowe et al. (2020) definitively showed that a burst of menadione-associated ROS inactivates the TCA cycle in S. aureus, leading to reduced respiration and ATP production (7). Both aconitase activity and ATP levels in menadione-treated cultures were restored by the antioxidant N-acetyl cysteine. In the present work we demonstrate, using the same experimental conditions as Rowe et al., that menadione protected the Δagr mutant from peroxide killing but had little effect on the wild-type strain. Addition of N-acetyl cysteine in the presence of menadione restored H2O2 susceptibility to the Δagr mutant and had no effect on the wild-type strain. Collectively, these observations support the idea that menadione inactivates the TCA cycle, leading to reduced respiration and thereby to increased protection of the Δagr mutant from peroxide killing.

      As requested, we tested whether the ΔagrΔacnA double mutant exhibits protection against H2O2. The new data we now provide (Figure 8—figure supplement 2A) show that a ΔacnA mutation completely protected the Δagr mutant from peroxide killing after growth to late exponential growth phase, but it had little if any effect on the wild-type strain. To evaluate long-lived protection, we compared survival rates of ΔagrΔacnA mutant and Δagr cells following dilution of overnight cultures and regrowth prior to challenge with H2O2, which revealed partial protection of the Δagr mutant (Figure 8—figure supplement 2B).

      We explained these results with the following:

      Line 351: “Rowe et al. (2020) showed that menadione exerts its effects on endogenous ROS by inactivating the TCA cycle in S. aureus. To determine whether this mechanism can also induce protection in the Δagr mutant, we inactivated the TCA cycle gene acnA in wild-type and Δagr strains (Figure 8—figure supplement 2). We found that the ΔacnA mutation completely protected the Δagr mutant from peroxide killing after growth to late exponential growth phase but had little effect on the wild-type strain. This finding supports the idea that TCA cycle activity contributes to an imbalance in endogenous ROS homeostasis in the Δagr mutant, and that this shift is a critical factor for Δagr hyperlethality. When we evaluated long-lived protection by comparing survival rates of ΔagrΔacnA mutant and Δagr cells following dilution of overnight cultures and regrowth prior to challenge with H2O2, ΔacnA remained protective, but less so (Figure 8—figure supplement 2). These partial effects of ΔacnA deficiency suggest that Δagr stimulates long-lived lethality for peroxide through both TCA-dependent and TCA-independent pathways.”

      (5) Figure 10 presents a model suggesting that Rot-mediated repression of respiration is essential for long-lasting resistance to H2O2 lethality. However, the connection between decreased respiration and long-lived resistance to ROS is not evident, especially considering that the respiration rate varies over the growth phase and does not seem to align with the long-lived and steady protection provided by agr. However, the authors could investigate this by examining whether inactivating qox in the agr mutant restores its resistance to H2O2. The experiments with menadione are not particularly persuasive, as menadione could have additional effects on the cells that are not accounted for.

      As requested, we tested whether the ΔagrΔqoxC double mutant exhibits protection against H2O2. qox deficiency was hyperlethal in wild-type and Δagr strains, even with the lowest concentration of H2O2 used in our assay. Indeed, surviving cells were undetectable, precluding comparison of survival differences between wild-type and Δagr mutant strains. This striking finding can be explained by prior work highlighting the profound and pleiotropic effects of qox deficiency on metabolism, which involve not only control of respiration but also participation in other physiological processes that affect cell growth and morphology. For example, in Bacillus, qox deficiency decreases TCA cycle flux and increases overflow metabolism (8). Additionally, we confirmed prior work in S. aureus showing that qox deficiency decreases growth rate and yield (9, 10), dramatically increases production of pigment that functions as an oxidation shield, and decreases hemolytic activity (11). Moreover, we found that qox deficiency results in a striking increase (~150%) in endogenous ROS in both wild-type and agr mutant cells, likely explaining the hyperlethality phenotype. Thus, interpretation of killing assay results must account for the complex and likely reciprocal interactions among Δqox-mediated metabolic changes, AgrA-mediated redox sensing, and Δagr-mediated changes in metabolism. Since killing data are not necessary or revealing without this information, we prefer to address the role of qox in future studies directed at understanding the complexities of the interaction among agr-mediated tolerance, endogenous ROS levels, and induction of protective responses in S. aureus.

      (6) The repeated use of the term 'agr wild type' throughout the text is somewhat distracting. It might be clearer to simply use 'wild type,' as it is implied that this refers to the agr+ genotype.

      We modified the text by replacing “agr wild-type” with “wild-type”, as suggested by the Reviewer.

      (7) In the text, the authors imply that the extended lag phase of the agr mutant is observed solely in nutrient-limited CDM. However, Figure 1 and Figure Supplement 3A reveal that the strains were actually cultivated in CDM supplemented with glucose and Casamino acids, which makes the medium rich in both carbon and nitrogen, in addition to other nutrients present in CDM. The authors should clarify the composition of the media used and assess whether the term 'nutrient-limited CDM' is accurate in this context.

      The extended lag phase of the Δagr mutant is observable in TSB, but it is more easily appreciated in CDM, perhaps owing to a larger range of carbohydrates and other nutrient types in TSB (a rich, complex medium of undefined composition) and a higher concentration of glucose (2.5 mM versus 2.2 mM).

      For clarification, we modified line 135 as follows:

      Line 184: “Lag-time differences between strains were more obvious in experiments using less complex, chemically defined medium (CDM)…”

      (8) Figure 1 - Figure Supplement 3C represents the growth rate in terms of [OD/min]. However, it would be more accurate to calculate the growth rate (μ) based on the change in the natural logarithm of optical density (OD) relative to the corresponding change in time, using appropriate units (preferably, h⁻¹). Additionally, the method employed for measuring growth rates should be detailed in the Materials and Methods section.

      Our response to Reviewer 2, Minor Comment 1 below addresses this issue.

      (9) The resolution of the inset charts in Figure 4B is poor, and the Y-axis lacks labels. The figure legend should also specify whether the flux distribution (represented by thick black arrows in Fig 4B) is predicted for the wild type or the mutant.

      We modified Figure 4B and the legend accordingly.

      (10) On Page 9, the term "RT-PCR" should be corrected to "RT-qPCR."

      We thank the Reviewer for their attention to detail in picking up our error. We modified the text accordingly.

      (11) It is ambiguous whether the agr mutant is producing more acetate, based on the information provided in Figure 5B. Since the cells might have entered the post-exponential phase at 5 hours, they could start consuming acetate. Consequently, the elevated acetate concentration in the agr mutant might result from a delay in acetate consumption rather than increased production. To discern between the production and consumption of acetate, it is essential to measure acetate concentrations at earlier time points as well as the corresponding glucose concentrations in the media. This will help ascertain when the agr mutant enters the post-exponential phase. A similar concern also exists in the case of lactate (Fig 5E) since it is not clear when lactate was measured.

      As requested, we measured acetate levels at earlier time points (1, 2, 3, and 4 h of growth). New Figure 5B shows that the Δagr mutant accumulated more acetate than the wild-type strain during exponential growth at 3 h, well before entry into post-exponential phase (see growth curves in Figure 1—figure supplement 1).

      In the original report, lactate levels were measured at 4 h for organisms grown under suboptimal aeration conditions (see Reviewer 1, Comment 1). When we measured lactate accumulation at 3 h, it remained higher in the Δagr mutant than in the wild-type strain. Likewise, acetate levels at 3 h under suboptimal aeration conditions remained elevated in the Δagr mutant compared to the wild-type strain. These results support the idea that inactivation of agr increases production, rather than decreasing consumption, of acetate and lactate in the culture medium.

      (12) In Figure 5G-H, presenting the actual NAD+ and NADH values side-by-side would facilitate a more straightforward interpretation of the data by the readers.

      (13) On Page 9, the text states that respiration and fermentation lower the NAD+/NADH ratio. However, this seems contradictory as these processes would typically increase the NAD+/NADH ratio. Furthermore, it would be beneficial for the authors to provide supporting evidence for the statement made at the beginning of Page 10, which claims that there is greater consumption of NADH in the agr mutant.

      Responses to Comments 12 and 13 were grouped together.

      We thank the Reviewer for their attention to detail in picking up our error about the NAD+/NADH ratio. The ratio is expected to be elevated by increases in respiration and fermentation, not lowered, owing to increased consumption of NADH.

      Figure 5I in the submitted manuscript indicated a small, nonsignificant decrease in the NAD+/NADH ratio of the Δagr mutant. Thus, the NAD+/NADH ratio remained tightly bounded but, if anything, was decreased rather than increased.

      We explained this finding as follows:

      Line 284: “Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration and fermentation.”

      The NAD+/NADH ratio in Figure 5F of the submitted manuscript was calculated from NADH and total (NAD+ plus NADH) levels. As requested, we measured individual NAD+ and NADH concentrations. We found that the decrease in the NAD+/NADH ratio of the Δagr mutant was now large, significant, and largely due to a relative increase in NADH.
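      The indirect calculation used for the submitted figure amounts to subtracting NADH from the total pool before dividing, so error in either measurement carries into the derived NAD+ value and hence into the ratio. A minimal Python sketch of that arithmetic; the function name and the concentrations (arbitrary units) are hypothetical and not data from the study:

```python
def nad_ratio_from_total(total_nad: float, nadh: float) -> float:
    """NAD+/NADH when only the total pool (NAD+ plus NADH) and NADH are measured.

    Hypothetical helper for illustration; NAD+ is derived by subtraction.
    """
    nad_plus = total_nad - nadh  # derived, not directly measured
    if nadh <= 0:
        raise ValueError("NADH must be positive")
    if nad_plus < 0:
        raise ValueError("NADH cannot exceed the total NAD pool")
    return nad_plus / nadh


# Hypothetical example: total pool 10, NADH 2 -> NAD+ 8 -> ratio 4.0
print(nad_ratio_from_total(10.0, 2.0))
```

      Measuring NAD+ and NADH individually, as done for the revised figure, avoids relying on the subtraction step.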

      We have included these new data in a revised Figure 5 and clarified the relationship among the NAD+/NADH ratio, respiration, and fermentation in the Δagr mutant by modifying the wording of the text as follows:

      Line 280: “Since respiration and fermentation generally increase NAD+/NADH ratios and since these activities are increased in Δagr strains (Figure 5C and 5E-F), we expected a higher NAD+/NADH ratio relative to wild-type cells. However, we observed a decrease in the NAD+/NADH ratio due to a large surge in NADH accompanied by a modest drop in NAD+ compared to wild-type. Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation in the presence of oxygen.”

      Reviewer #2 (Recommendations For The Authors):

      (1) The RNA-seq analysis revealed that the Δagr strain exhibited increased expression of genes involved in respiration and fermentation, suggesting enhanced energy generation. However, metabolic modeling based on transcriptomic data indicated a decrease in tricarboxylic acid (TCA) cycle and lactate flux per unit of glucose uptake in the Δagr mutant. Additionally, intracellular ATP levels were significantly lower in the Δagr mutant compared to the wild-type strain, despite the carbon being directed into an acetate-generating, ATP-yielding carbon "overflow" pathway. Furthermore, growth analysis in nutrient-constrained medium demonstrated a decrease in the growth rate and yield of the Δagr mutant. Given that S. aureus actively utilizes the electron transport chain (ETC) to replenish NAD pools during aerobic growth on glucose, supporting glycolytic flux and pyruvate dehydrogenase complex (PDHC) activity while restricting TCA cycle activity through carbon catabolite repression (CCR), it is suggested that the authors analyze glucose consumption rates in conjunction with the determination of intracellular levels of pyruvate, AcCoA, and TCA cycle intermediates such as citrate and fumarate. These additional experiments will provide valuable insights into the metabolic fate of glucose and pyruvate and their subsequent impact on cellular respiration and fermentation in the Δagr mutant.

      (2) The authors highlighted the importance of redox balance in Δagr cells by emphasizing the tendency of these cells to prioritize NAD+-generating lactate production over generating additional ATP from acetate. However, the results regarding acetate and lactate production in Δagr cells during aerobic growth suggest that carbon is directed towards acetate generation rather than lactate.

      Responses to Comments 1 and 2 have been combined.

      As requested, we measured glucose consumption and intracellular levels of several different metabolites in the wild-type and Δagr mutant strain. The results are consistent with the idea that increased acetogenesis and fermentation in Δagr mutant cells contribute to increased ATP production and NAD+ recycling, respectively. These two processes appear to be relatively favored over the flux of pyruvate carbon into the TCA cycle of the Δagr mutant.

      We explained our finding as follows:

      Line 288: “To help determine the metabolic fate of glucose, we measured glucose consumption and intracellular levels of pyruvate, acetyl-CoA, and the TCA-cycle metabolites fumarate and citrate in the wild-type and Δagr mutant strains. At 4 h of growth to late-exponential phase, intracellular pyruvate and acetyl-CoA levels were increased in the Δagr mutant compared to the wild-type strain, but levels of fumarate and citrate were similar (Figure 5—figure supplement 1D-E). Glucose was depleted after 4 h of growth, but glucose consumption after 3 h of growth (exponential phase) was increased in the Δagr mutant compared to the wild-type strain (Figure 5—figure supplement 1A). These observations, together with the decrease in the NAD+/NADH ratio and the increase in acetate and lactate production described above, are consistent with a model in which respiration in Δagr mutants is inadequate for 1) energy production, resulting in an increase in acetogenesis, and 2) maintenance of redox balance, resulting in an increase in fermentative metabolism, lactate production, and conversion of NADH to NAD+. Increased levels of acetate compared to lactate under optimal aeration conditions suggest that demand for ATP exceeds demand for NAD+.”

      Future work will compare additional extracellular and intracellular (e.g., formate, ethanol, acetoin) metabolites to test these and other models using a combination of approaches (e.g., mass spectrometry, nuclear magnetic resonance, genetic deletion studies, transcriptomics) and will determine the mechanisms underlying metabolic differences in wild-type and Δagr mutant strains.

      To maintain a sense of narrative, we added a new subheading after the explanation of our findings:

      Line 311: “Transcriptional changes due to Δagr mutation are long-lived and result in down-regulation of H2O2-stimulated genes relative to those in an agr wild-type.”

      (3) The authors mentioned that respiration and fermentation typically reduce the NAD+/NADH ratios, and since these activities are elevated in Δagr strains (Figure 5F-G), they initially anticipated a lower NAD+/NADH ratio compared to wild-type agr cells. However, the increase in respiration and activation of fermentative pathways leads to a decrease in NADH levels, therefore resulting in an increase in the NAD+/NADH ratio.

      We have clarified the issue with new experiments and by modifying the wording as shown in the response to Reviewer 1 Comment 13.

      (4) To improve the clarity and completeness of this work, it would be advantageous for the authors to provide specific details regarding the glucose concentration in the TSB media and the aeration conditions during growth, including the flask-to-medium ratio. These additional experimental parameters are essential for ensuring the reproducibility and comprehensiveness of the study, allowing for a more precise understanding and interpretation of the observed metabolic changes in the Δagr strain.

      We modified the Methods as suggested.

      Minor comments:

      (1) The growth rate in Figure 1-figure supplement 3 should not be presented as a simple calculation of OD/min and needs to be recalculated.

      We recalculated the growth rate and modified Figure 1 as suggested. Growth rate (µ) was determined from the exponential phase using two data points, OD1 and OD2, flanking the linear portion of the curve, according to the equation µ = (ln OD2 − ln OD1)/(t2 − t1), as described (12).
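      The two-point calculation can be sketched in Python. The function name, the use of hours as the time unit (giving µ in h⁻¹, as Reviewer 1 suggested), and the example OD readings are illustrative assumptions, not values from the study:

```python
import math


def specific_growth_rate(od1: float, od2: float, t1_h: float, t2_h: float) -> float:
    """Specific growth rate mu (h^-1) from two exponential-phase OD readings.

    Implements mu = (ln OD2 - ln OD1) / (t2 - t1); times are in hours.
    """
    if od1 <= 0 or od2 <= 0:
        raise ValueError("OD values must be positive")
    if t2_h <= t1_h:
        raise ValueError("t2 must be later than t1")
    return (math.log(od2) - math.log(od1)) / (t2_h - t1_h)


# Hypothetical readings: OD 0.1 at 1 h and OD 0.4 at 3 h
mu = specific_growth_rate(0.1, 0.4, 1.0, 3.0)
doubling_time_h = math.log(2) / mu  # a fourfold rise over 2 h gives 1 h doubling
print(round(mu, 3), round(doubling_time_h, 3))
```

      Working on the natural-log scale makes µ independent of the starting OD, which is why it is preferred over a raw OD/min slope.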

      (2) Δrot (BS1301) should be removed from Figure 2 (A) legend as it is not presented in the panel A.

      We modified Figure 2 as suggested.

      (3) The authors should specify in the Figure 3 (D) legend that the kinetics of killing by H2O2 was performed for ΔrnaIII and ΔagrBD mixtures.

      We modified Figure 3 as suggested.

      (4) In the Figure 4 legend for (C), the statement "See Supplementary file 2 for supporting information" should be changed to "See Supplementary file 3 for supporting information."

      We modified the Supplementary file name as suggested.

      References cited in responses

      (1) Brynildsen MP, Winkler JA, Spina CS, MacDonald IC, Collins JJ. 2013. Potentiating antibacterial activity by predictably enhancing endogenous microbial ROS production. Nat Biotechnol 31:160-165.

      (2) Morfeldt E, Taylor D, von Gabain A, Arvidson S. 1995. Activation of alpha-toxin translation in Staphylococcus aureus by the trans-encoded antisense RNA, RNAIII. EMBO J 14:4569-4577.

      (3) Novick RP, Ross HF, Projan SJ, Kornblum J, Kreiswirth B, Moghazeh S. 1993. Synthesis of staphylococcal virulence factors is controlled by a regulatory RNA molecule. EMBO J 12:3967-3975.

      (4) Takahashi N, Gruber CC, Yang JH, Liu X, Braff D, Yashaswini CN, Bhubhanil S, Furuta Y, Andreescu S, Collins JJ, Walker GC. 2017. Lethality of MalE-LacZ hybrid protein shares mechanistic attributes with oxidative component of antibiotic lethality. Proc Natl Acad Sci U S A 114:9164-9169.

      (5) Fujimoto DF, Bayles KW. 1998. Opposing roles of the Staphylococcus aureus virulence regulators, Agr and Sar, in Triton X-100- and penicillin-induced autolysis. J Bacteriol 180:3724-3726.

      (6) Cho H, Uehara T, Bernhardt TG. 2014. Beta-lactam antibiotics induce a lethal malfunctioning of the bacterial cell wall synthesis machinery. Cell 159:1300-1311.

      (7) Rowe SE, Wagner NJ, Li L, Beam JE, Wilkinson AD, Radlinski LC, Zhang Q, Miao EA, Conlon BP. 2020. Reactive oxygen species induce antibiotic tolerance during systemic Staphylococcus aureus infection. Nat Microbiol 5:282-290.

      (8) Zamboni N, Sauer U. 2003. Knockout of the high-coupling cytochrome aa3 oxidase reduces TCA cycle fluxes in Bacillus subtilis. FEMS Microbiol Lett 226:121-126.

      (9) Halsey CR, Lei S, Wax JK, Lehman MK, Nuxoll AS, Steinke L, Sadykov M, Powers R, Fey PD. 2017. Amino acid catabolism in Staphylococcus aureus and the function of carbon catabolite repression. mBio 8.

      (10) Hammer ND, Reniere ML, Cassat JE, Zhang Y, Hirsch AO, Hood MI, Skaar EP. 2013. Two heme-dependent terminal oxidases power Staphylococcus aureus organ-specific colonization of the vertebrate host. mBio 4.

      (11) Lan L, Cheng A, Dunman PM, Missiakas D, He C. 2010. Golden pigment production and virulence gene expression are affected by metabolisms in Staphylococcus aureus. J Bacteriol 192:3068-3077.

      (12) Grosser MR, Weiss A, Shaw LN, Richardson AR. 2016. Regulatory requirements for Staphylococcus aureus nitric oxide resistance. J Bacteriol 198:2043-2055.

    1. Author Response

      eLife assessment

      This study demonstrates mRNA-specific regulation of translation by subunits of the eukaryotic initiation factor complex 3 (eIF3) using convincing methods, data, and analyses. The investigations have generated important information that will be of interest to biologists studying translation regulation. However, the physiological significance of the gene expression changes that were observed is not clear.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e, and f) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that the global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control the translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      We would like to note that the first sentence contains a typo; the correct expression is: “…of three subunits of the eIF3 complex (d, e, and h) in mammalian cells”.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      Many experiments in this direction are ongoing. Our original plan was to map all the effects as broadly and in as much detail as possible, and then to select a few of them for future long-term projects.

      Reviewer #2 (Public Review):

      Summary:

      mRNA translation regulation permits cells to rapidly adapt to diverse stimuli by fine-tuning gene expression. Specifically, the 13-subunit eukaryotic initiation factor 3 (eIF3) complex is critical for translation initiation as it aids in 48S PIC assembly to allow for ribosome scanning. In addition, eIF3 has been shown to drive transcript-specific translation by binding mRNA 5' cap structures through the eIF3d subunit. Dysregulation of eIF3 has been implicated in oncogenesis; however, the precise eIF3 subunit contributions are unclear. Here, Herrmannová et al. aim to investigate how eIF3 subcomplexes, generated by knockdown (KD) of either eIF3e, eIF3d, or eIF3h, affect the global translatome. Using Ribo-seq and RNA-seq, the authors identified a large number of genes that exhibit altered translation efficiency upon eIF3d/e KD, while translation defects upon eIF3h KD were mild. eIF3d/e KD share multiple dysregulated transcripts, perhaps due to both subcomplexes lacking eIF3d. Both eIF3d/e KD increase the translation efficiency (TE) of transcripts encoding lysosomal, ER, and ribosomal proteins. This suggests a role of eIF3 in ribosome biogenesis and protein quality control. Many transcripts encoding ribosomal proteins harbor a TOP motif, and eIF3d KD and eIF3e KD cells exhibit a striking induction of these TOP-containing transcripts. On the other hand, eIF3d KD and eIF3e KD lead to a reduction of MAPK/ERK pathway proteins. Despite this downregulation, eIF3d KD and eIF3e KD activate MAPK/ERK signaling, as ERK1/2 and c-Jun phosphorylation were induced. Finally, in all three knockdowns, MDM2 and ATF4 protein levels are reduced. This is notable because MDM2 and ATF4 both contain short uORFs upstream of the start codon, which further supports a role of eIF3 in reinitiation. Altogether, Herrmannová et al. have gained key insights into precise eIF3-mediated translational control as it relates to key signaling pathways implicated in cancer.

      Strengths:

      The authors have provided a comprehensive set of data to analyze RNA and ribosome footprinting upon perturbation of eIF3d, eIF3e, and eIF3h. As described above in the summary, these data present many interesting starting points for understanding additional roles of the eIF3 complex and specific subunits in translational control.

      Weaknesses:

      • The differences between eIF3e and eIF3d knockdown are difficult to reconcile, especially since eIF3e knockdown leads to a reduction in eIF3d levels.

      We agree and discuss this problem thoroughly in the corresponding section of our study.

      • The paper would be strengthened by experiments directly testing what RNA determinants allow for transcript-specific translation regulation by the eIF3 complex. This would allow the paper to be less descriptive.

      We carried out a bioinformatic analysis of specific RNA determinants, which is presented in the last section of our study. A detailed, transcript-specific analysis of these determinants is underway; however, we consider it beyond the scope of this article.

      • The paper would have more biological relevance if eIF3 subunits were perturbed to mimic naturally occurring situations where eIF3 is dysregulated. For example, eIF3e is aberrantly upregulated in certain cancers, and therefore an overexpression and profiling experiment would have been more relevant than a knockdown experiment.

      This is indeed true, and we have so far generated several stable cell lines individually overexpressing selected eIF3 subunits implicated in the observed cancer phenotypes. However, this is a separate project of one of our PhD students, which will be published as a comprehensive study when completed.

      Reviewer #3 (Public Review):

      Summary:

      In this article, Herrmannova et al catalog the changes in ribosome association with mRNAs when the eukaryotic translation initiation factor 3 is disrupted by knocking down subunits of the multisubunit protein. They find that RNAs relying on TOP motifs for translation, such as ribosomal protein RNAs, and RNAs encoding proteins that modify other proteins in the ER or components of the lysosome are upregulated. In contrast, RNAs encoding components of MAP kinase cascades are downregulated when subunits of eIF3 are knocked down.

      Strengths:

      The authors use ribosome profiling of well-characterized mutants lacking subunits of eIF3 and assess the changes in translation that take place. They supplement the ribosome association studies with western blotting to determine protein level changes of affected transcripts. They analyze what is being encoded by the transcripts undergoing translation changes, which is important for understanding more broadly how translation initiation factor levels affect cancer cell translatomes.

      Weaknesses:

      (1) The data are presented as a catalog of effects, and the paper would be strengthened if there were a clear model tying the various effects together or linking individual subunit knockdown to cancerous phenotypes. It is unclear what the hypothesis is for cells having more MAPK activity with less of the MAPK proteins being translated, so the main findings of the paper become observational without context.

      As the signaling pathways are very complex and there is frequent crosstalk among them (c-Jun can be activated by the ERK pathway as well as the JNK pathway, activated ERKs can phosphorylate many different transcription factors, etc.), we opted not to investigate the reported results any further in this study. As mentioned above, we have several ongoing, long-term projects aiming to elucidate the consequences of the observed changes in protein levels as well as in the phosphorylation status of the MAPK pathway constituents. The take-home message of the present study is that eIF3 subunits (d and e) control the expression of many proteins involved in the MAPK/ERK pathway, and that there is an independent effect leading to activation of the ERK pathway (already observed upon eIF3h downregulation, which does not affect MAPK protein expression), which may be a direct consequence of compromised eIF3 function in general.

      (2) The conclusions drawn are presented as very generalized other than in the last paragraph, but the experiments were only done in HeLa cells. Since conclusions are being made about how translation changes affect MAP kinase signaling and there is mention in the abstract that dysregulation of these subunits is observed in cancer, at least one other cell line would need to be analyzed to provide evidence that the effects of subunit knockdown aren't cell-line specific.

      The manuscript contains several notes emphasizing that the data presented in this study were obtained only in HeLa cells. We agree that further research in other cell lines will be needed to confirm that what we observed is a general phenomenon. Nonetheless, as noted in the discussion, other reports have already been published strongly indicating that this phenomenon is not unique to HeLa cells (Li et al., 2021, PMID:34520790, HTR-8/SVneo cells). We will review our conclusions and further clarify that our results so far only apply to HeLa cells.

      (3) It is also unclear how replicates were performed and how many replicates were performed for several experiments. Biological replicates are mentioned, but what the authors did for biological replicates isn't defined and the description of the collection of cells for polysome/ribosome footprint/RNA seq samples makes it unclear whether the "biological replicates" are samples from separate transfections (true biological replicates) or different aliquots or wells from a single transfection (technical replicates) being run over a separate gradient. If using technical replicates, the data comparing the effects of knocking down D vs E vs H subunits are substantially weakened because subunit-specific differences could be the result of non-specific events that occurred in a transfection. It's also notable that while the pooled siRNAs will increase the potency of knockdown, it is possible that one or more of the siRNAs could have off-target effects, and analyzing individual siRNAs would be better for ensuring effects are specific.

      We can reassure this reviewer that our Ribo-seq and RNA-Seq libraries were prepared from true biological replicates, grown and transfected at different times. In fact, for each biological replicate, we used a new aliquot of cells from cryostock from the same batch and transfected only cells of the same passage number. Multiple biological replicates were grown, and all underwent a series of control experiments (polysomes, qPCR, western blot) as described in the article. Based on the results, 3 samples were selected for Ribo-Seq library preparation and 4 for RNA-Seq. We decided to add a fourth replicate for RNA-Seq to increase data robustness, because RNA-Seq is used to normalize FPs to calculate TE, which was the main metric analyzed in this article.
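      To illustrate why RNA-Seq depth matters for TE, here is a minimal sketch of one common way to compute TE: footprint (FP) counts and RNA-Seq counts are each scaled to counts per million (CPM), averaged over replicates, and divided gene by gene. This is an assumption for illustration only, not the pipeline used in the study; published analyses typically rely on dedicated statistical frameworks. The function names and the toy counts below are hypothetical.

```python
from statistics import mean


def cpm(counts):
    """Scale raw read counts to counts per million (library-size normalization)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def mean_cpm(replicates):
    """Average CPM values gene by gene over replicate libraries (genes present in all)."""
    scaled = [cpm(r) for r in replicates]
    genes = set.intersection(*(set(s) for s in scaled))
    return {g: mean(s[g] for s in scaled) for g in genes}


def translation_efficiency(fp_replicates, rna_replicates):
    """TE = mean footprint CPM / mean RNA-Seq CPM, for genes detected in both."""
    fp = mean_cpm(fp_replicates)
    rna = mean_cpm(rna_replicates)
    return {g: fp[g] / rna[g] for g in fp.keys() & rna.keys() if rna[g] > 0}


# Hypothetical toy counts: 3 footprint libraries and 4 RNA-Seq libraries,
# mirroring the replicate numbers mentioned above.
fp_reps = [{"A": 100, "B": 300}] * 3
rna_reps = [{"A": 200, "B": 200}] * 4
te = translation_efficiency(fp_reps, rna_reps)
print(te)  # gene A: 0.5, gene B: 1.5 (dict order may vary)
```

      Because every TE value has the RNA-Seq estimate in its denominator, noise in the RNA-Seq libraries propagates directly into TE, which is the rationale given above for adding a fourth RNA-Seq replicate.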

      As for the use of the siRNA pool from Dharmacon/Horizon, our current article builds on our previous studies (Wagner et al. 2014 PMID: 24912683; Wagner et al. 2016 PMID: 27924037 and Herrmannová et al. 2020 PMID: 31863585), where we thoroughly characterized the effects of downregulation of individual eIF3 subunits on the growth, translation, composition and stability of the eIF3 complex and on 43S preinitiation complex assembly and subsequent mRNA recruitment. In all of these studies, we used the same siRNA pools, the same cells and the same transfection protocol; therefore, we are convinced that our results are as coherent and reproducible as possible. We have never noticed any off-target effects. Moreover, the ON-TARGETplus siRNA technology we employed uses a patented modification pattern that reduces the incidence of off-targets by up to 90% compared to unmodified siRNA (see the supplier's website for more information).

      (4) Many of the changes in protein levels reported by Western are subtle. Data from all western blots making claims of quantitative differences should really be quantified relative to the non-treated sample, normalized to a loading control or to total protein quantified from the gel, and presented with a degree of error from biological replicates to make conclusions about differences in protein levels between samples.

      Generally speaking, we agree with the reviewer’s opinion. In the original version of our study, we felt that it was not necessary to perform a quantification analysis to support our conclusions as it was not important whether a given protein was downregulated to, for example, 60% or 70%, as long as its amount was visibly reduced. The main message resided in the general trend, i.e. that the whole pathway is affected in a similar way. Nevertheless, in order to properly address this criticism, we will provide quantifications in the revised paper.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your comments. We will further investigate the correlation between aging and the proteins eIF2β and eIF2α and include the results in the revised version.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      We agree with the reviewer that there are multiple ways to interpret LC3 blotting for the assessment of autophagy. We therefore also analyzed the levels of p62, an autophagy substrate, and found that milton knockdown caused elevated levels of p62 (Figure 2B). Together, these results suggest that autophagic degradation is reduced.

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is based on the findings from the proteome analyses. The authors should have presented their proteomic data more thoroughly and with more explanation. As shown in the experiment scheme in Figure 4A, the authors performed two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β in proteostasis.

      Thank you for your comments. We will include more analyses of the proteomic data in the next version of our manuscript. In this study, we aimed to elucidate the mechanisms by which depletion of axonal mitochondria induces premature disruption of proteostasis. Thus, we did not investigate the roles of the differentially expressed proteins in proteostasis in 21-day-old milton knockdown flies. Aging disrupts proteostasis via multiple pathways: at 21 days of age, eIF2β levels may be lowered by feedback from earlier changes or via interaction with other age-related changes. We will include more discussion in the next version of our manuscript.

      The manuscript contains several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      Thank you for your comment. We understand that the phosphorylation of eIF2α is inhibitory for translation initiation, as we described on page 9, lines 312-314. We propose a model in which the autophagic defects caused by milton knockdown are mediated by upregulation of eIF2β; however, we are not arguing that the translational suppression in milton knockdown is caused by a reduction in p-eIF2α. We found that milton knockdown causes an increase in eIF2β, and that overexpression of eIF2β phenocopied milton knockdown, including the autophagic defects (Figures 5 and 6). We also found that the increase in eIF2β reduces the level of p-eIF2α (Supplemental Figure 2); thus, the reduced eIF2α phosphorylation in milton knockdown may be caused by the increase in eIF2β. However, the effects of eIF2β upregulation on the function of the eIF2 complex are not fully understood. The translational suppression in milton knockdown may be caused by disruption of the eIF2 complex, but it is also possible that it is mediated by a yet-to-be-determined function of eIF2β, or by pathways other than eIF2. We will include more details in the revised version.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing this out. We apologize for the error: the gradient was actually 10-50%, and we described it incorrectly. We will correct it in the revised version.

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing this out. We will replace the graph.

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the differences between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. Capturing axonal translation with spatial resolution will require a new set of technically challenging experiments. We will work on this and hope to achieve it in the future.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for their comments. However, my request for additional experiments to consolidate this manuscript and text changes have not been addressed (point 1 and point 2), which I believe are essential for completion of this manuscript.

      The reviewer raised the question about the relevant substrates of PARG in S-phase cells (point 1). As we explained in our previous response, the most important substrate of PARG is PARP1, since we observed increased chromatin-associated PARP1 and PARylated PARP1 in cells with PARG depletion. Moreover, PARP1 or PARP1/2 depletion rescued cell lethality caused by PARG depletion. These data strongly suggest that PARP1 is the major substrate of PARG in S phase cells. Of course, PARG may have additional substrates. In the future, we will perform proteomics experiments as suggested by this reviewer to identify additional PARG substrates, which may reveal new roles of PARG in S phase progression.

      The reviewer also suggested that we reorganize our manuscript (point 2). However, we prefer to keep the manuscript as it is, since this is how the project evolved. Another reason is that we would like to share with readers the challenge of validating KO cells. This is an important lesson we learned from this study. We hope that this will raise awareness of the hypomorphic mutant cells that are often used to draw conclusions about gene functions and/or genetic interactions. We understand that the current flow of our manuscript may cause some confusion. To avoid it, we included additional explanations at the beginning of this manuscript to draw readers' attention to the fact that our initial KO cells may not be complete PARG KO cells, i.e. they may have residual PARG activity. We also included additional discussion of this important point in the Discussion section.

      Moreover, WB analysis of PARG KO clones is inconclusive, as the additional prominent band at 50 kDa could be a degradation product. The authors should check PARG levels and localization by IF, which allows detection of intact proteins and their cellular localizations, since the shorter isoform should be localized in the cytosol. WB with PARG isoforms is missing important information regarding the Mw of the PARG constructs and Mw labels of western blots, which makes it difficult to evaluate these data and compare them to the KO. Ideally, KO and PARG isoform samples should all be on one gel for proper comparison with different antibodies.

      We appreciate the concerns raised by this reviewer. We agree that the additional prominent band at 50 kDa could be a degradation product. As we explained in our previous response, despite using several PARG antibodies, we could not draw a clear conclusion about which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Immunostaining experiments may not be more conclusive, since IF experiments rely on the same antibodies for recognizing endogenous PARG. Additionally, even if a protein mainly localizes in the cytosol, we cannot exclude the possibility that a small fraction of this protein may localize in nuclei and have nuclear functions.

      Instead, as we presented in our manuscript, we used a biochemical assay to measure PARG activity in cell lysate and showed that our initial PARG KO cells still have residual PARG activity. However, we could not detect any PARG activity in our complete/conditional PARG KO cells (cKO cells; these cells can only survive in the presence of PARP inhibitor). These data strongly suggest that PARG is essential for cell survival.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The author should evaluate the possibility of naturally occurring arrhythmia due to the geometry of the tissues, by using voltage or calcium dye.

      Answer: We thank the reviewer for this suggestion. We have performed new experiments using a voltage-sensitive fluorescent dye (i.e. FluoVolt), with data reported in the new Figure 4 and the new results section “arrhythmia analysis”. Briefly, we found that our ring-shaped tissues are compatible with live fluorescence imaging. We were then able to show that our cardiac tissues beat regularly, without naturally occurring arrhythmias or extra beats. We could not detect any re-entrant waves in our tissues within the temporal resolution of our camera. A specific paragraph has also been added to the discussion.

      (2) There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. We are currently expanding the platform to other cell lines, with improved survival (see https://insight.jci.org/articles/view/161356). We confirm that the rings do not detach. The pillar was specifically designed to prevent this (see Figure 1B).

      As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Answer: We propose a system and a measurement method that require no calibration. Contraction amplitude is expressed as the ratio of the contracted to the relaxed area (see Figure 3A). The distance of the camera lens therefore has no influence on the measurement.
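      A minimal worked equation of why an area ratio needs no distance calibration, assuming the camera applies the same uniform linear magnification $m$ (set by the lens-to-tissue distance) to the tissue in both the contracted and the relaxed frame, and ignoring lens distortion and perspective effects. The symbols $A_c$, $A_r$ (true contracted and relaxed areas) and $m$ are introduced here for illustration only:

$$\frac{A_c^{\mathrm{image}}}{A_r^{\mathrm{image}}} = \frac{m^2 A_c}{m^2 A_r} = \frac{A_c}{A_r}$$

      Under the same idealization, any other constant area-scaling factor, such as uniform foreshortening from a fixed viewing angle, cancels in the same way.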

      In order to evaluate the consistency of the mechanical properties of the hydrogel, we reproduced the experiment pictured in Figure 1-Supplement 1 and measured the Young’s modulus of three different gel solutions on different days. In the three experiments performed, we found values of 10.0-12.2 kPa, resulting in a final average value of 11.2 ± 0.6 kPa, consistent with the value reported in the article. We are therefore confident that the mechanical properties are consistent across and within wells. More extensive mechanical characterization of the molded gels would require access to an atomic force microscope (AFM) and is being considered for future work.

      The author should address the longevity and reproducibility issues, by working on the calibration of camera lens position/distance to tissues and further optimizing the seeding conditions with hydrogels such as collagen or fibrin, and/or making sure the PEG gels have high reproducibility and consistency.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. This platform (including the design, approach and choice of polymers) allows fast and reproducible formation of a large number of cardiac tissues (up to 21 per well in a 96-well format, meaning a potential total of about 2,000 tissues) with a limited number of cells.

      (3) The evaluation of the arrhythmia should be more extensively explained and demonstrated.

      Answer: See answer to comment 1.

      (4) The results of isoproterenol should be checked as non-paced tissues should have increased beating frequency with increasing dosages. Dofetilide does not typically have a negative inotropic effect on the tissues. Please check the cell viability before and after dosing.

      Answer: We agree with this reviewer in principle. However, we have repeated the experiments and we confirm our results, i.e. increasing concentrations of isoproterenol induced a trend towards an increase in contraction force and significantly increased contraction and relaxation speeds without a change in beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that the increase in contraction and relaxation speeds induced by isoproterenol translates, on average in our study, into an increase in contractile force rather than an increase in contraction frequency. This may depend on the cell line used, and is very well illustrated in a recent paper from Mannhardt and colleagues (Stem cell reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all show an increase in contraction and relaxation speeds after isoproterenol administration, but this translates either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines show an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without any maturation medium or mechanical or electrical training. We agree that above a concentration of 10 nM, dofetilide shows cardiotoxicity in our tissues, as the tissues completely stop beating.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the general comments in the public review, I have the following specific suggestions to the authors, that would help improve the manuscript.

      (1) Please describe the protocol for preparation of cardiac rings (shown in Figure 1C) in more detail. In particular, please describe how the tissues were transferred from the mold into the 96-well plate and how are they positioned and characterized during the study.

      Answer: There is no transfer of the tissues, as they form directly in the well, which is pre-equipped with the molded PEG gel (see Figure 1B and the methods section). This in situ analysis is a strong asset of the platform.

      (2) Please clarify the timepoints in this study. The overall schematic in Figure 1C shows that the rings were formed on day 22 and then studied for 14 days, while Figure 2B shows data over 20 days following seeding, and Figure 3 shows data 14 days after seeding. It appears that these were separate studies (optimization of the myocyte/fibroblast ratio followed by the main study).

      Answer: Figure 1C shows the timeline including cardiomyocyte differentiation. hiPSC-CMs are seeded in the wells 22 days after the start of differentiation, which represents Day 0 of tissue formation. We apologize for the confusion.

      (3) Please explain if the number of rings per well (Figure 2) was used as the only criterion for selecting the myocyte/fibroblast ratio, and if so, why. Were these rings also characterized for their structural and contractile properties?

      Answer: Figure 2 supplement 1 reports the contractility data for the different tested ratios and shows no differences. The number of generated ring-shaped tissues was indeed the only criterion retained.

      (4) Please provide rationale for using the dermal rather than cardiac fibroblasts.

      Answer: We had previous experience generating EHTs using dermal fibroblasts which are easier to obtain commercially. Our approach could in theory also work using cardiac fibroblasts, which we have not tested in the present study.

      (5) Figure 2 panels C-E show an interesting segregation of cardiomyocytes into a thin cylindrical layer that does not appear to contain fibroblasts and a shorter and thicker cylinder containing fibroblasts mixed with occasional myocytes. Please specify at which time point this structure forms and how it changes over time in culture. At which time point were the images taken? It would be helpful to include serial images taken over 1-14 days of study.

      Answer: We thank the reviewer for this interesting comment. We have performed additional immunostainings (reported in Figure 2 supplement 3) on tissues at Day 1 and Day 7 after seeding. The segregation appears within the first 7 days. One day after seeding, the fibroblasts are not yet attached, although the cardiac fiber has already started to form. Seven days after seeding, the fibroblasts are fully spread and attached, and the contractile ring is formed and well aligned. Brightfield images are reported in Figure 1E.

      (6) In the cardiomyocyte region (Figure 2D) the cells staining for troponin seem to be only at the surfaces. The thickness of the layer is only about 30-40 µm, so one would assume that cell viability was not an issue. Please specify and discuss the composition of this region.

      Answer: We agree, but we think this is a technical issue: at the center of the tissue, tissue thickness limits laser penetration, whereas at the surface (inner or outer), the laser easily penetrates between the tissue and the PEG. Moreover, the zoomed view of the tissue in Figure 2 Supplement 2 shows staining inside the cardiac fiber, which just appears weaker due to tissue thickness.

      (7) Please also discuss segregation in terms of possible causes and the implications of apparently very limited contact between the two cell types, i.e., how representative is this two-region morphology of native heart tissue. Also, it would be interesting to know how the segregation has changed with the change in myocyte/fibroblast ratio.

      Answer: We are not sure that contact between the two cell types is very limited, as fibroblasts are critical to ensure the formation of tissues (i.e. no tissues can be formed without fibroblasts). We agree that these ring-shaped cardiac tissues are not especially representative of native heart tissue in terms of interactions between several cell types. They were developed as a surrogate for physiopathological and pharmacological experiments (see a recent application in https://insight.jci.org/articles/view/161356).

      (8) There is interest and demonstrated ability to culture engineered cardiac tissues over longer periods of time. Please comment on the rationale for selecting a 14-day culture and whether the system allows longer culture durations.

      Answer: In line with this comment, we have studied the contractile parameters of our rings 28 days after seeding and compared them to their contractile parameters at D14. We found a slight increase in all parameters, which is significant for the maximum contraction speed. Nevertheless, the data are much more variable and the number of tissues is lower (29 at D14 versus 17 at D28). Therefore, we demonstrated that long-term culture of our tissues is possible, although not yet optimized. Hence, the subsequent physiological and pharmacological tests were performed at D14.

      (9) Figure 3 documents the development of contractile parameters over 14 days of culture. Would it be possible to replace the arbitrary units with the actual values? Also, would it be possible to include the corresponding images of the rings taken at the same time points, to show the associated changes in ring morphologies.

      Answer: Contraction amplitude is expressed as the ratio of contracted to relaxed areas (see Figure 3A); being a ratio, it has no unit. Corresponding images can be seen in Figure 1E.

      (10) The measured contraction stress, strain, and the speeds of contraction and relaxation improve from day 1 to day 7 and then plateau (Figure 3 and Supplemental Figure 3). Please discuss this result.

      Answer: The new immunostainings performed on tissues at Day 1 and Day 7 show the progressive alignment of the cardiomyocytes and the muscular fibers, with an almost complete organization at Day 7.

      (11) The beating frequency does not appear to markedly change over time, while Figure 3B shows strong statistical significance (***) throughout the 14-day period. Please check/confirm.

      Answer: We confirm this result.

      (12) Please comment on the lack of effect of isoproterenol on beating frequency.

      Answer: We agree with this reviewer in principle. However, we have repeated the experiments and confirm our results: increasing concentrations of isoproterenol induced a trend towards an increase in contraction force and significantly increased contraction and relaxation speeds without changing the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that, on average in our study, the isoproterenol-induced increase in contraction and relaxation speeds translates into an increase in contractile force rather than into an increase in contraction frequency. This may depend on the cell line used, as illustrated in a recent paper from Mannhardt and colleagues (Stem Cell Reports. 2020; 15(4):983–998). Of the 10 cell lines tested in engineered heart tissues, all showed an increase in contraction and relaxation speeds after isoproterenol administration, but this translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines showed an increase in both parameters. Indeed, because iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without a maturation medium or mechanical or electrical training.

      (13) Please compare the contractile function of cardiac tissues measured in this study with data reported for other iPSC-derived tissue models.

      Answer: A specific paragraph in the Discussion addresses this aspect.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews

      We thank the reviewers for their insightful comments and helpful suggestions that allowed us to improve the manuscript.

      Reviewer #1:

      Thermogenic adipocyte activity is associated with cardiometabolic health in humans but declines with age. Identifying the underlying mechanisms of this decline is therefore highly important.

      To address this task, Holman and co-authors investigated the effects of two major determinants of thermogenic activity: cold, which induces thermogenic de novo differentiation as well as conversion of dormant thermogenic inguinal adipocytes; and aging, which strongly reduces thermogenic activity. The authors study young and middle-aged mice at thermoneutrality and following cold exposure.

      Using lineage tracing, the authors conclude that the older group produces fewer thermogenic adipocytes from progenitor differentiation. However, they found no difference in thermogenic differentiation capacity between the age groups when progenitors are isolated and differentiated in vitro. This finding is consistent with previous findings in humans, demonstrating that progenitor cells derived from dormant perirenal brown fat of humans differentiate into thermogenic adipocytes in vitro. Taken together, this underscores that age-related changes in the microenvironment, rather than autonomous alterations in the ASPCs, explain the age-related decline in thermogenic capacity. This is an important finding in terms of identifying new approaches to switch dormant adipocytes into an active thermogenic phenotype.

      To gain insight into the age-related changes, the authors use single-cell and single-nucleus RNA sequencing to map their two age groups, comparing thermoneutral and cold conditions between the two groups. Interestingly, although the literature previously linked de novo lipogenesis (DNL) to thermogenic activation, the authors show that DNL is in fact activated in a white adipocyte cell type, while the beige thermogenic adipocytes form a separate cluster.

      Considering recent findings, that adipose tissue contains several subtypes of ASPCs and adipocytes, mapping the changes at single cell resolution following cold intervention provides an important contribution to the field, in particular as an older group with limited thermogenic adaptation is analyzed in parallel with a younger, more responsive group. This model also allowed for detection of microenvironment as a determining factor of thermogenic response.

      The use of only two time points (young and middle-aged) along the aging continuum limits the conclusions that can be made on aging as the only driver of the observed differences between the groups. It should for example be noted that the older mice had higher weights and larger fat depots, thus the phenotype is complex and this should be taken into consideration when interpreting the data.

      In conclusion, this study provides an important resource for further studies on how to reactivate dormant thermogenic fat and potentially improve metabolic health.

      (1) The authors claim "Aging impairs cold-induced beige adipogenesis and adipocyte metabolic reprogramming". It is already established in humans that aging is strongly associated with a decline in thermogenic capacity. With this in mind, it is easy to accept that the reduced browning observed in the older group is due to age. However, the older group also has larger adipose depots, which could be a confounding factor. I therefore recommend bringing this into the discussion and putting more focus on the complexity of the phenotype. For example, it could be discussed whether de novo lipogenesis is lower because the adipocytes of older mice are already filled with more lipid. Additional time points along the aging continuum would be needed to draw a strong conclusion about age as the determinant, but even so, aging is complex and further definitions and discussion would be needed.

      We agree with the reviewer regarding the confounding effect of body weight changes. We have added a paragraph to the discussion (pasted below) to comment on the complexity of the phenotype and the contributing role of linked changes in body weight/composition.

      “Aging is a complex process, and unsurprisingly, many pathways have been linked to the aging-related decline in beiging capacity. For example, increased adipose cell senescence, impaired mitochondrial function, elevated PDGF signaling and dysregulated immune cell activity during aging diminish beige fat formation (Benvie et al., 2023; Berry et al., 2017; Goldberg et al., 2021; Nguyen et al., 2021). Of note, older mice exhibit higher body and fat mass, which is associated with metabolic dysfunction and reduced beige fat development. While the effects of aging and altered body composition are difficult to separate, previous studies suggest that the beiging deficit in aged mice is not solely attributable to changes in body weight (Rogers et al., 2012). Further studies, including additional time points across the aging continuum may help clarify the role of aging and ascertain when beiging capacity decreases.”

      (2) The study would gain from more comparisons to existing human studies and discussion of the translational potential of the findings. For example, how do the adipocyte subtypes identified in the current study relate to subtypes identified in human adipose tissue (e.g. Emont et al.)?

      We analyzed the human adipose tissue atlas from Emont et al. 2022 (PMID: 35296864). We did not find any obvious homologous human adipocyte subtypes. However, this and other available human single cell studies have not investigated the effects of cold exposure on white adipose tissue depots, which may be necessary to reveal DNL-high and especially beige adipocytes.

      (3) The group has contributed multiple studies demonstrating that Prdm16 is a major inducer of a thermogenic phenotype, and the literature shows that Prdm16 promotes a thermogenic phenotype in favour of a fibrogenic aging phenotype. It would therefore be interesting to see how Prdm16 is regulated in the current data set, across adipocyte subtypes, age groups and temperature conditions.

      We thank the reviewer for this comment. Previous studies showed that PRDM16 protein and not mRNA levels are downregulated during aging (Wang et al., 2019, Cell Metab, PMID: 31155495; Wang et al., 2022, Nature, PMID: 35978186). Consistent with this, we did not observe an aging-associated reduction in Prdm16 mRNA levels in adipocytes in our dataset. We did observe enrichment of Prdm16 mRNA levels in beige adipocytes relative to other adipocyte clusters. We included these data in Fig. 5F.

      (4) In Figure 1, it is difficult to understand why the 6-week cold exposure is not shown alongside thermoneutrality and the 3-day and 2-week cold exposures. It would be useful to have these in the same graph, relating the levels and showing all four marker genes for all time points.

      These experiments were done at different times using separate groups of mice. We have now clarified this in the figure legend.

      (5) The older mice had larger inguinal fat depots, suggesting more stored lipid. Adipose tissue morphology has previously been shown to be modulated by cold acclimation and is also the main similarity between brown adipose tissue in adult humans and beige adipose tissue in young mice. Fig S2b suggests smaller adipocytes in the young group. For comparison with published data, it would also be useful if the authors showed H&E-stained tissue sections from their model.

      Good point. We added panels showing H&E staining of serial iWAT sections, showing changes in tissue morphology across age and temperature conditions (Figure S1F).

      (6) The authors use t-tests to compare the differences induced by e.g. cold or min vs max cell culture media etc, within each age group. However, in my opinion, a two-way Anova with post-tests would be more informative as this would allow for testing the effects of the two age categories on any quantitative variable and allow for addressing whether there is an interaction between the categories.

      Following the reviewer’s recommendation, we applied two-way ANOVA with a Tukey correction for multiple comparisons for categorical comparisons with different age groups and conditions. P values from all significant multiple comparison tests are now included within the methods section.

      (7) In Figure 5F, please include Adipoq expression between clusters and please add a reference to why Nnat is considered a canonical white adipocyte marker.

      We added Adipoq to the violin plot in Figure 5F, showing differential expression across adipocyte clusters. We included a line in the results section to highlight this observation:

      “Interestingly, Adiponectin (Adipoq) was differentially expressed across adipocyte clusters, with higher levels in Npr3-high and DNL-high cells.”

      We removed “canonical” and added references for Nnat and Lep as white marker genes.

      (8) After 14 days of cold exposure, it looks like the DNL-high population divides into two populations. Did the authors explore whether there were any differences between these clusters?

      We also noticed this apparent division and explored this question. However, upon increasing the resolution for clustering and splitting the DNL high population, there were no obvious differentially expressed genes that defined the two subclusters. Thus, we opted to keep them together.

      (9) As cold treatment transforms a subset of cells, can the authors perform a data-driven analysis to visualize trajectories in their single-nucleus datasets using Monocle pseudotime and/or RNA velocity analyses?

      This is a good question. We spent a long time trying to address this question using several trajectory and pseudotime analysis methods, including RNA velocity (scVelo), Slingshot and dynverse. Unfortunately, we were unable to obtain concordant results using at least two different methods and felt that the analyses were unreliable.

      Reviewer #2:

      This manuscript focused on why aging leads to decreased beiging of white adipose tissue. The authors used an inducible lineage tracing system and provided in vivo evidence that de novo beige adipogenesis from Pdgfra+ adipocyte progenitor cells is blocked during early aging in subcutaneous fat. Single-cell RNA sequencing of adipocyte progenitor cells and in vitro assays showed that progenitors from both age groups have similar beige adipogenic capacities in vitro. Single-nucleus RNA sequencing of mature adipocytes indicated that subcutaneous fat from aged mice contains more Npr3-high adipocytes.

      Meanwhile, adipocytes from aged mice have significantly lower expression of genes involved in de novo lipogenesis, which may contribute to the declined beige adipogenesis.

      The mechanism that leads to age-related impairment of white adipose tissue beiging is not very clear. The finding that Pdgfra+ adipocyte progenitor cells contribute to beige adipogenesis is novel and interesting. It is more intriguing that the aging process represses the differentiation of Pdgfra+ adipocyte progenitor cells into beige adipocytes during cold stimulation. The finding that mature adipocytes with high de novo lipogenesis activity may support beige adipogenesis is also novel and worth pursuing further. The study was carried out with a nice experimental design, and the authors provided sufficient data to support the major conclusions. I only have a few comments that could potentially improve the manuscript.

      (1) It is interesting that after three days of cold exposure, aged mice also have much fewer beige adipocytes. Is de novo adipogenesis involved at this early stage? Or do previously beige adipocytes that acquired a white morphology undergo more efficient "reactivation" in young mice? It would be nice if the authors could discuss these possibilities.

      This is a good question. We did not evaluate beige adipogenesis at the 3d timepoint. However, a previous study demonstrates that 3d of cold exposure is sufficient to promote de novo beige adipogenesis (Wang et al., Nat Med. 2013, PMID: 23995282). We observed that beige adipogenesis from Pdgfra+ cells is a relatively minor contributor to beige adipocyte development, even after long-term cold exposure in young mice. Based on these data, we presume that beige adipocyte activation (or re-activation) is the dominant mechanism for beige adipocyte development.

      To clarify this point, we have included the following lines in the manuscript:

      “Previous studies in mice using an adipocyte fate tracking system show that a high proportion of beige adipocytes arise via the de novo differentiation of ASPCs as early as 3 days of cold (Wang et al., 2013).”

      “Based on these findings, we presume that mature (dormant beige) adipocytes serve as the major source of beige adipocytes in our cold-exposure paradigm. However, long-term cold exposure also recruits smooth muscle cells to differentiate into beige adipocytes; a process that we did not investigate here (Berry et al., 2016; Long et al., 2014; McDonald et al., 2015; Shamsi et al., 2021).”

      (2) Is the absolute number of Pdgfra+ cells decreased in aged mice? It would be nice to include quantifications of the percentage of tomato+ beige adipocytes in total tomato+ cells to reflect the adipogenic rate.

      We presented FACS quantification of tdTomato+/Pdgfra+ cells in Fig. 2B. We added a graph showing the percentage of Pdgfra+ cells of total live, lin- cells in adipose tissue; this showed no difference between young and aged mice. We did not perform FACS quantification of tdTomato+ beige adipocytes due to the technical challenges with sorting adipocytes. Quantification of total tdTomato+ cells was also unreliable and inconsistent due to the widespread labeling of fibroblasts, blood vessels, along with traced adipocytes. Thus, we did not include this analysis.

      (3) Line 112, the sentence seems to be not finished.

      This has been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers’ Public Comments

      We are grateful for the reviewers’ comments. We have modified the manuscript accordingly and detail our responses to their major comments below.

      (1) Reviewer 2 was concerned that transformation of continuous functional data into categorical form could reduce precision in estimating the genetic architecture.

      We agree that transforming continuous data into categories may reduce resolution, but it also improves accuracy when the continuous data are affected by measurement noise. In our dataset, many genotypes are at the lower bound of measurement, and the variation in measured fluorescence among these genotypes is largely or entirely caused by measurement noise. By transforming to categorical data, we dramatically reduced the effect of this noise on the estimation of genetic effects. We modified the results and discussion sections to address this point.
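      A minimal sketch of why the categorical transformation removes noise at the lower bound of measurement. The cutoffs (`weak_cut`, `strong_cut`) and the readings are hypothetical values chosen for illustration, not the study's class boundaries or data: genotypes whose raw fluorescence differs only by noise all fall into the same null class, so that noise no longer contributes to estimated genetic effects.

```python
from statistics import pvariance

# Ordinal activation classes (hypothetical encoding).
NULL, WEAK, STRONG = 0, 1, 2


def classify(fluorescence, weak_cut=1.0, strong_cut=5.0):
    """Map a continuous fluorescence reading to an ordinal activation class.

    The default cutoffs are illustrative assumptions, not values from the study.
    """
    if fluorescence < weak_cut:
        return NULL
    if fluorescence < strong_cut:
        return WEAK
    return STRONG


# Hypothetical readings for non-functional genotypes at the lower bound of
# the assay: their spread reflects measurement noise only.
lower_bound_readings = [0.05, 0.31, 0.12, 0.48, 0.22]
classes = [classify(x) for x in lower_bound_readings]

raw_spread = pvariance(lower_bound_readings)  # nonzero: noise in raw values
class_spread = pvariance(classes)             # zero: all map to the null class
```

      The trade-off the response describes is visible here: resolution above the cutoffs is lost, but noise-only variation below `weak_cut` disappears entirely.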

      (2) Reviewer 2 asked about generalizability of our findings.

      Because our paper is the first use of reference-free analysis of a 20-state combinatorial dataset, generalizability is at this point unknown. However, a recent manuscript from our group confirms the generality of the simplicity of genetic architecture: using reference-free methods to analyze 20 published combinatorial deep mutational scans, several of which involve 20-state libraries, we found that main and pairwise effects account for virtually all of the genetic variance across a wide variety of protein families and types of biochemical functions (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057). Concerning the facilitating effect of epistasis on the evolution of new functions, we speculate that this result is likely to be general: we have no reason to think that the underlying cause of this observation – epistasis brings genotypes with different functions closer in sequence space to each other and expands the total number of functional sequences – arises from some peculiarity of the mechanisms of steroid receptor DBD folding or DNA binding. However, we acknowledge that our data involve sequence variation at those sites in the protein that directly mediate specific protein-DNA contact; it is plausible that sites far from the “active site” may have weaker epistatic interactions and therefore have weaker effects on navigability of the landscape. We have addressed these issues in the discussion.

      (3) Reviewer 3 asked “in which situation would the authors expect that pairwise epistasis does not play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape?”

      The question addressed in our paper is not whether epistasis shapes steps, trajectories or connectedness in sequence space but how it does so and what its particular effects are on the evolution of new functions. The dominant view in the field has been that the primary role of epistasis is to block evolutionary paths. We show, however, that in multi-state sequence space, epistasis facilitates rather than impedes the evolution of new functions. It does this by increasing the number of functional genotypes and bringing genotypes with different functions closer together in sequence space. This finding was possible because of the difference in approach between our paper and prior work: most prior work considered only direct paths in a binary sequence space between two particular starting points – and typically only considering optimization of a single function – whereas we studied the evolution of new functions in a multi-state amino acid space, under empirically relevant epistasis informed by complete combinatorial experiments. The result is a clear demonstration that the net effect of real-world levels of epistasis on navigability of the multidimensional sequence landscape is to make the evolution of new functions easier, not harder.

      (4) Reviewer 3 asked for “an explanation of how much new biological results this paper delivers as compared with the paper in which the data were originally published.”

      Starr 2017 did not use their data to characterize the underlying genetic architecture of function by estimating main and epistatic effects of amino acid states and combinations; it also did not evaluate the importance of epistasis in generating functional variants, determining the transcription factor’s specificity, or shaping evolutionary navigability on the landscape.

      (5) Reviewer 3 requested an explanation of how the results would have been (potentially) different if a reference-based approach were used, and how reference-based analysis compares with other reference-free approaches to estimating epistasis.

      This topic has been covered in detail in a recent manuscript from our group (Park et al. Biorxiv 2023.09.02.556057). Briefly, reference-free approaches provide the most efficient explanation of an entire genotype-phenotype map, explaining the maximum amount of genetic variance and reducing sensitivity to experimental noise and missing genotypes compared to reference-based approaches. Reference-based approaches tend to infer much more epistasis, especially higher-order epistasis, because measurement error and local idiosyncrasy near the wild-type sequence propagate into spurious high-order terms. Reference-based analyses are appropriate for characterizing only the immediate sequence neighborhood of a particular “wild-type” protein of interest. Reference-free approaches are therefore best suited to understanding genotype-phenotype landscapes as a whole. We have clarified these issues in the revised discussion.

      (6) Reviewer 3 suggested that the comparison between the full and main-effects-only model should involve a re-estimation of main effects in the latter case.

      This is indeed what we did in our analysis. We have clarified the description in the results and methods sections to make this clear.

      (7) Reviewer 3 asked about the applicability of the approach to data beyond those analyzed in the present study and requirements to use it.

      Our approach could be used for any combinatorial DMS dataset in which the phenotypic data are categorical (or can be converted to categorical form). Complete sampling is not required: a virtue of reference-free analysis is that by averaging the estimated effects of states and combinations over all variants that contain them, reference-free analysis is highly robust to missing data (except at the highest possible order of epistasis, where only a single variant represents a high-order effect) as long as variant sampling is unbiased with respect to phenotype. All of the required code is publicly available at the GitHub link provided in this manuscript. We have also described a general form of reference-free analysis for continuous data and applied it to 20 protein datasets in a recent publication (Park et al. Biorxiv 2023.09.02.556057).

      (8) Reviewer 3 suggested that the text could be shortened and made less dense.

      We agree and have done a careful edit to streamline the narrative.

      Response to Reviewers’ Non-Public Recommendations

      (1) Reviewer 1 noted that specific epistatic effects might in some cases produce global nonlinearities in the genotype-phenotype relationship. They then asked how our results might change if we did not impose a nonlinear transformation as part of the genotype-phenotype model. The reviewer’s underlying concern was that the non-specific transformation might capture high-order specific epistatic effects, thus reducing their apparent importance.

      Because our data are categorical, we required a model that characterizes the effect of particular amino acid states and combinations on the probability that a variant is in a null, weak, or strong activation class. A logistic model is the classic approach to this kind of analysis. The model structure assumes that amino acid states and combinations have additive effects on the log-odds of being in one functional class versus the lower functional class(es); the only nonlinear transformation is the one that arises mathematically when log-odds are transformed into probabilities through the logistic link function. Thinking through the reviewer’s comment, we have concluded that our model does not make any explicit transformation to account for nonlinearity in the relationship between the effects of specific sequence states/combinations and the measured phenotype (activation class). If additional global nonlinearities are present in the genotype-phenotype relationship, such as could be imposed by limited dynamic range in the production of the fluorescence phenotype or in the assay used to measure it, it is possible that the sigmoid shape of the logistic link function also accommodates these nonlinearities. We have noted this point in the revised manuscript.
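      The model structure described above can be sketched as follows, assuming a cumulative (proportional-odds) logistic formulation in which the additive genetic score is compared against increasing class thresholds. The function names, the dictionary encoding of effects, and the example effect sizes and thresholds are hypothetical; this is not the authors' published code. The point it illustrates is that the logistic link is the only nonlinearity: effects add on the log-odds scale and are converted to class probabilities only at the end.

```python
import math


def sigmoid(x):
    """Logistic link: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def genetic_score(genotype, main, pairwise):
    """Additive score on the log-odds scale from main and pairwise effects.

    genotype: tuple of amino acid states, one per site.
    main: dict mapping (site, state) -> effect.
    pairwise: dict mapping ((site_i, state_i), (site_j, state_j)) -> effect,
              with site_i < site_j. Missing keys contribute zero.
    """
    score = 0.0
    for i, a in enumerate(genotype):
        score += main.get((i, a), 0.0)
    for i in range(len(genotype)):
        for j in range(i + 1, len(genotype)):
            score += pairwise.get(((i, genotype[i]), (j, genotype[j])), 0.0)
    return score


def class_probabilities(score, thresholds):
    """P(class >= k) = sigmoid(score - theta_k) for increasing thresholds.

    Returns [P(null), P(weak), P(strong)] for two thresholds.
    """
    cumulative = [1.0] + [sigmoid(score - t) for t in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical two-site example: states E and K each help, and together
# they carry an extra pairwise (epistatic) contribution.
main = {(0, "E"): 1.0, (1, "K"): 0.5}
pairwise = {((0, "E"), (1, "K")): 1.5}
thresholds = (0.0, 2.0)

probs_ek = class_probabilities(genetic_score(("E", "K"), main, pairwise), thresholds)
probs_aa = class_probabilities(genetic_score(("A", "A"), main, pairwise), thresholds)
```

      With these made-up parameters, the `("E", "K")` genotype has a score of 3.0 and is most likely in the strong class, whereas `("A", "A")` has a score of 0 and is most likely null.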

      (2) Reviewer 1 observed that our model seems to prefer sets of several pairwise interactions among states across sites rather than fewer high-order interactions among those same states.

      This finding arises because the pattern of phenotypic variation across genotypes in our dataset is consistent with that which would be produced by pairwise interactions rather than by high-order interactions. In a reference-free framework, these patterns are distinct from each other: a group of second-order terms cannot fit the patterns produced by high-order epistasis, and high-order terms cannot fit the pattern produced by pairwise interactions. Similarly, main-effect terms cannot fit the pattern of phenotypes produced by a pairwise interaction, and a pairwise epistatic term cannot fit the pattern produced by main effects of states at two sites. For example, third-order terms are required when the genotypes possessing a particular triplet of states deviate from that expected given all the main and second-order effects of those states; this deviation cannot be explained by any combination of first- and second-order effects.

      We explain this point in detail in our recent manuscript (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057) and we summarize it here. Consider the simple example of two sites with two possible states (genotypes 00, 01, 10, and 11). If there are no main effects and no pairwise effects, this architecture will generate the same phenotype for all four variants: the global average (or zero-order effect). If there are pairwise effects but no main effects, this architecture will generate a set of phenotypes in which the average phenotype of genotypes with a 0 at the first site (00 and 01) equals the global average, as does the average of those with a 0 at the second site (00 and 10). The epistatic effect causes the individual genotypes to deviate from the global average. This pattern can be fit only by a pairwise epistatic term, not by first-order terms. Conversely, if there are main effects but no pairwise effects, then the average phenotype of genotypes 00 and 01 will deviate from the global average (by an amount equal to the first-order effect), as will the average of 00 and 10: the phenotype of each genotype will be equal to the global average plus the sum of the relevant first-order effects for the states it contains. This pattern cannot be fit by second-order model terms. The same logic extends to higher orders: a cluster of second-order terms cannot explain variation generated by third-order epistasis, because third-order variation is by definition the deviation from the best second-order model.
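      The two-site example above can be checked numerically with a minimal sketch, assuming complete data and a continuous phenotype for simplicity (the published analysis handles multi-state, multi-site data and a logistic link, and is not this code). The zero-order term is the global mean, each first-order term is the mean over genotypes carrying that state minus the global mean, and the pairwise term is whatever remains. A purely epistatic pattern yields zero first-order terms, and a purely additive pattern yields zero pairwise terms.

```python
from statistics import mean


def reference_free_two_site(phenotype):
    """Reference-free decomposition for two sites with complete data.

    phenotype: dict mapping genotype tuples (a, b) to phenotype values.
    Returns (zero_order, first_order_site0, first_order_site1, pairwise).
    """
    g0 = mean(phenotype.values())
    states0 = sorted({a for a, _ in phenotype})
    states1 = sorted({b for _, b in phenotype})
    # First-order effect: mean of genotypes carrying the state, minus global mean.
    first0 = {a: mean(v for (x, _), v in phenotype.items() if x == a) - g0
              for a in states0}
    first1 = {b: mean(v for (_, y), v in phenotype.items() if y == b) - g0
              for b in states1}
    # Pairwise effect: deviation not explained by zero- and first-order terms.
    pair = {(a, b): v - (g0 + first0[a] + first1[b])
            for (a, b), v in phenotype.items()}
    return g0, first0, first1, pair


# Pure pairwise epistasis: site averages equal the global average.
epistatic = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
# Pure main effects: each phenotype is a sum of per-site contributions.
additive = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}

epi_terms = reference_free_two_site(epistatic)
add_terms = reference_free_two_site(additive)
```

      For `epistatic`, every first-order term is zero and the pairwise terms reproduce the phenotypes; for `additive`, the first-order terms are -1 and +1 at the first site and -0.5 and +0.5 at the second, and every pairwise term is zero.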

      (3) Reviewer 1 suggested several places in the text where citations to prior work would be appropriate.

      We appreciate these suggestions and have modified the manuscript to refer to most of these works.

      (4) Reviewer 1 pointed to the paper of Gong et al. (eLife, 2013) and asked whether it is known how robust the proteins in our study are to changes in conformation/stability compared to other proteins, and whether this might impact the likelihood of observing higher-order epistasis in this system.

      The DBDs that we study here are very stable, and previous work shows that mutations affect DNA specificity primarily by modifying the DBD’s affinity rather than its stability (McKeown et al., Cell 2014). Additionally, Gong et al.’s findings pertain to a globally nonlinear relationship between stability and function, which arises from the Boltzmann relationship between the free energy of folding and occupancy of the folded state. Because our data are categorical – based on the rank order of measured phenotypes rather than on fluorescence as a continuous phenotype – the kind of global nonlinearity observed in Gong et al.’s study is not expected to produce spurious estimates of epistasis in our work. We have revised the Discussion to address this point.
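      The intuition behind the rank-order argument can be sketched as follows. Assuming mutations act additively on the folding free energy and that the continuous phenotype is the Boltzmann fraction folded, a double mutant shows apparent (non-additive) epistasis on the continuous scale, yet the rank order of genotypes is unchanged because the transformation is monotonic. The ΔG values, RT, and genotype labels are hypothetical, and this only illustrates rank preservation; it does not model the categorical assay itself.

```python
import math

RT = 0.593  # kcal/mol at roughly 25 °C (assumed)


def fraction_folded(dg_fold):
    """Boltzmann occupancy of the folded state; negative dg_fold favours folding."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


# Hypothetical folding free energies: mutations A (+1.5) and B (+1.0) act additively
dg = {"wt": -3.0, "A": -1.5, "B": -2.0, "AB": -0.5}
pheno = {g: fraction_folded(v) for g, v in dg.items()}

# Apparent epistasis on the continuous phenotype scale (zero if effects were additive)
eps = pheno["AB"] - pheno["A"] - pheno["B"] + pheno["wt"]

# Rank genotypes by stability and by phenotype
order_by_stability = sorted(dg, key=dg.get)                     # most stable first
order_by_phenotype = sorted(pheno, key=pheno.get, reverse=True)  # highest first

print(round(eps, 3), order_by_stability, order_by_phenotype)
```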

      (5) Reviewer 1 asked a) why the epistatic models produce landscapes on which variants have fewer neighbors on average than main-effects only models and b) why the average distance from all ERE-specific nodes to all SRE-specific nodes is greater with epistasis (but the average distance from ERE to nearest SRE is lower with epistasis).

      In the main effects-only landscape, the functional genotypes are relatively similar to each other, because each must contain several of the states that contribute the most to a positive genetic score. Moreover, ERE-specific nodes are similar to each other, and SRE-specific nodes are similar to each other, because each must contain one or more of a relatively small number of specificity-determining states. When epistasis is added to the genetic architecture, two things happen: 1) more genotypes become functional because there are more combinations that can exceed the threshold score to produce a functional activator and 2) these additional functional variants are more different from each other – in general, and within the classes of ERE- or SRE-specific variants – because there are now more diverse combinations of states that can yield either phenotype. As a result, a broader span of sequence space is occupied, but ERE- and SRE-specific variants are more interspersed with each other. This means that the average distance between all pairs of nodes is greater, and this applies to all ERE-SRE pairs, as well. However, the interspersing means that the closest single SRE to any particular ERE is closer than it was without epistasis. We have added this explanation to the main text.
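      The two distance summaries can behave in opposite directions, as the following toy example shows. Assuming Hamming distance between sequences as a stand-in for distance on the genotype network (the study's own distance measure may differ), the clustered sets mimic the main-effects-only landscape and the interspersed sets mimic the epistatic one. All sequences are hypothetical.

```python
from statistics import mean


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_all_pairs(ere, sre):
    """Average distance over every ERE-SRE pair."""
    return mean(hamming(e, s) for e in ere for s in sre)


def mean_nearest(ere, sre):
    """Average, over ERE variants, of the distance to the closest SRE variant."""
    return mean(min(hamming(e, s) for s in sre) for e in ere)


# Clustered classes (main effects only): each class is tight, classes are separated
ere_clustered = ["AAAAAA", "AAAAAB"]
sre_clustered = ["AAABBA", "AAABBB"]

# Interspersed classes (with epistasis): broader spread, ERE and SRE intermixed
ere_mixed = ["AAAAAA", "BBBBBB"]
sre_mixed = ["AAAAAB", "BBBBBA"]

print(mean_all_pairs(ere_clustered, sre_clustered), mean_nearest(ere_clustered, sre_clustered))
print(mean_all_pairs(ere_mixed, sre_mixed), mean_nearest(ere_mixed, sre_mixed))
```

      In this toy case the all-pairs average rises from 2.5 to 3 while the nearest-neighbour average falls from 2 to 1, the same qualitative pattern described above.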

      (6) Reviewer 2 asked us to explain why average path length increases with pairwise epistasis as the strength of selection for specificity increases.

      This behavior occurs because of a local peak in the pairwise model. Genotypes on this peak have few connections to other genotypes, all of which are less SRE-specific. Thus, under strong selection (i.e., large population size), the simulations became stuck on the local peak, cycling among its genotypes many times before leaving, which greatly increased the mean number of steps. As shown in the rest of the figure, differences in the average number of steps with and without epistasis remain even when the longest paths are removed. This issue is described in the Methods section.

      (7) Reviewers made several suggestions for clarity in the text and figures.

      We have modified the paper to address all of these comments.

      (8) Reviewer 3 stated that the code should be available.

      The code is available at https://github.com/JoeThorntonLab/DBD.GeneticArchitecture.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how it reflects variation in dietary patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for understanding the diversification of bunodont proboscideans in Asia during the Miocene, as well as for explaining the cranial/jaw disparity of fossil lineages. This work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene using several methods. The authors included all Asian bunodont proboscideans with long mandibles, and I suggest that they use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Thank you for your comprehensive review and positive feedback on our study regarding the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion and have decided to use the term "bunodont elephantiforms" (for more explicit clarification, we use elephantiforms to exclude some early proboscideans, like Moeritherium, etc.) instead of "gomphotheres," and we will make this change in our revised manuscript. We also appreciate the potential weakness you mentioned regarding the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with the reviewer’s suggestion, and we are aware that gigantism and long limbs are potential factors for trunk development. Gigantism resulted in the loss of flexibility in elephantiforms, and long limbs made it more challenging for them to reach the ground. A long trunk serves as compensation for these limitations. However, limb bones were rarely found in our material, especially those preserved in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, feeding behaviors, and co-evolution of feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae), which are characterised by their distinctive mandible and mandibular tusk morphologies. They also show different evolutionary stages of food acquisition organs, which may have co-evolved with extremely elongated mandibular symphyses and tusks. Although these three longirostrine gomphothere families were widely distributed in northern China in the Early-Middle Miocene, their relative abundances and distributions differed through time as a result of climatic and ecosystem changes.

      These three groups have different feeding behaviors, indicated by different mandibular symphysis and tusk morphologies. Additionally, they show different evolutionary stages of trunks, which are reflected by the narial region morphology. To reconstruct the feeding behavior and the relation between the mandible and the trunk of early elephantiforms, the authors examined the crania and mandibles of these three groups from the Early and Middle Miocene of northern China, housed in three different museums, and performed several analyses.

      The analyses made in the study are:

      (1) Finite Element (FE) analysis: They conducted two kinds of tests: the distal forces test and the twig-cutting test. The distal forces test established the advantageous and disadvantageous mechanical performance of each group under distal vertical and horizontal external forces. In the twig-cutting test, a cylindrical twig model with orthotropic elastoplasticity was placed in three directions at the distal end of the mandibular tusk to calculate the sum of the equivalent plastic strain (SEPS). The results indicate that the three groups have different mandibular specializations for cutting plants.

      (2) Phylogenetic reconstruction: These groups have different narial region morphologies and, correspondingly, different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology, and the evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      (3) Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations have led to the diverse mandibular morphology and that open-land grazing has driven the development of trunk-specific functions and the loss of the long mandible. This conclusion is supported by evidence from palaeoecological reconstruction, the reconstruction of feeding behaviors, and detailed examination of mandibular and narial region morphology.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from paleoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

      Reviewer #1 (Recommendations For The Authors):

      Thank you very much for the invitation to review this amazing manuscript. It is very well written and supported, and I have only minor suggestions to improve the text:

      (1) Some references are not in chronological sequence in the text, and this should be reviewed.

      We greatly appreciate the reviewer’s positive comments. We have revised the order of references in the manuscript as the reviewer suggested.

      (2) I suggest the use of the expression "bunodont proboscideans" instead of Gomphotheres because there is no agreement on whether Amebelodontidae and Choerolophodontidae are within Gomphotheriidae, as well as some brevirostrine bunodont proboscideans from South America. So I think it is OK to use "Gomphotheriidae", but not gomphotheres to refer to all bunodont proboscideans included in the study.

      The reviewer is correct. Using “gomphotheres” to refer to these three groups is inappropriate. We have replaced “gomphotheres” with "bunodont elephantiforms" throughout the entire manuscript. Here, we use “elephantiforms”, not “proboscideans”, to avoid confusion with some early proboscidean members like Moeritherium, etc.

      (3) I was expecting some discussion on the development of large trunks related to the gigantism in these bunodont proboscideans, regarding the huge skulls and the columnar limbs.

      We appreciate this suggestion, and we are aware that gigantism is a potential factor for trunk development. It is difficult to compare the three groups (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) in terms of their weight and limb bone length, because in our material, limb bones were rarely found, especially those associated with cranial material. Nevertheless, at this stage, all elephantiforms had significantly enlarged cranial sizes and limb bone lengths compared to early members like Phiomia. Gigantism caused the loss of flexibility in elephantiforms, and even the long limbs made it more difficult for an elephantiform to reach the ground. A long trunk compensates for this evolutionary change. Exploring these aspects further is a part of our future work.

      (4) The reference to Alejandro et al should be replaced by Kramarz et al (and the correct surname of the authors). The name and surname of this reference need to be corrected. The correct names are Kramarz, A., Garrido, A., Bond, M. 2019. Please correct this in the text too.

      We thank the reviewer for catching this error. This reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      I believe your paper will lead to other studies on other Proboscidean groups on the evolution of the mandible and trunk. There are some corrections in the text:

      • In line 199 in the text in pdf, "Tassy, 1994" should be "Tassy, 1996".

      • In line 241, "studied" should be "studies"

      • In line 313, "," after the word "tool" should be "."

      We appreciate the reviewer for pointing these errors out and have revised these based on the suggestions.

      • In the References, you write "et al." in some references. You should write the names of all of the authors.

      • In the References: "Lister AM. 2013" and "Shoshani&Tassy" are not referenced in the text.

      • In the References: "Tassy P. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn." should be "Tassy P. 1994. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn. 112, 1-2, 101-117" and placed before "Tassy P. 1996".

      We appreciate the reviewer’s suggestions and have revised these references.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      The authors provided experimental data in response to my comments/suggestions in the revision. Overall, most points were appropriate and satisfactory, but some issues remain.

      (1) It is not fully addressed how atypical survivors are generated independently of Rad52-mediated homologous recombination.

      The newly provided data indicate that the formation of atypical telomeres is independent of the Rad52 homologous recombination pathway.

      "The atypical telomeres clones exhibit non-uniform telomere pattern", but the TG-hybridized signals after XhoI digestion are clear and uniform.

      "Atypical telomere" clones may carry circular chromosomes embedded with short TG repeats, rather than linear chromosomes. In other words, atypical telomeres may differ from telomeres, the ends of chromosomes. Is atypical telomere formation dependent on NHEJ? Given that "two chromosomes underwent intra-chromosomal fusions" (Line 248), are atypical telomere clones detected frequently in SY13 cells containing two chromosomes?

      We thank the reviewer for these questions. Frankly, we have not been able to determine the chromosome structures in these so-called "atypical survivors". As we mentioned in the manuscript, they could have mixed telomere structures, e.g. TG tract amplification, intra-chromosome telomere fusion and inter-chromosome telomere fusion. Worse still, these "atypical survivors" may not have maintained a stable genome, and their karyotype may have undergone stochastic changes during passages. To avoid misunderstanding, we have changed the term "atypical" to "uncharacterized" in the revised manuscript.

      We have previously shown that deletion of YKU70 does not affect MMEJ-mediated intra-chromosome fusion in single-chromosome SY14 cdc13Δ cells (Wu et al., 2020). In SY12 cells, double knockout of TLC1 and YKU resulted in synthetic lethality, so we were unable to continue this investigation. The synthetic lethality of the TLC1 YKU70 double deletion was shown in Figure 7B of reviewed preprint version 1; in accordance with the reviewer's instructions, this result was not included in reviewed preprint version 2.

      "Atypical” survivors could be detected in SY13 cells (Figure 1D), but the frequency of their formation in the SY13 strain appeared to be lower than in SY12. As one can imagine, SY13 contains two chromosomes and its survivors should have a higher frequency of intra-chromosome fusions.

      (2) From their data, it is possible that X and Y elements influence homologous recombination, type 1 and type 2 (type X), at telomeres. In particular, the presence of X and Y elements appears to be important for promoting type 1 recombination. In other words, although not essential, subtelomeres have some function in maintaining telomeres. I suggest that the authors include author response image 4 in the text. They could revise their conclusion and the paper title accordingly.

      According to this suggestion, we have included author response image 4 in the revised manuscript as Figure 2E, Figure 5D, Figure 6C and Figure 6E. Accordingly, we have changed the title to “Elimination of subtelomeric repeat sequences exerts little effect on telomere essential functions in Saccharomyces cerevisiae”.

      (3) Minor points: The newly added data indicate that X survivors are generated in a type 2-dependent manner. The authors could discuss how Y elements were eroded while retaining X elements (line 225, Figure 2A).

      We thank the reviewer for this suggestion. We have discussed this in the revised manuscript (p. 13, lines 244-245). When telomeres were deprotected, chromosome end resection took place. Because SY12 has only one Y’-element, it is difficult to find homologous sequences to repair the Y’-element in XVI-L. Once further resection exposed the X-element in XVI-L, homologous sequences for repair were easier to find. Thus, in type X survivors the Y’-element was eroded while the X-element was retained.

      Reviewer #2

      I would like to congratulate the authors for their work and the efforts they put in improving the manuscript. The major criticism I had previously, ie testing the genetic requirements for the survivor subtypes, has been met. Below are a few minor comments that don't necessarily require a response.

      (1) I think the Author response image 6 could have been included in the manuscript. I understand that the authors don't want to overinterpret survivor subtype frequencies, but this figure would have suggested some implication of Rad51 in the emergence of survivors even in the absence of Y' elements. At this stage, however, it is up to the authors, and leaving this figure out is also fine in my opinion.

      According to the suggestion, the author response image 6 has been presented as Figure 6—figure supplement 7.

      (2) Chromosome circularization seems to rely on microhomologies. Previously, the authors proposed that SY14 circularization depended on SSA (Wu et al. 2020), but here, since circularization appears to be Rad52-independent, it is likely to be based on MMEJ rather than SSA (although there are contradictory results on Rad52's role in MMEJ in the literature).

      Yes, we now mention this in the revised manuscript.

      (3) p. 28 lines 511-513: "The erosion sites and fusion sequences differed from those observed in SY12 tlc1Δ-C1 cells (Figure 2D), suggesting the stochastic nature of chromosomal circularization": I don't think they are necessarily stochastic, because the sequences beyond the telomeres are now modified, the available microhomologies have changed as well.

      We agree. Different chromosomes tend to have some hotspots for chromosome fusion. For example, in Figure 6C and 6F the resection sites in Chr1 and Chr2 were the same in SY12XYΔ+Y tlc1Δ-C1 and SY12XYΔ tlc1Δ-C1. We therefore speculate that there are hotspots for chromosome fusion, but which site a cell chooses in a given round of chromosome fusion is stochastic.

      (4) Typos and other errors:

      • p. 3 line 52: "subtelomerice" and "varies" are misspelled.

      • p. 5 line 78: "processes" should be "process".

      • Supp files are mislabelled (the numbers do not correspond to file name).

      • Supp file 2: how come SY12 has only one Y' element and SY13 has two?

      • p. 10 line 175: "emerging" should be "emergence".

      • p.15 line 276: "counter-selected" should be "being counter-selected" or "counterselection".

      • p. 29 line 523: "the formation of them" should be "their formation".

      • p. 37 line 653: "could have been an ideal tool": the sentence is grammatically incorrect. Writing "AND could have been an ideal tool" is enough to make it structurally correct.

      Thanks for pointing these errors out. We have corrected them in the revised manuscript. Regarding the question “how come SY12 has only one Y' element and SY13 has two?”, we are not yet certain. We speculate that one of the Y’ elements might have been lost during engineering of the chromosomes with the CRISPR–Cas9 system.

      Reviewer #3

      The authors included statistical analyses of the qPCR data (Fig 4B) as requested, but did not comment on the striking difference in expression of MPH3 and HSP32 in the SY12 strain compared to BY4742. An improvement of the manuscript is the inclusion of rad52 tlc1 strains in their analyses, demonstrating that the "atypical and circular survivors" arose independently of homologous recombination. In addition, by analyzing rad51 and rad50 mutant strain they could demonstrate that the "type X" survivors had similar molecular requirements to type II survivors. Overall, the revised submission improves the article.

      We thank the reviewer for these comments and suggestions. The SY12 strain (with three chromosomes) exhibited lower expression levels of both MPH3 and HSP32 than the parental strain BY4742 (with 16 chromosomes). We speculate that, with the reduced chromosome number, silencing proteins are no longer titrated away by the telomeres that have been deleted and are therefore more available at the remaining subtelomeres. We have added these comments to the revised manuscript.

      Wu, Z.J., Liu, J.C., Man, X., Gu, X., Li, T.Y., Cai, C., He, M.H., Shao, Y., Lu, N., Xue, X., et al. (2020). Cdc13 is predominant over Stn1 and Ten1 in preventing chromosome end fusions. Elife 9.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study describes a new role of epithelial intercellular adhesion molecule 1 (ICAM-1) protein in controlling bile duct size. The effect is mediated via EBP-50 and subapical actomyosin to regulate size of bile canaliculi. These solid findings have theoretical and practical implications in hepatology and human disorders of bile ducts.

      Public Reviews:

      In this study, Cacho-Navas et al. describe the role of ICAM-1 expressed on the apical membrane of bile canaliculi and its function to control the bile canaliculi (BCs) homeostasis. This is a previously unrecognized function of this protein in hepatocytes. The same authors have previously shown that basolateral ICAM-1 plays a role in controlling lymphocyte adhesion to hepatocytes during inflammation and that this interaction is responsible for the loss of polarity of hepatocytes during disease states.

      This new study shows that ICAM-1 is mainly localized in the apical domain of the BC and, in association with EBP-50, communicates with the subapical actomyosin ring to regulate the size and morphology of the BC. They used the well-known immortalized liver cell line HepG2, in which they deleted the ICAM-1 gene by CRISPR-Cas9 editing, and hepatic organoids derived from WT and ICAM-1-KO mice, and performed KO as well as rescue experiments. They show that in the absence of apical ICAM-1, the BCs become dilated.

      The data sufficiently support the conclusions of the study.

      Recommendations for the authors:

      We would like to thank the editor and reviewer for recognizing the manuscript's value and the solid nature of the data. We are also thankful to them for acknowledging that the data support the conclusions. Below, we address their comments and questions point by point:

      We have a few suggestions to improve the manuscript:

      (1) HepG2 cells form canaliculi-like structures but are not the ideal system to study apical-basal polarity. On the other hand, hepatic organoids can acquire a hepatocyte-like phenotype when cultured under specific conditions, but they are not functionally comparable to hepatocytes: they are organized in a 3D structure with a hollow lumen that does not recapitulate the physiological structure of BCs. Therefore, primary hepatocytes in collagen sandwich culture would be the best model to study the polarization of BCs; they could be isolated from WT and ICAM-1-KO mice, which are available. Some of the major findings should be confirmed in this system.

      We adopted hepatic organoid culture as an experimental strategy because of the difficulties in culturing primary hepatocytes that we experienced in previous analyses (Reglero-Real, Cell Rep, 2014). Generating organoids or mature hepatocytes from various sources of stem cells is a commonly employed strategy in hepatocyte cell biology (Meyer et al. EMBO Rep, 2023), owing to the difficulty of maintaining mature hepatic epithelial cell cultures for longer than a few hours.

      The hepatic organoids we have used in the manuscript are increasingly accepted as an advanced cellular model in a broad range of fields (Belenguer, Nat Commun, 2022; de Crignis, eLife, 2021; Huch, Cell, 2015). Although they show some morphological differences from real hepatocytes, we conducted a thorough characterization of their organization, identifying canalicular-like structures with functional (CFDA) and molecular (HA-4) markers, which we believe adds value to the manuscript. In addition, organoid technology allowed us to import the bipotent precursors and obtain a permanent source of hepatic cells without the need to import and use ICAM-1_KO mice, in line with current guidelines to reduce animal experimentation.

      Taking this into account, and to further validate the data obtained with our cellular systems, we quantified the canalicular diameter in livers from WT and ICAM-1_KO mice (New Figure 8B), which validates our data from human cell lines and organoids. We acknowledge that data obtained from hepatic tissues cannot rule out a contribution of immune cell adhesion to changes in hepatocyte architecture. However, these experiments, together with the aforementioned organoids and human cell lines, strongly suggest a role for hepatic ICAM-1 in regulating canalicular size.

      (2) Overexpression of proteins was used in the study. While this approach is an easier means to visualize, without the use of specific antibodies, it is known to alter the distribution of the protein compared to the endogenous one.

      Most of our characterization has been done with antibodies or other fluorescent tools against endogenous proteins localized at BCs: CD59, F-actin, EBP50, MHC, MLC…. In addition, we have included MDR1-GFP and GFP-Rab11, the latter to analyze the subapical compartment (SAC) surrounding BCs. As requested by the reviewer, we now include in a new Supplementary Figure 1C the confocal analyses of endogenous canalicular markers, radixin and MRP2, as well as a new Supplementary Figure 1D containing the staining of an endogenous marker of the SAC, plasmolipin/PLLP (Fraticelli et al, Nat Cell Biol, 2015; Cacho-Navas, Cell Mol Life Sci, 2022), which is consistent with the previous analyses performed with GFP-Rab11.

      (3) In the absence of ICAM-1, BCs change shape and dimension but still show the presence of microvilli. What happens to the distribution of polarized transporters like Mrp2, or the transport of bile acids (CFDA clearance) in vivo in the KO animal?

      Thank you for this comment. We have analyzed this transporter in murine livers and human hepatic cells. MRP2 distribution does not change significantly and remains concentrated in BCs in ICAM-1_KO livers (New Figure 8C). Likewise, ICAM-1 gene editing does not affect MRP2 localization in the polarized human hepatic epithelial cell line in vitro (Supplementary Figure 1C). We cannot rule out changes in this transporter in other murine liver cell types in vivo, such as sinusoidal endothelial cells, which we believe should be addressed in separate work.

      (4) Does the lack of ICAM-1 affect the cell viability, proliferation or cell size?

      ICAM-1_KO cells proliferate slightly more slowly than their WT counterparts, with no detected changes in cell size and death. We present these data in Supplementary Figure 1, A and B.

      (5) Are the findings recapitulated in the livers of ICAM-1 KO animals?

      ICAM-1 KO animals present enlarged BCs, which is consistent with the main findings of the manuscript (Figure 8B).

      The text needs to be more concise. Some of the concepts, in particular those already published, should be condensed. There is a large amount of experiments that are difficult to connect logically. Possibly, cartoons summarizing the approach of the figure could help the reader.

      The Results and Discussion sections have been shortened by almost 100 words, even though the additional panels and experiments are now described and discussed. New cartoons have been added in Figure 5G and Figure 8F, in addition to those previously included in Figure 1 and Supplementary Figure 6, the latter containing a graphical description of the main conclusions.

      Also, more detailed information about statistical analysis (what post-test was used?), concentration of cytokines, and description of the mouse model should be included in the methods.

      Cytokine concentrations have been included in the legend of Figure 3 and in the Cell and Culture section of Methods. A brief description of the ICAM-1_KO mouse and the corresponding reference for further information are also provided in the Organoid Culture section of Methods. A statistical analysis section describing the post-test used is also included at the end of Methods. The references of the anti-plasmolipin, anti-radixin and anti-MRP2 antibodies, as well as the new fixation methods used for immunofluorescence, are also included in the corresponding Antibody List and in the Confocal Microscopy section of Methods, respectively.

      Figure 3D. Sample names should be added as in the rest of the figures.

      The arrangement of sample names in Figure 3D has been revised and is now similar to that of Figure 3A.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Song, Shi, and Lin use an existing deep learning-based sequence model to derive a score for each haplotype within a genomic region, and then perform association tests between these scores and phenotypes of interest. The authors then perform some downstream analyses (fine-mapping, various enrichment analyses, and building polygenic scores) to ensure that these associations are meaningful. The authors find that their approach allows them to find additional associations, the associations have biologically interpretable enrichments in terms of tissues and pathways, and can slightly improve polygenic scores when combined with standard SNP-based PRS.

      Strengths:

      • I found the central idea of the paper to be conceptually straightforward and an appealing way to use the power of sequence models in an association testing framework.

      • The findings are largely biologically interpretable, and it seems like this could be a promising approach to boost power for some downstream applications.

      Weaknesses:

      • The methods used to generate polygenic scores were difficult to follow. In particular, a fully connected neural network with linear activations predicting a single output should be equivalent to linear regression (all intermediate layers of the network can be collapsed using matrix-multiplication, so the output is just the inner product of the input with some vector). Using the last hidden layer of such a network for downstream tasks should also be equivalent to projecting the input down to a lower dimensional space with some essentially randomly chosen projection. As such, I am surprised that the neural network approach performs so well, and it would be nice if the authors could compare it to other linear approaches (e.g., LASSO or ridge regression for prediction; PCA or an auto-encoder for converting the input to a lower dimensional representation).

      Response: We thank the reviewer for the recognition and valuable suggestion on our work. As the reviewer noted, our polygenic prediction procedure is equivalent to a linear transformation, and in this revision we found that it was unnecessary to replace the linear model with a neural network framework. Both our results and previous work indicated that linear models fit polygenic traits better than non-linear ones, which was also why we chose linear activations for the neural network in the original manuscript.
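      The reviewer's argument, which the response accepts, can be checked numerically. The sketch below is a minimal plain-Python illustration with hypothetical weights (biases are omitted for brevity; with biases each layer is affine, and affine maps still compose into a single affine map). A stack of linear layers returns exactly the same output as the single weight vector obtained by multiplying the layer matrices together.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def forward(x, layers):
    """Push a row vector x through linear layers with identity activations."""
    h = [x]
    for w in layers:
        h = matmul(h, w)
    return h[0]


# Hypothetical weights: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
w1 = [[1.0, -2.0], [0.5, 1.0], [3.0, 0.0]]
w2 = [[2.0, 1.0], [-1.0, 4.0]]
w3 = [[0.5], [-1.5]]

# Multiplying the layer matrices collapses the network into one 3x1 weight
# vector, i.e. an ordinary linear predictor.
collapsed = matmul(matmul(w1, w2), w3)

x = [0.2, -1.0, 0.7]
stacked_out = forward(x, [w1, w2, w3])
collapsed_out = forward(x, [collapsed])
print(stacked_out, collapsed_out)  # both approximately [8.2]
```

      The same reasoning covers the last hidden layer: its activations are the fixed linear projection x·w1·w2 of the input, which is why the reviewer compared it to an essentially arbitrary projection.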

      In this revision, we followed the reviewer’s suggestion and applied a more straightforward linear framework for polygenic prediction. We first calculated a weighted sum of HFS for each block (1,361 independent blocks in total); then, in each target ancestry, we used LASSO regression to integrate these block scores with the SNP PRS into one final score. A comparative analysis in the British European test set showed that LASSO, ridge and elastic net gave similar results, with LASSO performing slightly better. By applying this straightforward framework together with the sliding window strategy, we moderately improved prediction performance.

      Line 349: “Using height as a representative trait, we first estimated the proportion of variance captured by top loci, and found that HFS of loci with PIP>0.4 (n=5,101) captured roughly 80% of the variance explained by all genome-wide loci (n=1,200,024 under the sliding-window strategy; Figure 5A). We then calculated HFS+LDAK in the non-British European (NBE), South Asian (SAS), East Asian (EAS) and African (AFR) populations in UK Biobank, and observed 17.5%, 16.1%, 17.2% and 39.8% improvement over LDAK alone (p=3.21×10⁻¹⁶, 0.0001, 0.002 and 0.001, respectively; Figure 5C).”

      Author response image 1.
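      The integration step described in this response (a weighted HFS sum per block, then LASSO combining block scores with the SNP PRS) might be sketched as follows. This is a plain-Python coordinate-descent LASSO under stated assumptions, not the authors' pipeline: the per-locus weights, the column layout (SNP PRS first, then block scores) and the toy data are hypothetical, and X and y are assumed to be centred so no intercept is fitted. It minimises (1/2n)·||y - Xβ||² + α·||β||₁, the same objective as scikit-learn's Lasso; in practice an established implementation such as glmnet or scikit-learn would be used.

```python
def block_score(hfs, weights):
    """Weighted sum of per-locus HFS within one block (weights are hypothetical)."""
    return sum(h * w for h, w in zip(hfs, weights))


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha*||b||_1.

    Assumes the columns of X and y are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
            beta[j] = soft_threshold(rho, n * alpha) / col_sq[j]
    return beta


# Toy, hypothetical data: 4 individuals; column 0 = SNP PRS, column 1 = one HFS block score.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weak = lasso_cd(X, y, alpha=0.1)    # approximately [1.9, 0.0]
strong = lasso_cd(X, y, alpha=3.0)  # [0.0, 0.0]: every coefficient shrunk to zero
```

      The L1 penalty is what lets LASSO drop uninformative block scores entirely (coefficient exactly zero), whereas ridge only shrinks them.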

      • A very interesting point of the paper was the low R^2 between the HFS scores in adjacent windows, but the explanation of this was unclear to me. Since the HFS scores are just deterministic functions of the SNPs, it feels like if the SNPs are in LD then the HFS scores should be and vice versa. It would be nice to compare the LD between adjacent windows to the average LD of pairs of SNPs from the two windows to see if this is driven by the fact that SNPs are being separated into windows, or if sei is somehow upweighting the importance of SNPs that are less linked to other SNPs (e.g., rare variants).

      Response: We thank the reviewer for the suggestion on understanding the LD mechanism. In this revision, we used chromosome 1 as an example and calculated the pairwise LD among all SNPs within two adjacent loci. As shown in Figure S1 (below), although HFS-based LD is still significantly lower than the median SNP-based LD (paired Wilcoxon test p=1.76×10⁻⁵), the median SNP LD between loci was itself lower than what is typically observed between adjacent SNPs in GWAS (histogram on the x axis; median=0.06). We reasoned that dividing SNPs into blocks is one of the reasons HFS show less LD than standard GWAS, but not the whole story.

      Author response image 2.

      We agree with the reviewer that rare variants could also play an important role. In fact, the sei authors also found that rare variants tended to have larger sei-predicted effects. We conducted an approximate analysis that removed all rare variants and repeated the HFS calculation. HFS LD rose markedly, to a median of 0.14, indicating that including rare variants was vital for the low LD.

      Author response image 3.

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: the integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the median LD to 0.14 (Method; Figure S2C). Secondly, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10⁻¹⁶).”
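      The comparison in this response, LD between the HFS of two adjacent loci versus the median LD of SNP pairs drawn one from each locus, can be sketched as below. This assumes LD is measured as the squared Pearson correlation (r²) across individuals, using HFS values for loci and 0/1/2 genotype dosages for SNPs, a common convention rather than a detail confirmed by the response; the function names are hypothetical.

```python
from statistics import median


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length vectors (one value per individual)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


def median_cross_locus_ld(snps_a, snps_b):
    """Median r^2 over all SNP pairs with one SNP from each locus.

    Monomorphic SNPs are skipped because r^2 is undefined for them.
    """
    def polymorphic(snps):
        return [s for s in snps if len(set(s)) > 1]

    values = [r_squared(s, t) for s in polymorphic(snps_a) for t in polymorphic(snps_b)]
    return median(values)


# HFS LD between adjacent loci would be r_squared(hfs_locus_a, hfs_locus_b),
# to be compared with median_cross_locus_ld(dosages_locus_a, dosages_locus_b).
```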

      • There were also a number of robustness checks that would have been good to include in the paper. For instance, do the findings change if the windows are shifted? Do the findings change if the sequence is reverse-complemented?

      Response: Following the reviewer’s suggestion, we conducted a sliding window analysis in which all loci were shifted by 2048 bp, thereby doubling the total number of loci. In the fine-mapping analysis, more than 90% of the causal loci were reproduced in the sliding window analysis, either by themselves or by an overlapping locus:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For a further 31.1% and 29.3% of causal loci, the 5’ and 3’ overlapping locus, respectively, had PIP>0.95 in the sliding window analysis, although the original locus itself was no longer causal.”

      In the polygenic prediction analysis, the sliding window strategy significantly improved prediction accuracy, as discussed in our response to the first point.
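      The window layout used in the sliding window analysis (4096 bp loci, shifted by 2048 bp) can be sketched as follows. The coordinate conventions are assumptions for illustration: 0-based half-open intervals, tiling starting at position 0, and a trailing partial window dropped; "5’" and "3’" are taken in forward-strand coordinates. Function names are hypothetical.

```python
WINDOW = 4096  # length of each locus (sei input sequence)
SHIFT = 2048   # half-window offset used in the sliding window analysis


def tile(chrom_length, offset=0, window=WINDOW):
    """Non-overlapping half-open windows [start, end) from `offset`; a trailing partial window is dropped."""
    return [(s, s + window) for s in range(offset, chrom_length - window + 1, window)]


def sliding_loci(chrom_length):
    """Original tiling plus the same tiling shifted by SHIFT, sorted by start position."""
    return sorted(tile(chrom_length) + tile(chrom_length, offset=SHIFT))


def overlapping_shifted(start, window=WINDOW, shift=SHIFT):
    """The 5' and 3' shifted windows that each cover half of the original window at `start`."""
    return (start - shift, start - shift + window), (start + shift, start + shift + window)


loci = sliding_loci(16384)  # 4 original + 3 shifted windows on a toy 16,384 bp sequence
```

      Each original locus is overlapped by exactly two shifted loci, which is why a causal signal can be recovered either at the original locus or at its 5’ or 3’ neighbour.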

      As for reverse complements, sei’s input layer encodes both strands symmetrically, so the output is the same for either strand. We also ran sei on the reverse complement (generated by seqkit seq -r -p) and verified that the original sequence and its reverse complement give the same output.
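      The strand check described above can be sketched as follows. The reverse complement matches what seqkit seq -r -p produces for A/C/G/T/N; `predict` is a hypothetical stand-in for a model such as sei (no sei API is called here), and `gc_fraction` is a toy strand-symmetric scorer used only to exercise the check.

```python
# Complement table for unambiguous bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (complement each base, then reverse)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_strand_invariant(predict, seq, tol=1e-6):
    """True if `predict` gives (nearly) the same score vector for seq and its reverse complement."""
    forward = predict(seq)
    reverse = predict(reverse_complement(seq))
    return all(abs(f - r) <= tol for f, r in zip(forward, reverse))


def gc_fraction(seq):
    """Toy, strand-symmetric scorer returning a one-element score vector."""
    return [sum(base in "GCgc" for base in seq) / len(seq)]
```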

      Response: Following the reviewer’s suggestion, we added a new discussion paragraph on the performance of sequence models on interindividual variation. In brief, we suggest that although the lack of cross-individual training sets is a real drawback and future improvement is necessary, chromatin changes can be predicted better than gene expression. This is because the latter task requires information on long-range interactions, which vary among genes and are difficult to capture when the reference genome is used as the training set. We made a schematic to clarify this:

      Author response image 4.

      We also noted a few recent studies that experimentally validated sei predictions and showed significant accuracy, such as https://doi.org/10.1016/j.neuron.2022.12.026. Taken together, while we agree that sequence models need to be improved by adding more cross-individual training samples, the current state-of-the-art model, sei, can still provide unique value to our study.

      Line 423: “The challenge of using sequence-based deep learning (DL) models in HFS applications is further compounded by their difficulty in predicting variation between individuals. Recent studies (Huang et al., 2023; Sasse et al., 2023) indicate that DL models trained on the reference human genome show limited accuracy in predicting gene expression levels across individuals. This limitation is likely due to the models' inability to account for long-range regulatory patterns, which are crucial for understanding the impact of variants on gene expression and vary across genes. In contrast, our study leveraged sequence-determined functional genomic profiles in association studies, which mitigates this issue to an extent. For instance, although sei cannot identify the specific gene regulated by a given input sequence, it can predict changes in the sequence's functional activity. Future improvements in DL models' ability to predict interindividual differences could be achieved by incorporating cross-individual data in the training process. An example of such data is the EN-TEx dataset (Rozowsky et al., 2023), which aligns functional genomic peaks with the specific individuals and haplotypes they correspond to.”

      Reviewer #2 (Public Review):

      Summary:

      In this work, Song et al. propose a locus-based framework for performing GWAS and related downstream analyses including fine-mapping and polygenic risk score (PRS) estimation. GWAS are not sufficiently powered to detect phenotype associations with low-frequency variants. To overcome this limitation, the manuscript proposes a method to aggregate variant impacts on chromatin and transcription across 4096 base pair (bp) loci in the form of a haplotype function score (HFS). At each locus, an association is computed between the HFS and the trait. Computing associations at the level of imputed functional genomic scores should enable the integration of information across variants spanning the allele frequency spectrum and bolster the power of GWAS.

      The HFS for each locus is derived from a sequence-based predictive model, Sei. Sei predicts 21,907 chromatin and TF binding tracks, which can be projected onto 40 pre-defined sequence classes (representing promoters, enhancers, etc.). For each 4096 bp haplotype in their UKB cohort, the proposed method uses the Sei sequence class scores to derive the haplotype function score (HFS). The authors apply their method to 14 polygenic traits, identifying ~16,500 HFS-trait associations. They fine-map these trait-associated loci with SuSie, as well as perform target gene/pathway discovery and PRS estimation.

      Strengths:

      Sequence-based deep learning predictors of chromatin status and TF binding have become increasingly accurate over the past few years. Imputing aggregated variant impact using Sei, and then performing an HFS-trait association is, therefore, an interesting approach to bolster power in GWAS discovery. The manuscript demonstrates that associations can be identified at the level of an aggregated functional score. The finemapping and pathway identification analyses suggest that HFS-based associations identify relevant causal pathways and genes from an association study. Identifying associations at the level of functional genomics increases the portability of PRSs across populations. Imputing functional genomic predictions using a sequence-based deep learning model does not suffer from the limitation of TWAS where gene expression is imputed from a limited-size reference panel such as GTEx.

      However, there are several major limitations that need to be addressed.

      Major concerns/weaknesses:

      (1) There is limited characterization of the locus-level associations to SNP-level associations. How does the set of HFS-based associations differ from SNP-level associations?

      Response: We thank the reviewer for the recognition and the valuable suggestion on our manuscript. Following the reviewer’s suggestion, in this revision we added a paragraph comparing the basic characteristics of HFS-based and SNP-based association studies. These comparisons suggested that HFS had no advantage in testing marginal associations, but performed better in detecting causal associations.

      Line 144: “When comparing HFS association with standard SNP-based GWAS on the same data, we found that 98% of significant HFS loci also harbored a significant SNP. In a few cases (n=0 to 5), significant HFS loci did not harbor even a marginal SNP association (GWAS p>0.01), owing to the lack of common SNPs in these loci. The HFS association p value was higher than the GWAS p value in 95% of significant loci, suggesting that HFS did not improve power to detect marginal effects. The genomic control inflation factor (λGC) for the HFS association test varied between 0.99 for asthma and 1.50 for height, closely resembling the SNP GWAS (Pearson Correlation Coefficient [PCC]=0.91, paired t-test p=0.16; Method and Figure S3). We concluded that HFS-based association tests had adequate power and did not introduce additional p-value inflation.”
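      The inflation factor λGC mentioned above is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null, about 0.455. The sketch below assumes that standard definition, computed from two-sided p-values; the manuscript's Method section may differ in detail (for example, computing it directly from test statistics).

```python
from statistics import NormalDist, median


def lambda_gc(pvalues):
    """Genomic control inflation factor from two-sided p-values (0 < p <= 1).

    Each p-value is converted to a 1-df chi-square statistic z**2 with
    z = Phi^-1(p/2), which avoids precision loss for very small p-values.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.4549
    return median(chi2) / expected_median


# Null p-values spread evenly over (0, 1) give lambda close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
```

      Values near 1 indicate no systematic inflation; values well above 1 (such as the 1.50 reported for height) can reflect either confounding or genuine polygenic signal.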

      (2) A clear advantage of performing HFS-trait associations is that the HFS score is imputed by considering variants across the allele frequency spectrum. However, no evidence is provided demonstrating that rare variants contribute to associations derived by the model. Similarly, do the authors find evidence that allelic heterogeneity is leveraged by the HFS-based association model? It would be useful to do simulations here to characterize the model behavior in the presence of trait-associated rare variants.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all rare (MAF<0.01) variants and repeated the HFS analysis (HFScommon) on chromosome 1. In linear association analysis, we found that 10.6% of HFS signals (p<5×10⁻⁸) were missed by HFScommon. In fine-mapping, 55.3% of HFS causal signals (PIP>0.95) were missed by HFScommon. We concluded that rare variants played an important role in the performance of HFS, especially its advantages in fine-mapping.

      Line 175: “We also found that rare variants played an important role in the good fine-mapping performance of HFS: when variants with MAF<0.01 were removed, 55.3% of the causal signals were missed in the HFS+SUSIE analysis.”
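      The HFScommon filter (dropping variants with MAF<0.01 before recomputing HFS) can be sketched as below. It assumes each variant is given as alternate-allele dosages (0, 1 or 2) for diploid individuals; the variant IDs and dosages are hypothetical, and in the actual pipeline the retained variants would then be used to build the haplotype sequences scored by sei.

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0, 1 or 2 per diploid individual)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def keep_common(variants, threshold=0.01):
    """Drop variants with MAF below `threshold`, as in the HFScommon sensitivity analysis."""
    return {vid: d for vid, d in variants.items() if minor_allele_frequency(d) >= threshold}


# Hypothetical variant IDs and dosages for 100 individuals.
variants = {
    "var_rare_alt": [1] + [0] * 99,     # alt frequency 0.005
    "var_rare_ref": [2] * 99 + [1],     # alt frequency 0.995, so MAF 0.005
    "var_common": [1] * 50 + [0] * 50,  # MAF 0.25
}
common = keep_common(variants)
```

      Taking the minimum of the two allele frequencies matters: a variant whose alternate allele is nearly fixed is just as rare as one whose alternate allele is almost absent.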

      We then attempted a simulation analysis in which rare variants were causal to the phenotype and the association statistics matched those of the real height GWAS. However, such simulations did not seem to reflect the real scenario properly: no matter how we changed the association between rare variants and the phenotype, the HFS association p-value could hardly reach the significance level of the SNP association. We suspect this is because simulation cannot properly reflect how variants impact functional genomics. When a rare variant is randomly selected as the causal variant, it is highly likely to have no impact on functional genomics, so its HFS would be close to zero; when such a variant is set as causal (which is unlikely in a real scenario), HFS would not capture the association. We reasoned that it might be difficult to evaluate HFS by simulation, since the nonlinear relationships between SNPs and HFS, and among SNPs, are difficult to simulate properly.

      Author response image 5.

      (3) Sei predicts chromatin status / ChIP-seq peaks in the center of a 4kb region. It would therefore be more relevant to predict HFS using overlapping sequence windows that tile the genome as opposed to using non-overlapping windows for computing HFS scores. Specifically, in line 482, the authors state that "the HFS score represents overall activity of the entire sequence, not only the few bp at the center", but this would not hold given that Sei is predicting activity at the center for any sequence.

      Response: We thank the reviewer for the suggestion on sliding window design. In this revision, we shifted all loci by 2,048 bp to double the number of loci and repeated the fine-mapping and polygenic prediction analyses. For fine-mapping, the results were generally robust to the sliding window procedure, and the majority of the causal associations were retained:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For a further 31.1% and 29.3% of causal loci, the 5’ and 3’ overlapping locus, respectively, had PIP>0.95 in the sliding window analysis, although the original locus itself was no longer causal.”

      In polygenic prediction, the sliding window analysis provided significantly improved performance compared with the previous analysis of non-overlapping loci.

      However, since this revision includes several updates to the polygenic prediction procedure, it is difficult to quantify how much of the improvement is due to the sliding window design. Thus, we show the new results directly in Figure 5 without comparing them with the original results.

      We also modified the previously imprecise statement to:

      Line 490: “…it integrated information of the entire sequence, not only the few bp at the center.”

      (4) Is the HFS-based association going to miss coding variation and several regulatory variants such as splicing variants? There are also going to be cases where there's an association driven by a variant that is correlated with a Sei prediction in a neighboring window. These would represent false positives for the method, it would be useful to identify or characterize these cases.

      Response: As the reviewer suggested, sei captures only functional genomic features and by nature is unlikely to perform well when the causal variants alter protein sequences. In this revision, we characterized this by focusing on causal exonic variants (SNP PIP>0.95):

      Line 322: “On the other hand, HFS performed worse than SNP-based fine-mapping in exonic regions. Taking height as an example, PolyFun detected 125 causal SNPs (PIP>0.95) in exonic regions, but only 16% (20) of the loci harboring them also reached PIP>0.5 (11 reached PIP>0.95) in the HFS+SUSIE analysis. Among the 105 loci that missed such signals (HFS PIP<0.5), 12 had a nearby locus (within 10kb) showing HFS PIP>0.95, likely reflecting false positives driven by LD. Thus, SNP-based analysis should be prioritized over HFS in coding regions.”

      Additional minor concerns:

      (1) It's not clear whether SuSie-based finemapping is appropriate at the locus level, when there is limited LD between neighboring HFS bins. How does the choice of the number of causal loci and the size of the segment being finemapped affect the results and is SuSie a good fit in this scenario?

      Response: Following the reviewer’s suggestion, we reran SUSIE with different predefined numbers of causal loci (from 2 to 10) and found that the identified causal loci were consistent.

      Author response image 6.

      Line 211: “HFS+SUSIE was also robust to changes in the predefined number of causal loci (L=2 to 10): the number of detected loci did not change.”

      As for the segment size, we divided the predefined segments (independent blocks detected by LDetect) into two halves and reran SUSIE, and found that three additional causal loci emerged in one half. This suggests that segments that are too small might increase the false positive rate. However, since there is no LD between independent blocks (as ensured by LDetect), it is not necessary to use even longer blocks.

      Author response image 7.

      Line 133: “Simulation analysis revealed that when a non-reference sequence class score was associated with the trait, the reference class score could still capture a median of 70% of the HFS-trait association R².”

      (2) It is not clear how a single score is chosen from the 117 values predicted by Sei for each locus. SuSie is run assuming a single causal signal per locus, an assumption which may not hold at ~4kb resolution (several classes could be associated with the trait of interest). It's not clear whether SuSie, run in this parameter setting, is a good choice for variable selection here.

      Response: As discussed below (point 3), in this revision we no longer applied SUSIE to select one sequence class score for each locus, because of overfitting, and instead used the reference sequence class uniformly for all loci. As the reviewer suggested, we used simulation to evaluate how this procedure influences HFS performance, especially when multiple sequence classes of the same locus are causal to the phenotype. We found that the reference sequence class score captured a median of 69.1% of the phenotypic R² when the causal sequence class was not the reference, and a median of 59.2% of R² when there were 2 to 5 non-reference causal classes. We concluded that the loss caused by skipping sequence class selection is mild, and that skipping it is necessary given the risk of overfitting.

      Author response image 8.

      (3) A single HFS score is being chosen from amongst multiple tracks at each locus independently. Does this require additional multiple-hypothesis correction?

      Response: We agree with the reviewer that choosing the sequence class for each locus constitutes multiple testing, and in additional experiments we indeed observed some evidence of overfitting in this procedure. Thus, in this revision we no longer applied the per-locus feature selection procedure, but instead used the sequence class corresponding to the reference (hg38) sequence. Consequently, additional multiple-testing correction is avoided. We acknowledge that this simplification loses certain information, but, as mentioned above, the loss is moderate, and the simplification is necessary to ensure statistical robustness and reduce false positives. In fact, with this simplification we better controlled the inflation factor of HFS GWAS and obtained better portability in polygenic prediction.

      (4) The results show that a larger number of loci are identified with HFS-based finemapping & that causal loci are enriched for causal SNPs. However, it is not clear how the number of causal loci should relate to the number of SNPs. It would be really nice to see examples of cases where a previously unresolved association is resolved when using HFS-based GWAS + finemapping.

      Response: In this revision, we did not observe a clear relationship between the number of causal loci and the number of causal genes. The only trend is that SNP-based fine-mapping seemed to perform better in coding regions, in accordance with the fact that HFS captures functional genomic signals. We also added new interpretations to highlight some examples where HFS resolves previously unresolved association signals. For example,

      Line 287: “Specifically, in the 1q32.1 region, HFS+SUSIE identified two loci with PIP>0.9 (Figure 4B). SNP-based association also found significant association in this region, but SNP fine-mapping (Weissbrod et al., 2020) could not resolve this signal and only found seven signals with PIP between 0.1 and 0.5.”

      (5) Sequence-based deep learning model predictions can be miscalibrated for insertions and deletions (INDELs) as compared to SNPs. Scaling INDEL predictions would likely improve the downstream modeling.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all indels on chromosome 1 and repeated the HFS analysis. Removing indels increased the number of significant (p<5×10⁻⁸) associations by 9%, but also slightly increased the inflation factor (paired Wilcoxon test p=0.0001). In the fine-mapping analysis, removing indels caused a 4.7% decrease in the number of detected causal associations (PIP>0.95). We reasoned that potential miscalibration of indel predictions has indeed affected the statistical power of HFS, but the appropriate way to control this effect is not straightforward and awaits further optimization. In this revision, we kept all indels in the analysis, since we consider the power of fine-mapping more important than the power of marginal association.

      Line 213: “Lastly, removing insertions and deletions revealed 9% more significant associations (p<5×10⁻⁸) but 4.7% fewer causal associations (PIP>0.95), and slightly increased the inflation factor (Wilcoxon p=0.0001, Figure S4).”

      Author response image 9.

      Reviewer #1 (Recommendations For The Authors):

      It was unclear to me why the sei output was rounded to two decimal places to "avoid influence of sei prediction noise". Wouldn't rounding introduce additional noise?

      Response: We thank the reviewer for pointing out our inadequate description. The rounding procedure is used to mask low values that likely do not reflect any real change. The idea is that, even if a variant does not bring about any functional change, sei would still output a very low HFS value that is close to, but not equal to, zero. Rounding sets such low values to zero, which avoids this noise. We have added this rationale to the Methods section:

      Line 529: “This is because, even if a variant has no impact on functional genomics, sei would still output a value that is close to, but not equal to, the reference sequence class score. Rounding sets such HFS values to zero and removes this random component of the sei output.”
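      A minimal sketch of the rounding rationale, assuming (this is not stated explicitly in the response) that HFS is the difference between a haplotype's sequence class score and the reference sequence class score, and that the two-decimal rounding is applied to that difference. The function name is hypothetical.

```python
def rounded_hfs(haplotype_score, reference_score, ndigits=2):
    """Haplotype score minus reference score, rounded so near-zero differences become 0."""
    return round(haplotype_score - reference_score, ndigits)


noise = rounded_hfs(3.1416, 3.1401)  # difference 0.0015 is treated as prediction noise -> 0.0
signal = rounded_hfs(3.50, 3.10)     # difference of about 0.4 is kept -> 0.4
```

      Rounding the difference, as assumed here, zeroes any change smaller than 0.005 in absolute value. Rounding the two scores separately before subtracting would behave differently near rounding boundaries: 1.2349 and 1.2351 are only 0.0002 apart but round to 1.23 and 1.24.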

      Minor comments / typos:

      • There are many typos in the abstract.

      Response: We have revised the typo and grammar issues in the abstract in this revision.

      • I believe "Arachnoid acid-intelligence" should be "Arachidonic acid-intelligence".

      • Consistently there is no space between text and parenthetical citations. For example, "sei(Chen et al., 2022)" should be "sei (Chen et al., 2022)".

      • Line 110: "at least one non-reference haplotypes" --> "at least one non-reference haplotype".

      • Line 155: "data-based method" --> "data-based methods".

      • Lines 165-166: "functionally importance" --> "functional importance".

      Response: We have made these revisions accordingly.

      • Line 210: the sentence containing "this annotation on conditioned of a set of baseline annotations" is unclear.

      Response: We have revised this sentence as “…regressed the PIP against this annotation, with a set of baseline annotations included as covariates, similar to the LDSC framework.”

      • Line 213: "association" --> "associations".

      • Line 219: "association" --> "associations".

      • Line 251: "result" --> "results".

      • Line 269: "result" --> "results".

      • Line 289: "known to involved" --> "known to be involved".

      • Line 356: "LDAK along" --> "LDAK alone".

      • Line 362: "BOLT-LMM along" --> "BOLT-LMM alone".

      • Supplement: "Hihglighted" --> "Highlighted".

      Response: We have made these revisions accordingly.

      • Line 444: Were "British ancestry Caucasians" defined as individuals that self-identified as "white British"? If so, then they should be described as "self-identified "white British"".

      Response: As the reviewer pointed out, we have changed the description to “self-identified British ancestry Caucasians”.

      Reviewer #2 (Recommendations For The Authors):

      (1) A 2022 cistrome-wide association study (CWAS) computed associations between genetically-predicted chromatin activity and phenotypes. Adding a reference to this paper would be helpful. https://pubmed.ncbi.nlm.nih.gov/36071171/

      Response: Following the reviewer’s suggestion, we discussed the similarity between CWAS and our study:

      Line 89: “In line with this notion, a recent similar strategy called cistrome-wide association study (CWAS) integrated variant-chromatin activity and variant-phenotype associations to boost the power of genetic studies of cancer (Baca et al., 2022).”

      (2) Line 487 states: "We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark." It's not clear what normalization is being referred to here.

      Response: We have revised the sentence to:

      Line 495: “We applied sei to predict 21,906 functional genomic tracks for each sequence, without histone mark normalization (dividing each track score by the sum of histone mark scores), as suggested by the sei authors.”

      (3) The figures are extremely low resolution, they need to be updated.

      Response: In this revision, we uploaded a separate PDF file for each figure to provide high-resolution graphs.

      (4) The results section was difficult to follow and would benefit from being written more clearly.

      Response: In this revision, we rearranged parts of the Results section to better clarify the main ideas. We moved all statistical results into parentheses and focused the main text on interpretation. For example,

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: the integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the median LD to 0.14 (Method; Figure S2C). Secondly, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10⁻¹⁶).”

      (5) "Along" is used several times in the final results section (PRS estimation), this should be "alone".

      Response: We have replaced all misused instances of “along” with “alone” in this revision.

      (6) Instead of using notation identifying genomic location, it might be clearer to provide gene names when illustrating examples of trait-associated promoters.

      Response: In this revision, we added the gene names of the corresponding promoters to the main text to better clarify the findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C in guiding embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Are Netrin-1 and UNC5C expressed in all cell types or only in dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein, and in this manuscript we did not examine which cell types express Netrin-1. This question is not the focus of the study, and we consider it irrelevant to the main issue we are addressing, namely where Netrin-1+ cells are present in the forebrain regions we examined. As per the reviewer’s request, we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. In Figure 1 below, we show a high-magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20µm

      In Figures 2 and 3 below we show low and high magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher resolution image from the same sample as in Figure 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens to show UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knock down Netrin-1 in the septum, and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data showing that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will reroute dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta.

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region which they then ectopically innervate. In these experiments we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure 1J. It is unclear whether the viruses were injected into a WT mouse model or into a Cre mouse model driven by a promoter specifically expressed in the dorsal peduncular cortex. The authors should provide evidence that Netrin-1 mRNA and protein are indeed significantly reduced. The authors should show the anatomical extent of virus diffusion to confirm that the virus specifically infected cells in the dorsal peduncular cortex.

      All virus knockdown experiments were conducted in wild-type mice; we have added this information to Figure 1K.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and the area of virus diffusion in the mouse forebrain. In Author response image 4 below, the area of virus diffusion is visible as a green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex, IL: infralimbic cortex

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of the knockdown. For instance, in Figure 1K, the mice were tested 53 days post injection; can the virus remain active in the brain for such a long time?

      In our study we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point along their route to determine whether it is the Netrin-1 present along their route that guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests during adulthood, after the critical window during which dopamine axon growth occurs, so as to observe the enduring behavioural consequences of this misrouting. This experimental approach is designed so that the shRNA Netrin-1 virus is expressed in cells in the dorsopeduncular cortex during adolescence, when the dopamine axons are growing.

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in fewer DA axons targeting the infralimbic cortex, but why do the Netrin-1 knockdown mice show improved behavior?

      This is indeed an intriguing finding, and we have now added a mention of it to our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but why this reduces action impulsivity is currently unknown to us. One possibility is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that we are finding that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood, but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N Y Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in NAcc. However, it remains unclear whether these experiments also impact the projections of dopaminergic axons to other brain regions, critical for the behavioral phenotypes. What about other brain regions such as prefrontal cortex? Do the projection of DA axons and UNC5c level in cortex have similar pattern to those in NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine – DA – with TH in the relevant panels in Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, fluorescent green labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon projection is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence”, we understand the reviewer to be requesting a single statistical model that includes measures from both the summer and winter groups. Such a model directly tests whether dopamine axon growth is sensitive to daylight duration. We therefore ran these models, one for male hamsters and one for female hamsters.

      In both sexes we find a significant effect of daylength on dopamine innervation that interacts with age. Male age × daylength interaction: F = 6.383, p = 0.00242. Female age × daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
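For readers who want to see what an age × daylength interaction test computes, the following is a minimal Python sketch of the interaction F statistic from a two-way fixed-effects ANOVA. It assumes a balanced design (the same number of animals in every age × photoperiod cell), which may not match the actual design or the software used for the analysis reported above. The function name and the toy numbers in the usage note are hypothetical and are not data from this study.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F for a balanced two-way fixed-effects ANOVA (hypothetical helper).

    cells[i][j] holds the observations for level i of factor A (e.g. age)
    and level j of factor B (e.g. daylength). Every cell must have the same n.
    Returns (F, df_interaction, df_error).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != b for row in cells) or any(
        len(cell) != n for row in cells for cell in row
    ):
        raise ValueError("this sketch only handles balanced designs")

    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean(m for row in cell_means for m in row)
    row_means = [mean(row) for row in cell_means]
    col_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]

    # Interaction: what remains of each cell mean after removing both main effects.
    ss_int = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    # Error: spread of individual observations around their own cell mean.
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for x in cells[i][j]
    )
    df_int = (a - 1) * (b - 1)
    df_err = a * b * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err
```

For example, `interaction_f([[[1, 3], [5, 7]], [[5, 7], [1, 3]]])` describes a toy crossover pattern (the effect of the second factor reverses between the two levels of the first) and gives F = 16.0 with 1 and 4 degrees of freedom. A p-value would then come from the upper tail of the F distribution with those degrees of freedom, for example via `scipy.stats.f.sf(F, df_int, df_err)`; unbalanced designs need a regression-based approach instead.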

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to highlight the boundary of the nucleus accumbens, which hopefully emphasizes that there are fibres crossing the boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring the width of the bundle is an odd way to measure DA axon numbers. First, the width could be changing into adulthood for various reasons, including a change in brain size. Second, I wouldn't consider these axons to form a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures?

      With regards to potential changes in brain size, we agree that this could have potentially explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width increased but no new axons grew along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large, irregularly distributed populations of objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere, we generally report results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than the dopamine axon “bundle”).
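The reasoning above, that a wider occupied area at constant density implies more axons, can be made concrete with a simplified two-dimensional sketch. It assumes axon profiles are counted in equally sized counting frames placed systematically within the occupied area and that the total is extrapolated as density × area; full stereological estimators (for example the optical fractionator) additionally correct for section thickness and sampling fractions. The function names and numbers are hypothetical.

```python
def mean_density(counts, frame_area_mm2):
    """Axon profiles per mm² from counts in equally sized counting frames."""
    if not counts or frame_area_mm2 <= 0:
        raise ValueError("need at least one frame and a positive frame area")
    return sum(counts) / (len(counts) * frame_area_mm2)


def estimated_total(density_per_mm2, occupied_area_mm2):
    """Extrapolate a total count from density and the area the axons occupy."""
    return density_per_mm2 * occupied_area_mm2


def density_if_no_new_axons(total_before, area_after_mm2):
    """Density expected if the occupied area grew but no axons were added."""
    return total_before / area_after_mm2


# Hypothetical illustration: the occupied area doubles between two ages.
d = mean_density([4, 6, 5], frame_area_mm2=0.01)          # ~500 profiles/mm²
young_total = estimated_total(d, occupied_area_mm2=0.2)   # ~100 profiles
# If no axons were added, density should roughly halve after the area doubles:
print(density_if_no_new_axons(young_total, 0.4))         # ~250 profiles/mm²
# A density that instead stays near 500/mm² implies roughly twice as many axons:
print(estimated_total(d, occupied_area_mm2=0.4))          # ~200 profiles
```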

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; its not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Please see the immunofluorescent and RNAscope images below, showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, IL: Infralimbic Cortex, PrL: Prelimbic Cortex, fmi: forceps minor of the corpus callosum

      Author response image 9.

      A higher-resolution image from the same sample as in Author response image 8 shows Netrin-1 mRNA (green) and cell nuclei (DAPI, blue). DP: dorsopeduncular cortex

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 (points #2 and #3).

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If knocking down Netrin in cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion?

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that previous indirect evidence from our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation of the prefrontal cortex (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but it's not clear if it's related to DA axons/signaling. In general, no evidence is provided in this paper for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex leads to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-taking-like behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing the maturation of this behaviour over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-taking-like behaviour in rodents and risk-taking in humans change from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual input of dopamine axons to this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens, we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA with band intensity (relative to the average band intensity of 15-day-old mice) as the response variable and age as the predictor variable showed a significant effect of age (F = 5.615, p = 0.01). Post hoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens than 21- and 35-day-old mice.

      Author response image 10.

      The graph depicts the results of a Western blot experiment measuring UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat. no. NC9128328) and ejected into a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 15,000 × g for 30 minutes at 4 °C. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat. no. PI23225), and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. 10 µg of protein per sample was loaded and run by SDS-PAGE in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel, stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no. ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-tubulin was probed and used as a loading control. The probed membranes were developed using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat. no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software, and all ages were normalized to the P15 age group average.
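      To make the normalization and statistical steps above concrete, here is a minimal Python sketch of how band intensities could be expressed relative to the P15 group average and then compared across ages with a one-way ANOVA. It is an illustration under stated assumptions, not the analysis pipeline actually used (band intensities were quantified in the ChemiDoc software). In particular, it assumes that each UNC5c band is first divided by its α-tubulin loading-control band; the function names and the input layout are hypothetical.

```python
from statistics import mean


def normalize_to_reference(unc5c, tubulin, ages, reference_age="P15"):
    """Express each sample relative to the mean of the reference age group.

    Assumption (not stated in the methods): each UNC5c band intensity is
    first divided by the alpha-tubulin band from the same lane.
    """
    ratios = [u / t for u, t in zip(unc5c, tubulin)]
    reference_mean = mean(r for r, a in zip(ratios, ages) if a == reference_age)
    return [r / reference_mean for r in ratios]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over age groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variability of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of individual samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

      In use, the normalized values would be grouped by age (P15, P21, P35) and passed to one_way_anova_f; a p-value then follows from the F distribution with (df_between, df_within) degrees of freedom, which scipy.stats.f_oneway computes directly if SciPy is available.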

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex".....This is confusing. Figure 2 shows a developmental increase in Unc5c, not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs, separately for males and females. In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAC cartoon to indicate the region that was sampled.

      It is not feasible to quantify axon length in the prefrontal cortex because we cannot distinguish independent axons. Rather, they are “tangled”: they twist and turn in a multitude of directions as they make contact with various dendrites. Furthermore, they branch extensively. It would therefore be impossible to trace individual axons accurately and measure their length. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working on conducting these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will take considerable time, and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. In general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expression is confusing (line178-179). Also some spelling errors (eg. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer; however, we could not find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantive discussion of how they selected their ages for investigation (for both mice and hamsters). For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette MP, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the allen brain atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region; doing so now would require generating additional tissue, and we are not sure it would add enough to the information provided to be worthwhile. Note that we have previously stained the forebrain for Netrin-1, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figure 4,5 The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there were no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided vaginal opening data as Supplementary Figure 7. The statistical analyses confirm that all three measures are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    2. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C axon guidance during embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Does Netrin-1 or UNC5C express in all cell types or only dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein, and in this manuscript we did not examine which cell types express it. This question is not the focus of the study and is not relevant to the main issue we are addressing, which is where Netrin-1+ cells are present in the forebrain regions we examined. As per the reviewer’s request, we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. In Author response image 1 below, we show a high-magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20µm

      In Author response images 2 and 3 below we show low and high magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher-resolution image from the same sample as in Author response image 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens, showing UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knockdown Netrin-1 in the Septum and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will re-route dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta?

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region, which they then ectopically innervate. In these experiments, we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation of the medial prefrontal cortex in adulthood but increased dopamine input to the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure1J. It is unclear whether the viruses were injected into a WT mouse model or into a Cre-mouse model driven by a promoter specifically expresses in dorsal peduncular cortex? The authors should provide evidence that Netrin-1 mRNA and proteins are indeed significantly reduced. The authors should address the anatomic results of the area of virus diffusion to confirm the virus specifically infected the cells in dorsal peduncular cortex.

      All virus knockdown experiments were conducted in wild-type mice; we have added this information to Figure 1k.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and the area of virus diffusion in the mouse forebrain. In Author response image 4, the area of virus diffusion is visible as green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex; IL: infralimbic cortex.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of knocking down. For instance, in Figure 1K, the mice were tested after 53 days post injection, can the virus activity in the brain last for such a long time?

      In our study, we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point to determine whether Netrin-1 present along this route guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests during adulthood, after the critical window of dopamine axon growth, to observe the enduring behavioural consequences of this misrouting. The approach was designed so that the shRNA Netrin-1 virus would be expressed in cells of the dorsopeduncular cortex during adolescence, when the dopamine axons are growing.

      References:

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in less DA axons targeting to infralimbic cortex, but why the Netrin-1 knocking down mice revealed the improved behavior?

      This is indeed an intriguing finding, and we now mention it in our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but we do not yet know why this improves performance on our measure of action impulsivity. One possibility is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity, as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation of the medial prefrontal cortex in adulthood but increased dopamine input to the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N Y Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in NAcc. However, it remains unclear whether these experiments also impact the projections of dopaminergic axons to other brain regions, critical for the behavioral phenotypes. What about other brain regions such as prefrontal cortex? Do the projection of DA axons and UNC5c level in cortex have similar pattern to those in NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine (DA) with TH in the relevant panels of Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, green fluorescent labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon project is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence”, we understand the reviewer to be requesting a single statistical model including measures from both the summer and winter groups. Such a model directly tests whether dopamine axon growth is sensitive to daylight duration. We therefore ran these models, one for male hamsters and one for female hamsters.

      In both sexes, we find a significant effect of daylength on dopamine innervation that interacts with age. Male age × daylength interaction: F = 6.383, p = 0.00242. Female age × daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to mark the boundary of the nucleus accumbens, which makes it clearer that some fibres cross this boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring width of the bundle is an odd way to measure DA axon numbers. First the width could be changing during adult for various reasons including change in brain size. Second, I wouldn't consider these axons in a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures.

      With regard to potential changes in brain size, we agree that these could potentially have explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width had increased but no new axons had grown along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.
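The density argument above can be made concrete with simple arithmetic. This is a minimal sketch that assumes the number of axons in the pathway is approximately density × area occupied; the function names and all numbers are hypothetical illustrations, not values or methods from this study.

```python
def total_axons(density_per_um2: float, area_um2: float) -> float:
    """Estimated axon count, assuming count = density x area occupied (hypothetical helper)."""
    return density_per_um2 * area_um2


def density_if_no_new_axons(density_before: float, area_before: float,
                            area_after: float) -> float:
    """Density expected if the occupied area grows but no new axons arrive.

    The same number of axons is spread over a larger area, so density
    falls in proportion to the increase in area.
    """
    return total_axons(density_before, area_before) / area_after


# Hypothetical adolescent and adult values (illustrative only, not study data).
adolescent_density = 0.002   # axons per square micrometre
adolescent_area = 1000.0     # square micrometres
adult_area = 2000.0          # occupied area doubled in adulthood

# Scenario A: area doubles with no new axons, so density halves.
print(density_if_no_new_axons(adolescent_density, adolescent_area, adult_area))

# Scenario B: density stays constant while area doubles, so the axon count doubles.
print(total_axons(adolescent_density, adult_area)
      / total_axons(adolescent_density, adolescent_area))
```

Brain growth alone would predict scenario A; a constant measured density together with a wider occupied area corresponds to scenario B, in which more axons are present.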

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large populations of irregularly distributed objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere, we generally report results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than of a dopamine axon “bundle”).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; its not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Please see below the immunofluorescent and RNAscope images showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: anterior cingulate cortex 1; DP: dorsopeduncular cortex; IL: infralimbic cortex; PrL: prelimbic cortex; fmi: forceps minor of the corpus callosum.

      Author response image 9.

      A higher-resolution image from the same sample as the preceding image shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP: dorsopeduncular cortex.

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 (points #2 and #3).

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If the conclusion that knocking down Netrin in cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion.

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that previous indirect evidence from our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation of the prefrontal cortex (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but its not clear if its related to DA axons/signaling. IN general, no evidence in this paper is provided for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex leads to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-taking-like behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing the maturation of this behaviour over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-taking behaviour in rodents and humans changes from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual growth of dopamine axons into this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens, we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA with band intensity (relative to the average band intensity of the 15-day-old group) as the response variable and age as the predictor variable showed a significant effect of age (F = 5.615, p = 0.01). Post hoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens than 21- and 35-day-old mice.

      Author response image 10.

      The graph depicts the results of a Western blot experiment of UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat no. NC9128328) and ejected into a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 4 °C at 15,000 × g for 30 minutes. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat no. PI23225), and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. 10 µg of protein per sample was loaded and run by SDS-PAGE in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel, stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no. ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-Tubulin was probed and used as a loading control. The probed membranes were developed using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software, and all ages were normalized to the P15 age group average.
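
      The normalization and one-way ANOVA described above can be sketched as follows. This is a minimal illustration, not the analysis used for the figure: it assumes that each UNC5c band is first divided by the α-tubulin band from the same lane before being expressed relative to the P15 group mean (the methods state only that α-tubulin served as the loading control), and the function names are hypothetical. The F statistic is computed from first principles; obtaining a p-value additionally requires the F distribution, for example via scipy.stats.f_oneway.

```python
from statistics import mean


def normalize_to_reference(samples, reference_age="P15"):
    """Express each lane's UNC5c signal relative to the reference-age mean.

    samples: list of (age, unc5c_band, tubulin_band) tuples.
    Assumption: UNC5c intensity is divided by the alpha-tubulin intensity
    from the same lane (loading control) before normalization.
    """
    ratios = [(age, unc5c / tubulin) for age, unc5c, tubulin in samples]
    reference_mean = mean(r for age, r in ratios if age == reference_age)
    return [(age, r / reference_mean) for age, r in ratios]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of normalized intensities per age.
    """
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical band intensities (arbitrary units), not data from this study
lanes = [("P15", 2.0, 1.0), ("P15", 4.0, 2.0), ("P21", 6.0, 2.0)]
print(normalize_to_reference(lanes))  # [('P15', 1.0), ('P15', 1.0), ('P21', 1.5)]
```

      Normalizing to the P15 mean makes the youngest group average exactly 1, so values for later ages read directly as fold changes relative to P15.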

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex." This is confusing. Figure 2 shows a developmental increase in Unc5c, not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs, separately for males and females. In both sexes we found a significant effect of daylength on dopamine innervation that interacted with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAc cartoon to indicate the region that was sampled.

      It is untenable to quantify axon length in the prefrontal cortex as we cannot distinguish independent axons. Rather, they are “tangled”; they twist and turn in a multitude of directions as they make contact with various dendrites. Furthermore, they branch extensively. It would therefore be impossible to accurately quantify the number of axons. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).
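
      For readers unfamiliar with stereological counting, the sketch below shows how a varicosity count can be converted into a total estimate and a density without tracing individual axons. It assumes the optical fractionator, a widely used unbiased stereological probe; this response does not state which probe or sampling parameters were used, so the sampling fractions in the usage line are hypothetical, as are the function names.

```python
def optical_fractionator(counted, ssf, asf, tsf):
    """Estimate the total number of varicosities in a region.

    counted: varicosities counted across all disectors (sum of Q-)
    ssf: section sampling fraction (e.g. 1/4 if every fourth section is analysed)
    asf: area sampling fraction (counting-frame area / sampling-grid step area)
    tsf: thickness sampling fraction (disector height / mean section thickness)
    """
    for name, fraction in (("ssf", ssf), ("asf", asf), ("tsf", tsf)):
        if not 0 < fraction <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {fraction}")
    # Scale the raw count up by the inverse of each sampling fraction
    return counted / (ssf * asf * tsf)


def varicosity_density(total_estimate, region_volume_mm3):
    """Varicosities per cubic millimetre of the sampled region."""
    return total_estimate / region_volume_mm3


# Hypothetical scheme: every 4th section, 1/8 of the area, half the thickness
total = optical_fractionator(200, ssf=0.25, asf=0.125, tsf=0.5)
print(total, varicosity_density(total, region_volume_mm3=0.5))  # 12800.0 25600.0
```

      Because the estimate depends only on point counts and known sampling fractions, it is unaffected by how tangled or branched the axons are.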

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working to conduct these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will take considerable time, and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. In general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expressions are confusing (lines 178-179). Also some spelling errors (e.g. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer, however we cannot find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantive discussion of how they selected their ages for investigation (for both mice and hamsters). For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the allen brain atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region; doing so now would require generating additional tissue, and we are not sure it would add enough to the information provided to be worthwhile. Note that we have previously stained the forebrain for Netrin-1, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figure 4,5 The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there were no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided vaginal opening data as Supplementary Figure 7. The statistical analyses confirm that all three measures are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors developed an image analysis pipeline to automatically identify individual neurons within a population of fluorescently tagged neurons. This application is optimized to deal with multi-cell analysis and builds on a previous software version, developed by the same team, to resolve individual neurons from whole-brain imaging stacks. Using advanced statistical approaches and several heuristics tailored for C. elegans anatomy, the method successfully identifies individual neurons with a fairly high accuracy. Thus, while specific to C. elegans, this method can become instrumental for a variety of research directions such as in-vivo single-cell gene expression analysis and calcium-based neural activity studies.

      Thank you.

      Reviewer #2 (Public Review):

      The authors succeed in generalizing the pre-alignment procedure for their cell identification method to allow it to work effectively on data with only small subsets of cells labeled. They convincingly show that their extension accurately identifies head angle, based on finding autofluorescent tissue and looking for a symmetric L/R axis. They demonstrate that the method works to allow the identification of a particular subset of neurons. Their approach should be a useful one for researchers wishing to identify subsets of head neurons in C. elegans, and the ideas might be useful elsewhere.

      The authors also assess the relative usefulness of several atlases for making identity predictions. They attempt to give some additional general insights on what makes a good atlas, but here the insights seem less clear, as the available data do not allow for experiments that cleanly decouple: 1. the number of examples in the atlas, 2. the completeness of the atlas, and 3. the match in strain and imaging modality discussed. In the presented experiments, the custom atlas, besides the strain and imaging modality mismatches discussed, is also the only complete atlas with more than one example. The NeuroPAL atlas is an imperfect stand-in, since a significant fraction of cells could not be identified in these data sets, making it a 60/40 mix of OpenWorm and a hypothetical perfect NeuroPAL comparison. This waters down general insights, since it is unclear whether the performance is driven by strain/imaging modality or by these difficulties in creating a complete NeuroPAL atlas. The experiments do usefully explore the volume of data needed. Though generalization remains to be shown, the insight that 5-10 examples are sufficient to build an accurate atlas for the specific (small) set of cells labeled in the experiments is useful for future atlas building.

      The reviewer brings up an interesting point. As the reviewer noted, given the imperfection of the datasets (ours and others’), it is possible that artifacts from incomplete atlases can interfere with the assessment of the performances of different atlases. To address this, as the reviewer suggested, we have searched the literature and found two sets of data that give specific coordinates of identified neurons (both using NeuroPAL). We compared the performance of the atlases derived from these datasets to the strain-specific atlases, and the original conclusion stands. Details are now included in the revised manuscript (Figure 3- figure supplement 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the new mosaic analysis (Fig. 3 - figure suppl 2). Please fix the y-axis tick label that I believe should be 0.8 (instead of 0.9).

      We thank the reviewer for spotting the typo. We have fixed the error.

      Reviewer #2 (Recommendations For The Authors):

      Though I'm not familiar with the exact quality of GT labels in available NeuroPAL data, I know an increasing volume of published data is available. Comparison with a complete NeuroPAL atlas, and a similar assessment of atlas size as made with the custom atlas, would to my mind qualitatively increase the general insights on atlas construction.

      We thank the reviewer for the insightful suggestion. We have newly constructed several other NeuroPAL atlases by incorporating neuron positional data from two other published data: [Yemini E. et al. NeuroPAL: A Multicolor Atlas for Whole-Brain Neuronal Identification in C. elegans. Cell. 2021 Jan 7;184(1):272-288.e11] and [Skuhersky, M. et al. Toward a more accurate 3D atlas of C. elegans neurons. BMC Bioinformatics 23, 195 (2022)].

      Interestingly, we found that the two new atlases (NP-Yemini and NP-Skuhersky) have significantly different values of PA, LR, DV, and angle relationships for certain cells compared to the OpenWorm and glr-1 atlases. For example, in both the NP atlases, SMDD is labeled as being anterior to AIB, which is the opposite of the SMDD-AIB relationship in the glr-1 atlas.

      Because this relationship (and other similar cases) was missing from our original NeuroPAL atlas (NP-Chaudhary), adding these two NeuroPAL datasets dramatically changed the atlas. As a result, incorporating the published data sets into the NeuroPAL atlas (NP-all) actually decreased the average prediction accuracy to 44%, whereas the average accuracy of the original NeuroPAL atlas (NP-Chaudhary) was 57%. The atlas based on the Yemini et al. data alone (NP-Yemini) had 43% accuracy, and the atlas based on the Skuhersky et al. data alone (NP-Skuhersky) had 38% accuracy.

      For the rest of our analysis, we focused on comparing the NeuroPAL atlas that resulted in the highest accuracy against other atlases in figure 3 (NP-Chaudhary). Therefore, we have added Figure 3- figure supplement 2 and the following sentence in the discussion. “Several other NeuroPAL atlases from different data sources were considered, and the atlas that resulted in the highest neuron ID correspondence was selected (Figure 3- figure supplement 2).”

      Author response image 1.

      Figure 3- figure supplement 2. Comparison of neuron ID correspondences resulting from additional atlases: atlases derived from NeuroPAL neuron positional data from multiple sources (Chaudhary et al., Yemini et al., and Skuhersky et al.), in red, compared to the other atlases in Figure 3. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05, and n.s. denotes no significance. OW: atlas derived from OpenWorm project data; NP-source: NeuroPAL atlas derived from the data of the given source. The NP-Chaudhary atlas corresponds to the NeuroPAL atlas in Figure 3.

      80% agreement among manual identifications seems low to me for a relatively small, (mostly) known set of cells, which casts doubt on ground-truth identities based on a best-2-out-of-3 vote. The authors mention that 3% of cell identities had total disagreement and were excluded; what fractions were unanimous and 2/3? Are there any further insights about what limited human performance in the context of this particular identification task?

      We closely examined the manual annotation data. The fractions of cells with unanimous, two-thirds, and no agreement are approximately 74%, 20%, and 6%, respectively. We made the corresponding change in the manuscript from 3% to 6%. Indeed, we identified certain patterns in the labels that were more likely to be disagreed upon. First, cells in close proximity to each other, such as AVE and RMD, were often switched from annotator to annotator. Second, cells in the posterior part of the cluster, such as RIM, AVD, and AVB, were more variable in position, so their identities were at times unclear. Third, annotators were more likely to disagree on cells with rare and low expression, including AIB, AVJ, and M1. These observations agree with our results in Figure 4c.
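      The consensus rule discussed in this exchange (identity taken from a 2-out-of-3 vote, cells with total disagreement excluded) can be sketched as below. This is a hypothetical illustration, not the authors' code: the function names and the input layout (one list of three annotator labels per cell) are assumptions, and the toy labels are invented for the example rather than taken from the data.

```python
from collections import Counter


def consensus_label(labels):
    """Return (label, agreement) for one cell annotated by three people.

    agreement is "unanimous", "two_thirds" or "none"; the label is None
    when all three annotators disagree (such cells would be excluded).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three annotations per cell")
    label, count = Counter(labels).most_common(1)[0]
    if count == 3:
        return label, "unanimous"
    if count == 2:
        return label, "two_thirds"
    return None, "none"


def agreement_fractions(annotations):
    """Fraction of cells falling into each agreement category."""
    tally = Counter(consensus_label(labels)[1] for labels in annotations)
    total = sum(tally.values())
    return {k: tally[k] / total for k in ("unanimous", "two_thirds", "none")}


# Invented toy annotations, one row per cell.
cells = [
    ["AVE", "AVE", "AVE"],
    ["AVE", "RMD", "AVE"],  # close neighbours swapped by one annotator
    ["AIB", "AVJ", "M1"],   # total disagreement -> excluded
    ["RIM", "RIM", "RIM"],
]
print(agreement_fractions(cells))
# {'unanimous': 0.5, 'two_thirds': 0.25, 'none': 0.25}
```

      Using `Counter.most_common(1)` keeps the rule independent of label order; a real pipeline would also need to track which cells were dropped.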

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Yao et al. explored the transcriptomic characteristics of neural stem cells (NSCs) in the human hippocampus and their changes under different conditions using single-nucleus RNA sequencing (snRNA-seq). They generated single-nucleus transcriptomic profiles of human hippocampal cells from neonatal, adult, and aging individuals, as well as from stroke patients. They focused on the cell groups related to neurogenesis, such as neural stem cells and their progeny. They revealed genes enriched in different NSC states and performed trajectory analysis to trace the transitions among NSC states and towards astroglial and neuronal lineages in silico. They also examined how NSCs are affected by aging and injury using their datasets and found differences in NSC numbers and gene expression patterns across age groups and injury conditions. One major issue of the manuscript is questionable cell type identification. For example, more than 50% of the cells in the astroglial lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies.

      While the authors have made efforts to address previous criticisms, major concerns have not been adequately addressed, including a very limited sample size and poor patient information. In addition, some analytical approaches are still questionable, and the authors acknowledged that they cannot address some of these concerns. Therefore, while the topic is interesting, some results are preliminary and some conclusions are not fully supported by the data presented.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and discuss the technical and conceptual limitations of this work. Here we provide the response to Reviewer #1 (Public Review) on these below.

      Firstly, we appreciate the concerns raised by Reviewer 1 regarding the high proportion of NSCs within the astroglial lineage clusters. It is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcriptional profiling poses a significant challenge in the field due to their high transcriptional similarity. In the previous global UMAP analysis, AS1 (adult specific) could be separated from qNSCs, but AS2 (NSC-like astrocytes) could not. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on the different scores, we categorized the qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presented the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend and have now done so in the legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups, including adult, aging, and injured samples, in Figure 2-Supplement 2A and B. This supplementary figure provides more complete information on the cell type composition and its dynamic variation during aging and injury. Although the ratio of NSCs in the astroglial lineage clusters remains higher than in classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes will be required in the future, and we also discuss this in the corresponding text.

      Secondly, we cannot adequately address the major concern regarding sample size raised by the reviewer due to the scarcity of stroke and neonatal human brain samples. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      Thirdly, regarding the questionable subpopulations of granule cells (GCs) derived from neuroblasts in Figure 4A-4D, which are inconsistent with previous single-cell transcriptomic studies, we tried various strategies to confirm the identity of these two subpopulations but did not obtain a clear answer. As a result, we can only provide an objective description of the differences in gene expression and developmental trajectory, and speculate that these differences may be related to the cells' degree of maturity, although the two subpopulations are not aligned on the same trajectory.

      Finally, we have discussed the technical and conceptual limitations of this work and added a brief discussion about these limitations in the last paragraph of the main text. We hope readers can interpret our data critically and objectively.

      Reviewer #2 (Public Review):

      In this manuscript, Yao et al. present a series of experiments aiming at generating a cellular atlas of the human hippocampus across aging, and at determining how it may be affected by injury, in particular, stroke. Although the aim of the study is interesting and relevant for a larger audience due to the ongoing controversy around the existence of adult hippocampal neurogenesis in humans, a number of technical weaknesses result in poor support for many of the conclusions drawn from the results of these experiments.

      In particular, a recent meta-analysis of five previous studies applying similar techniques to human samples has identified different aspects of sample size as the main determinants of the statistical power needed to draw significant conclusions. Among these aspects are the number of nuclei sequenced and subject stratification, and both are of concern in Yao's study. First, the number of sequenced nuclei is lower than the calculated number of nuclei required for detecting rare cell types. However, Yao et al. report succeeding in detecting rare populations, including several types of neural stem cells in different proliferation states, which previous studies have demonstrated to be extremely scarce. It would be very interesting to read how the authors interpret these differences. Secondly, the number of donors included in some of the groups is extremely low (n=1), and the miscellaneous information provided about the donors is practically nonexistent. As individual factors such as chronic conditions, medication, lifestyle parameters, etc. are considered determinants of the variability of adult hippocampal neurogenesis levels across individuals, this represents a serious limitation of the current study. Overall, several technical weaknesses severely limit the relevance of this study and the ability of the authors to achieve their experimental aims.

      After a first review round, the manuscript still lacks a clear discussion of its several technical limitations, which would help the audience grasp the relevance of the findings. In particular, detailed information about individual patients' health status and relevant lifestyle parameters that may have affected it is lacking. The authors make the point themselves that "the discrepancies among studies might be caused by health state differences across hippocampi, which subsequently lead to different degrees of hippocampal neurogenesis." So, even in the authors' own interpretation, this is a serious limitation of the manuscript that, although out of the authors' control, impacts the quality of their findings.

      Reviewer #2 (Recommendations For The Authors):

      Please see public review. I do understand the authors' point about incomplete patient data collection and low patient numbers, and that the former is out of their control. Nevertheless, these are crucial parameters that negatively impact the quality and relevance of several of the bold claims in the manuscript, especially given the low number of patients included. The current version still lacks a clear and honest discussion, aimed at the readership, of the several technical and conceptual limitations of the authors' work (in some cases these are presented to the reviewers in the rebuttal letter), so that readers could critically evaluate the relevance of the authors' findings in a bigger perspective.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and discuss the technical and conceptual limitations of this work. Here we provide the response to Reviewer #2 (Public Review) on these below.

      We understand the reviewer’s concern and have also noted that, according to the computational modeling conducted by Tosoni et al. (Neuron, 2023), at least 21 neuroblasts (NBs) can be identified out of 30,000 granule cells (GCs) from a total of 180,000 dentate gyrus (DG) cells. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also include neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Therefore, it is possible for us to detect NBs. Importantly, we implemented strict quality control measures to support the reliability of our sequencing data. These measures include: 1. Immediate postmortem collection of tissue samples (3-4 h) to ensure the quality of isolated nuclei. 2. Only nuclei expressing more than 200 genes but fewer than 5000-8600 genes (depending on the peak of enrichment genes) were considered; on average, around 3000 genes were detected per cell. 3. The average proportion of mitochondrial genes in each sample was approximately 1.8%, with no sample exceeding 5%. We have shown that the number of cells captured from individual samples and the average number of genes detected per cell are sufficient, indicating overall good sequencing quality (Figure 1-supplement 1A, B and F, and Figure 1-source data 1). Additionally, we have further confirmed the presence of these low-abundance cell types by integrating immunofluorescence staining (Figure 4E, 5D and 6B), cell type-specific gene expression (Figure 1C and D), overall transcriptomic characteristics (Figure 1-supplement 1E), and developmental potential (Figure 4A-D, Figure 6E and F). We hope this evidence, taken together, explains why we can identify the rare neurogenic populations.
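      The gene-count and mitochondrial criteria listed above can be read as two separate checks, one per nucleus and one per sample. The sketch below is a hypothetical plain-Python rendering, not the authors' pipeline: the function names are invented, the per-sample upper bound is passed in because the text states only that it ranged from 5000 to 8600 genes, and the 5% figure, which the text reports as an observed maximum, is used here only as an illustrative sample-level check on the mean mitochondrial fraction.

```python
# Hypothetical helpers; numeric thresholds are taken from the response text.
MIN_GENES = 200         # nuclei must express more than this many genes
MAX_SAMPLE_MITO = 0.05  # illustrative ceiling on a sample's mean mito fraction


def keep_nucleus(n_genes, max_genes):
    """Per-nucleus filter; max_genes is chosen per sample (5000-8600)."""
    return MIN_GENES < n_genes < max_genes


def filter_sample(gene_counts, max_genes):
    """Return the gene counts of the nuclei that pass the filter."""
    return [n for n in gene_counts if keep_nucleus(n, max_genes)]


def sample_mito_ok(mito_fractions):
    """Sample-level check on the mean mitochondrial read fraction."""
    mean = sum(mito_fractions) / len(mito_fractions)
    return mean <= MAX_SAMPLE_MITO


counts = [150, 250, 3000, 6000, 9000]  # invented genes-per-nucleus values
print(filter_sample(counts, max_genes=5000))  # [250, 3000]
print(filter_sample(counts, max_genes=8600))  # [250, 3000, 6000]
print(sample_mito_ok([0.01, 0.02, 0.025]))    # True
```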

      Regarding the limited sample size and poor patient information, we cannot adequately address these two major concerns. Due to the scarcity of stroke or neonatal human samples, it was not feasible to collect a larger sample size within the expected timeframe. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      As per the reviewer’s recommendation, in the latest version, we have discussed the technical and conceptual limitations of this work and added a brief discussion about these limitations in the last paragraph of the main text. We hope readers can interpret our data critically and objectively.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their careful reading of our manuscript and for the detailed and constructive feedback on our work. Please find attached the revised version of the manuscript. We performed an extensive revision of the manuscript to address the issues raised by the referees. We provide new analyses (regarding the response consistency and the neural complexity), added supplementary figures and edits to figures and texts. Based on the reviewers’ comments, we introduced several major changes to the manuscript.

      Most notably, we

      • added a limitation statement to emphasize the speculative nature of our interpretation of the timing of word processing/associative binding

      • emphasized the limitations of the control condition

      • added analyses on the interaction between memory retrieval after 12h versus 36h

      • clarified our definition of episodic memory

      • added detailed analyses of the “Feeling of having heard” responses and the confidence ratings

      We hope that the revised manuscript addresses the reviewers' comments to their satisfaction. We believe that the revised manuscript has been significantly improved owing to the feedback provided. Below you can find a point-by-point response to each reviewer comment in blue. We look forward to the revised version being published in eLife.

      Reviewer #1 (Public Review):

      The authors show that concurrently presenting foreign words and their translations during sleep leads to the ability to semantically categorize the foreign words above chance. Specifically, this procedure was successful when stimuli were delivered during slow oscillation troughs as opposed to peaks, which has been the focus of many recent investigations into the learning & memory functions of sleep. Finally, further analyses showed that larger and more prototypical slow oscillation troughs led to better categorization performance, which offers hints to others on how to improve or predict the efficacy of this intervention. The strength here is the novel behavioral finding and supporting physiological analyses, whereas the biggest weakness is the interpretation of the peak vs. trough effect.

      R1.1. Major importance:

      I believe the authors could attempt to address this question: What do the authors believe is the largest implication of this study? How far can this technique be pushed, and how can it practically augment real-world learning?

      We revised the discussion to put more emphasis on possible practical applications of this study (lines 645-656).

      In our opinion, the strength of this paper is its contribution to the basic understanding of information processing during deep sleep, rather than its insights on how to augment real-world learning. Given the currently limited data on learning during sleep, we believe it would be premature to make strong claims about potential practical applications of sleep-learning. In addition, as pointed out in the discussion section, we do not know what adverse effects sleep-learning has on other sleep-related mechanisms such as memory consolidation.

      R1.2. Lines 155-7: How do the authors argue that the words fit well within the half-waves when the sounds lasted 540 ms and didn't necessarily start right at the beginning of each half-wave? This is a major point that should be discussed, as part of the down-state sound continues into the up-state. Looking at Figure 3A, it is clear that stimulus presented in the slow oscillation trough ends at a time that is solidly into the upstate, and would not neurolinguists argue that a lot of sound processing occurs after the end of the sound? It's not a problem for their findings, which is about when is the best time to start such a stimulus, but it's a problem for the interpretation. Additionally, the authors could include some discussion on whether possibly presenting shorter sounds would help to resolve the ambiguities here.

      The word pairs’ presentations lasted on average ~540 ms. Importantly, the word pairs’ onset was timed to occur 100 ms before the maximal amplitude of the targeted peaks/troughs.

      Therefore, most of a word’s sound pattern appeared during the negative-going half-wave (about 350 ms of 540 ms). Importantly, Brodbeck and colleagues (2022) have shown that phonemes are continuously analyzed and interpreted with delays of about 50-200 ms, peaking at a 100 ms delay. These results suggest that word processing started just after the negative maximum of a trough and finished during the next peak. Our interpretation (e.g. line 520+) suggests that low-level auditory processing reaches the auditory cortex before the positive-going half-wave. During the positive-going half-wave, the higher-level semantic networks appear to extract the presented word's meaning and associate the two simultaneously presented words. We clarified the time course regarding slow-wave phases and sound presentation in the manuscript (lines 158-164). Moreover, we added the limitation that we cannot know for sure when and in which slow-wave phase words were processed (lines 645-656). Future studies might use shorter stimuli to narrow down the timing of the word processing steps in relation to sleep slow waves.

      R1.3. Medium importance:

      Throughout the paper, another concern relates to the term 'closed-loop'. It appears this term has been largely misused in the literature, and I believe the more appropriate term here is 'real-time' (Bergmann, 2018, Frontiers in Psychology; Antony et al., 2022, Journal of Sleep Research). For instance, if there were some sort of algorithm that assessed whether each individual word was successfully processed by the brain during sleep and then the delivery of words was subsequently changed, that could be more accurately labelled as 'closed-loop'.

      We acknowledge that the meaning of “closed-loop” in its narrowest sense is not fulfilled here. We believe that “slow oscillation phase-targeted, brain-state-dependent stimulation” is the most appropriate term to describe the applied procedure (BSDBS, Bergmann, 2018). We changed the wording in the manuscript to brain-state-dependent stimulation algorithm. Nevertheless, we would like to point out that the algorithm we developed and used (TOPOSO) is very similar to the algorithms often termed closed-loop algorithms in memory and sleep research (e.g. Esfahani et al., 2023; Garcia-Molina et al., 2018; Ngo et al., 2013; for a comparison of TOPOSO to these techniques see Wunderlin et al., 2022, and for more information about TOPOSO see Ruch et al., 2022).

      R1.4. Figure 5 and corresponding analyses: Note that the two conditions end up with different sounds with likely different auditory complexities. That is, one word vs. two words simultaneously likely differ on some low-level acoustic characteristics, which could explain the physiological differences. Either the authors should address this via auditory analyses or it should be added as a limitation.

      This is correct: the two conditions differ in auditory complexity. Accordingly, we added this issue as another limitation of the study (lines 651-653). We opted for a single-word control condition to ensure that no associative learning (between pseudowords) could take place in the control condition, because this was the critical learning process in the experimental condition. We would like to point out that we observed significant differences in brain responses to the presentation of word pairs (experimental condition) vs single pseudowords (control condition) in the Trough condition, but not the Peak condition. If low-level acoustic characteristics indeed explained the EEG differences between the two conditions, one would expect these differences to occur in both the trough and the peak condition, because earlier studies showed that low-level acoustic processing proceeds in both phases of slow waves (Andrillon et al., 2016; Batterink et al., 2016; Daltrozzo et al., 2012).

      R1.5. Line 562-7 (and elsewhere in the paper): "episodic" learning is referenced here and many times throughout the paper. But episodic learning is not what was enhanced here. Please be mindful of this wording, as it can be confusing otherwise.

      The reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013).

      We stand by our claim that sleep-learning was of episodic nature. Here we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000) and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). We revised the manuscript to clarify that and how our definition differs from traditional definitions. Please see reviewer comment R3.1 for a more extensive answer.

      Reviewer #2 (Public Review):

      In this project, Schmidig, Ruch and Henke examined whether word pairs that were presented during slow-wave sleep would leave a detectable memory trace 12 and 36 hours later. Such an effect was found, as participants showed a bias to categorize pseudowords according to a familiar word that they were paired with during slow-wave sleep. This behavior was not accompanied by any sign of conscious understanding of why the judgment was made, and so demonstrates that long-term memory can be formed even without conscious access to the presented content. Unconscious learning occurred when pairs were presented during troughs but not during peaks of slow-wave oscillations. Differences in brain responses to the two types of presentation schemes, and between word pairs that were later correctly- vs. incorrectly-judged, suggest a potential mechanism for how such deep-sleep learning can occur.

      The results are very interesting, and they are based on solid methods and analyses. Results largely support the authors' conclusions, but I felt that there were a few points in which conclusions were not entirely convincing:

      R2.1. As a control for the critical stimuli in this study, authors used a single pseudoword simultaneously played to both ears. This control condition (CC) differs from the experimental condition (EC) in a few dimensions, among them: amount of information provided, binaural coherence and word familiarity. These differences make it hard to conclude that the higher theta and spindle power observed for EC over CC trials indicate associative binding, as claimed in the paper. Alternative explanations can be made, for instance, that they reflect word recognition, as only EC contains familiar words.

      We agree. In the revised version of the manuscript, we emphasise this as a limitation of our study (lines 653-656). Moreover, we acknowledge that the differences between the stimuli of the control and the experimental conditions need not reflect only the associative binding of two words, and we have made our interpretation of the findings more cautious.

      Interestingly, EC vs CC exhibits differences following trough- but not peak-targeting (see R1.4). If all the EC vs CC differences were indeed unrelated to associative binding, we would expect the same EC vs CC differences when peaks were targeted. Hence, the selective EC vs CC differences in the trough condition suggest that the brain is more responsive to sound, information, word familiarity and word semantics during troughs, where we found successful learning, than during peaks, where no learning occurred. Trough-targeted word pairs (EC) versus foreign words (CC) enhanced theta power at 500 ms following word onset, and this theta enhancement correlated significantly with interindividual retrieval performance, indicating that theta probably promoted associative learning during sleep. This correlation was not significant for spindle power.

      R2.2. The entire set of EC pairs were tested both following 12 hours and following 36 hours. Exposure to the pairs during test #1 can be expected to have an effect over memory one day later, during test #2, and so differences between the tests could be at least partially driven by the additional activation and rehearsal of the material during test #1. Therefore, it is hard to draw conclusions regarding automatic memory reorganization between 12 and 36 hours after unconscious learning. Specifically, a claim is made regarding a third wave of plasticity, but we cannot be certain that the improvement found in the 36 hour test would have happened without test #1.

      We understand that the retrieval test at 12h may have had an impact on performance on the retrieval test at 36h. Practicing retrieval of newly formed memories is known to facilitate future retrieval of the same memories (e.g. Karpicke & Roediger, 2008). Hence, practicing the retrieval of sleep-formed memories during the retrieval test at 12h may have boosted performance at 36h.

      However, recent literature suggests that retrieval practice is only beneficial when corrective feedback is provided (Belardi et al., 2021; Metcalfe, 2017). In our study, we only presented the sleep-played pseudowords at test and participants received no feedback regarding the accuracy of their responses. Thus, a proper conscious re-encoding could not take place. Nevertheless, the retrieval at 12h may have altered performance at 36h in other ways. For example, it could have tagged the reactivated sleep-formed memories for enhanced consolidation during the next night (Rabinovich Orlandi et al., 2020; Wilhelm et al., 2011).

      We included a paragraph on the potential carry-over effects from retrieval at 12h on retrieval at 36h in the discussion section (lines 489-496; lines 657-659). Furthermore, we removed the arguments about the “third wave of plasticity”.

      R2.3. Authors claim that perceptual and conceptual processing during sleep led to increased neural complexity in troughs. However, neural complexity was not found to differ between EC and CC, nor between remembered and forgotten pairs. It is therefore not clear to me why the increased complexity that was found in troughs should be attributed to perceptual and conceptual word processing, as CC contains meaningless vowels. Moreover, from the evidence presented in this work at least, I am not sure there is room to infer causation - that the increase in HFD is driven by the stimuli - as there is no control analysis looking at HFD during troughs that did not contain stimulation.

      With the analysis of the HFD we would like to provide an additional perspective to the oscillation-based analysis. We checked whether the boundary condition of Peak and Trough targeting changes the overall complexity or information content in the EEG. Our goal was to assess the change in neural complexity (relative to a pre-stimulus baseline) following the successful vs unsuccessful encoding of word pairs during sleep.

      We acknowledge that a causal interpretation about HFD is not warranted, and we revised the manuscript accordingly. It was unexpected that we could not find the same results in the contrast of EC vs CC or correct vs incorrect word pairs. We suggest that our signal-to-noise ratio might have been too weak.

      One could argue that the phase targeting alone (without stimulation) induces peak/trough differences in complexity. We cannot completely rule out this concern. But we tried to use the EEG that was not influenced by the ongoing slow-wave: the EEG 2000-500ms before the stimulus onset and 500-2000ms after the stimulus onset. Therefore, we excluded the 1s of the targeted slow-wave, hoping that most of the phase inherent complexity should have faded out (see Figure 2). We could not further extend the time window of analysis due to the minimal stimulus onset interval of 2s. Of course we cannot exclude that the targeted Trough impacted the following HFD. We clarified this in the manuscript (line 384-425).

      Furthermore, we did find a difference in neural complexity between the pre-stimulus baseline and the post-stimulus complexity in the Peak condition but not in the Trough condition (we have now added this contrast to the manuscript, lines 416-419). Hence, the change in neural complexity is a reaction to the interaction of the specific slow-wave phase with the processing of the word pairs. Even though these results cannot provide unambiguous, causal links, we think they can serve as an important starting point for other studies to decipher neural complexity during slow-wave sleep.
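      The response does not spell out HFD; assuming it denotes Higuchi's fractal dimension, a common complexity measure for EEG, the baseline-relative comparison described above (2000-500 ms before versus 500-2000 ms after stimulus onset) can be sketched as below. This is an illustrative sketch, not the authors' analysis code: the function names, the 100 Hz sampling rate, kmax = 10 and the synthetic input are all assumptions. As a sanity check, a straight line gives an FD of exactly 1 and white noise a value close to 2.

```python
import math
import random


def higuchi_fd(x, kmax=10):
    """Higuchi (1988) fractal dimension of a 1-D signal given as a list of floats."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                  # 0-based start offset
            n_steps = (n - 1 - m) // k      # increments available from offset m
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Higuchi's normalisation of the curve length at scale k
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    # FD is the least-squares slope of log L(k) against log(1/k)
    mx = sum(log_inv_k) / kmax
    my = sum(log_len) / kmax
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


def segment(signal, fs, onset, start_ms, end_ms):
    """Samples between start_ms and end_ms relative to the onset sample index."""
    a = onset + round(start_ms * fs / 1000)
    b = onset + round(end_ms * fs / 1000)
    return signal[a:b]


def hfd_change(signal, fs, onset, kmax=10):
    """Post-stimulus (500-2000 ms) minus pre-stimulus (-2000 to -500 ms) HFD."""
    pre = segment(signal, fs, onset, -2000, -500)
    post = segment(signal, fs, onset, 500, 2000)
    return higuchi_fd(post, kmax) - higuchi_fd(pre, kmax)


# Synthetic noise, not EEG data: 100 Hz sampling, stimulus onset at 2.5 s.
rng = random.Random(0)
eeg = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(round(hfd_change(eeg, fs=100, onset=250), 3))
```

      The gap between -500 ms and +500 ms mirrors the exclusion of the targeted slow wave described in the text; with a 2 s minimum stimulus onset interval, longer windows would overlap neighbouring trials.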

      Reviewer #3 (Public Review):

      The study aims at creating novel episodic memories during slow wave sleep that can be transferred to the awake state. To do so, participants were simultaneously presented during sleep with both foreign words and their arbitrary translations in their language (one word in each ear), or, as a control condition, with the foreign word alone, binaurally. Stimuli were presented either at the trough or the peak of the slow oscillation using a closed-loop stimulation algorithm. To test for the creation of a flexible association during sleep, participants were then presented, while awake, with the foreign words alone and had (1) to decide whether they had the feeling of having heard that word before, (2) to attribute this word to one of three possible conceptual categories (to which the translation words actually belong), and (3) to rate their confidence in their decision.

      R3.1. The paper is well written, the protocol ingenious and the methods are robust. However, the results do not really add conceptually to a prior publication of this group showing that pairs of words denoting large or small objects and non-words can be associated during slow-wave sleep, with participants then asked during ensuing wakefulness to categorise these non-words into a "large" or "small" category. In both cases, the main finding is that this type of association can be formed during slow-wave sleep if presented at the trough (versus the peak) of the slow oscillation. Crucially, whether these associations truly represent episodic memory formation during sleep, as claimed by the authors, is highly disputable, as there is no control condition allowing one to exclude the alternative, simpler hypothesis that mere perceptual associations between two elements (foreign word and translation) have been created and stored during sleep (which is already in itself an interesting finding). In this latter case, it would be only during the awake state, when the foreign word is presented, that its presentation would implicitly recall the associated translation, which in turn would "ignite" the associative/semantic association process, eventually leading to the observed categorisation bias (i.e., foreign words tending to be put in the same conceptual category as their associated translation). In the absence of a disconfirmation of this alternative and more economical hypothesis, and if we follow Occam's razor, the claim that there is episodic memory formation during sleep is speculative and unsupported, which is a serious limitation irrespective of the merits of the study. The title and interpretations should be toned down in this respect.

      Our study conceptually adds to and extends the findings by Züst et al. (a) by highlighting the precise time-window or brain state during which sleep-learning is possible (e.g. slow-wave trough targeting), (b) by demonstrating the feasibility of associative learning during night sleep, and (c) by uncovering the longevity of sleep-formed memories.

      We acknowledge that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013). We stand by our claim that sleep-learning was episodic in nature. We use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). The core computational features of episodic memory are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory.

      Therefore, we revised the manuscript to emphasize how our definition differs from traditional definitions (line 64).

      For the current study, we designed a retrieval task that calls on the core computational features of episodic memory by assessing flexible retrieval of sleep-formed compositional word-word associations. Reviewer 3 suggests an alternative interpretation of the learning observed here: mere perceptual associations between foreign words and translation words are stored during sleep, and semantic associations are only inferred at retrieval testing during ensuing wakefulness. First, these processing steps would require rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval, which would still require hippocampal processing and computations in the episodic memory system. Second, this mechanism seems highly laborious and inefficient. Under this account, the sound pattern of a word presented 12 hours after learning would trigger the reactivation of the associated sound pattern of another word. This sound pattern would then elicit the activation of the translation word's semantics, leading to the selection of the correct superordinate semantic category at test.

      Overall, we believe that our pairwise-associative learning paradigm triggered a rapid conceptual-associative encoding process mediated by the hippocampus that provided for flexible representations of foreign and translation words in episodic memory. This study adds to the existing literature by examining specific boundary conditions of sleep-learning and demonstrates the longevity (at least 36 hours) of sleep-learned associations.

      Other remarks:

      R3.2. Lines 43-45: the assumption that the sleeping brain decides whether external events can be disregarded, require awakening, or should be stored for further consideration in the waking state is dubious, and the supporting references date from a time (the 1960s) when hypnopedia was investigated under poorly controlled sleep conditions (leaving open the possibility that learning occurred during micro-awakenings).

      We revised the manuscript to add more recent and better controlled studies that support this claim, which originated in the 1960s (lines 40-51). Recently, it has been shown that the sleeping brain preferentially processes relevant information, for example information conveyed by unfamiliar voices (Ameen et al., 2022), emotional content (Holeckova et al., 2006; Moyne et al., 2022), and one's own name compared with others' names (Blume et al., 2018).

      R3.3. First paragraph, lines 48-53: the authors should be more specific about which kinds of new associations, and at which level, can be stored during sleep according to recent reports, as the cited references show a wide variety of associations (mostly at elementary levels). Limitations in information processing during sleep should also be acknowledged.

      In the lines to which R3 refers, we cite an article (Ruch & Henke, 2020) in which two of the three authors of the current manuscript elaborate in detail what kind of associations can be stored during sleep. We revised these lines to more clearly present the current understanding of the potential and the limitations of sleep-learning (line 40-51). Although information processing during sleep is generally reduced (Andrillon et al., 2016), a variety of different kinds of associations can be stored, ranging from tone-odour to word-word association (Arzi et al., 2012, 2014; Koroma et al., 2022; Züst et al., 2019).

      R3.4. The authors ran their main behavioural analyses on delayed retrieval at 36h rather than 12h, arguing that retrieval performance was numerically higher at 36h than at 12h although the difference was non-significant (lines 181-183), and that the effects were essentially similar. Looking at Figure 2, is the trough effect really significant at 12h? In any case, the fact that it is (numerically) higher at 36h than at 12h might suggest that the association created at the first 12h retrieval (considering the alternative hypothesis proposed above) has been reinforced by subsequent sleep.

      The Trough effect at 12h is not significant, as stated on line 185 (“Planned contrasts against chance level revealed that retrieval performance significantly exceeded chance at 36 hours only (p = 0.036 at 36 hours, p = 0.094 at 12 hours).”). It seems that our wording was not clear. Therefore, we refined the description of the behavioural analysis in the manuscript (lines 188-193).

      In brief, we report an omnibus ANOVA with a significant main effect of targeting type (Peak versus Trough: F(1,28) = 5.237, p = 0.030, d = 0.865). Because Trough-targeting led to significantly better memory retention than Peak-targeting, we computed a second ANOVA that included only participants with trough-targeted word-pair encoding. Memory retention in the Trough condition was above chance (M = 39.11%, SD = 10.76; intercept: F(1,14) = 5.660, p = 0.032) and did not differ significantly between the 12h and 36h retrieval (encoding-test delay: F(1,14) = 1.308, p = 0.272). However, retrieval performance at 36h numerically exceeded performance at 12h, and the direct comparison against chance was significant at 36h but not at 12h (p = 0.036 and p = 0.094, respectively). Hence, we found no evidence for above-chance performance at the 12h retrieval and focused on the 36h retrieval in the EEG analysis.

      We agree with the reviewer that the subsequent sleep seems to have improved consolidation and subsequent retrieval. We assume that the reviewer suggests that participants merely formed perceptual associations during sleep and encoded episodic-like associations during testing at 12h (as pointed out in R3.1). However, we believe it is unlikely that the awake encoding of semantic associations during the 12h retrieval led to improved performance after 36h. We changed the discussion regarding the interaction between retrieval at 12h and 36h (lines 505-512; see also R2.2).

      R3.5. In the discussion section (lines 419-427), the argument is somewhat circular in claiming episodic memory mechanisms based on functional neuroanatomical elements that are not tested here, and the supporting studies conducted during sleep were in a different setting (e.g., TMR).

      Indeed, TMR and animal studies use a different setting from the present study. We rewrote this part and now focus only on the findings of Züst and colleagues (2019), who examined hippocampal activity during the awake retrieval of sleep-formed memories (lines 472-482). Additionally, we would like to emphasise that our main reasoning is that the task requirements called upon the episodic memory system.

      R3.6. Supplementary Material: in the EEG data, the differentiation between correct and incorrect subsequent classifications when words were presented at the peak of the slow oscillation is only significant for the 36h delayed retrieval but not at 12h. How do the authors explain this lack of effect at 12 hours?

      We assume that the reviewer refers to the TROUGH condition (word pairs targeted at a slow-wave trough) and not, as written, to the Peak condition. Retention performance at 12h was not significantly above chance (M = 37.4%, p = 0.094).

      Hence, the distinction between “correctly” and “incorrectly” categorised word pairs was not informative for the EEG analysis during sleep. Whatever the reason for the non-significant 12h retrieval, less successful recall, and thus a less balanced trial count, makes 12h recall accuracy a poorer criterion for separating EEG trials than recall accuracy after 36 hours.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor importance:

      Abstract: The opening framing is confusing here and in the introduction. Why frame the paper in the broadest terms about awakenings and threats from the environment when this is a paper about intersections between learning & memory and sleep? I do understand that there is an interesting point to be made about the counterintuitive behavioral findings with respect to sleep generally being perceived as a time when stimuli are blocked out, but this does not seem to me to be the broadest points or the way to start the paper. The authors should consider this but of course push back if they disagree.

      We understand the reviewer’s criticism but believe that this has more to do with personal preferences than with the scientific value or validity of our work. We believe that it is our duty as researchers to present our study in a broader context because this may help readers from various fields to understand why the work is relevant. To some readers, evidence for learning during sleep may seem trivial, to others, it may seem impossible or a weird but useless conundrum. By pointing out potential evolutionary benefits of the ability to acquire new information during sleep, we help the broad readership of eLife understand the relevance of this work.

      Lines 31-32: "Neural complexity" -> "neural measures of complexity" because it isn't clear what "neural complexity" means at this point in the abstract. Though, note my other point that I believe this analysis should be removed.

      To our understanding, “neural complexity” is a frequently used term in the field and yields more than 4000 entries on Google Scholar, whereas ‘neural measures of complexity’ yields only 3 hits [September 2023]. To link our study with other studies on neural complexity, we would like to keep this terminology. Two recent publications using “neural complexity” are, for example, Lee et al. (2020) and Frohlich et al. (2022).

      Lines 42-43: The line of work on 'sentinel' modes would be good to cite here (e.g., Blume et al., 2017, Brain & Language).

      We added the suggested citation to the manuscript (line 52).

      Lines 84-90: While I appreciate the authors desire to dig deep and try to piece this all together, this is far too speculative in my opinion. Please see my other points on the same topic.

      In this paragraph, we point out why both peaks and troughs are worth exploring for their contributions to sensory processing and learning during sleep. Peaks and troughs both contribute to sleep-learning. Our speculations are meant to inspire further work aimed at pinning down the benefits of peaks and troughs for sleep-learning. We clarified the purpose and speculative nature of our arguments in the revised version of the manuscript.

      Line 109: "outlasting" -> "lasting over" or "lasting >"

      We changed the wording accordingly.

      Line 111: I believe 'nonsense' is not the correct term here, and 'foreign' (again) would be preferred. Some may be offended to hear their foreign word regarded as 'nonsense'. However, please let me know if I have misunderstood.

      We would like to use the linguistic term “pseudoword” (aligned with reviewer 2’s comment) and we revised the manuscript accordingly.

      Figure 1A: "Enconding" -> "Encoding"

      Thank you for pointing this out.

      Lines 201-2: Were there interactions between confidence and correctness on the semantic categorization task? Were correct responses given with more confidence than incorrect ones? This would not necessarily be a problem for the authors' account, as there can of course be implicit influences on confidence (i.e., fluency).

      As stated in the Results section, confidence ratings did not differ significantly between correct and incorrect assignments (Trough condition: F(1,14) = 2.36, p = 0.15; Peak condition: F(1,14) = 0.48, p = 0.50).

      Line 236: "Nicknazar" -> "Niknazar"

      Thank you for pointing this out.

      Line 266: "profited" -> "benefited"

      We changed the wording accordingly.

      Lines 280-4: There seems some relevance here with Malerba et al. (2018) and her other papers to categorize slow oscillations.

      Diving into the details of how best to categorise slow oscillations is beyond the scope of this manuscript. Here, we build on work from the field of microstate analyses and use two measures to describe and quantify the targeted brain states: the topography of the electric field (i.e., the correlation of the electric field with an established template or “microstate”), and the field strength (global field power, GFP). While the topography of a quasi-stable electric field reflects activity in a specific neural network, the strength (GFP) of a field most likely mirrors the degree of activation (or inactivity) in that network. Here, we find that consistent targeting of a specific network state yielding a strong frontal negativity benefited learning during sleep. For a more detailed explanation of the slow-wave phase targeting, see Ruch et al. (2022).

      Lines 343-6: Was it intentional to have 0.5 s (0.2-0.7 s) surrounding the analysis around 500 ms but only 0.4 s (0.8-1.2 s) surrounding the analysis around 1 s? Could the authors use the same size interval or justify having them be different?

      We apologise for the misleading phrasing and we clarified this in the revised manuscript. We applied the same procedure for the comparison of later correctly vs incorrectly classified pseudowords as we did for the comparison between EC and CC. Hence, we analysed the entire window from 0s to 2.5s with a cluster-based permutation approach. Contrary to the EC vs CC contrast, no cluster remained significant for the comparison of the subsequent memory effect. By mistake we reported the wrong time window. In the revised manuscript, the paragraph is corrected (lines 364-369).

      Line 356-entire HFD section: it is unclear what's gained by this analysis, as it could simply be another reflection of the state of the brain at the time of word presentation. In my opinion, the authors should remove this analysis and section, as it does not add clarity to other aspects of the paper.

      (If the authors keep the section) Line 361-2 - "Moreover, high HFD values have been associated with cognitive processing (Lau et al., 2021; Parbat & Chakraborty, 2021)." This statement is vague. Could the authors elaborate?

      Please see our answer to Reviewer 2 (2.3) for a more detailed explanation. In brief, we would like to keep the analysis, which uses the broad time windows from -2 to -0.5 s and from 0.5 to 2 s.

      Lines 403-4: How was it determined that these neural networks mediated both conscious/unconscious processes? Perhaps the authors meant to make a different point, but the way it reads to me is that there is evidence that some neural networks are conscious and others are not and both forms engage in similar functions.

      We revised the manuscript to be more precise and clear: “The conscious and unconscious rapid encoding and flexible retrieval of novel relational memories was found to recruit the same or similar networks including the hippocampus (Henke et al., 2003; Schneider et al., 2021). This suggests that conscious and unconscious relational memories are processed by the same memory system.” (p. 22, top).

      Lines 433-41: Performance didn't actually significantly increase from 12 to 36 hours, so this is all too speculative in my opinion.

      We removed the speculative claim that performance may have increased from the retrieval at 12 hours to the retrieval at 36 hours.

      Line 534: "assisted by enhanced" -> "coincident with". It's unclear whether theta reflects successful processing as having occurred or whether it directly affects or assists with it.

      We have adjusted the wording to be more cautious, as suggested (line 588).

      Line 572-4: Rothschild et al. (2016) is relevant here.

      Unfortunately, we do not see the relevance of this article within the context of our work.

      Line 577 paragraph: The authors may consider adding a note on the importance of ethical considerations surrounding this form of 'inception'.

      We extended this part by adding ethical considerations to the discussion section (Stickgold et al., 2021, line 657).

      Line 1366: It would be better if the authors could eventually make their data publicly available. This is obviously not required, but I encourage the authors to consider it if they have not considered it already.

      In my opinion, the discussion is too long. I really appreciate the authors trying to figure out the set of precise times in which each level of neural processing might occur and how this intersects with their slow oscillation phase results. However, I found a lot of this too speculative, especially given that the sounds may bleed into parts of other phases of the slow oscillation. I do not believe this is a problem unique to these authors, as many investigators attempting to target certain phases in the target memory reactivation literature have faced the same problem, but I do believe the authors get ahead of the data here. In particular, there seems to be one paragraph in the discussion that is multiple pages long (p. 22-24). This paragraph I believe has too much detail and should be broken up regardless, as it is difficult for the reader to follow.

      Considering the recent literature, we believe this interpretation best explains the data. As argued earlier, we believe that a speculative interpretation of the reported phenomena can provide substantial added value because it inspires future experimental work. We have improved the manuscript by clearly distinguishing between data and interpretation. We do declare the speculative nature of some offered interpretations. We hope that these speculations, which are testable hypotheses (!), will eventually be confirmed or refuted experimentally.

      Reviewer #2 (Recommendations For The Authors):

      I very much enjoyed the paper and think it describes important findings. I have a few suggestions for improvement, and minor comments that caught my eye during reading:

      (1) I was missing an analysis of CC ERP, and its comparison to EC ERP.

      We added this analysis to the manuscript (lines 299-301). The comparison of CC ERP with EC ERP did not yield any significant cluster for either the peak (cluster-level Monte Carlo p = 0.54) or the trough (cluster-level Monte Carlo p > 0.37). We assume that the noise level was too high for the identification of differences between CC and EC ERP.

      (2) Regarding my public review comment #2, some light can be shed on between-test effects, I believe, using an item-based analysis - looking at correlations between items' classifications in test #1 and test #2. The assumption seems to be that items that were correct in test #1 remained correct in test #2 while other new correct classifications were added, owing to the additional consolidation happening between the two tests. But that is an empirical question that can be easily tested. If no consistency in item classification is found, on the other hand, or if only consistency in correct classification is found, that would be interesting in itself. This item-based analysis can help tease away real memory from random correct classification. For instance, the subset of items that are consistently classified correctly could be regarded as non-fluke at higher confidence and used as the focus of subsequent-memory analysis instead of the ones that were correct only in test #2.

      Thanks, we re-analysed the data accordingly. Participants were consistent in choosing a specific object category for an item at 12 hours and 36 hours (consistency rate = 47% same category; chance level is 1/3). Moreover, the consistency rate did not differ between the Trough and the Peak condition (Trough: M = 47.2%; Peak: M = 47.0%; p = 0.98). The better retrieval performance in the Trough compared to the Peak condition after 36 hours is due to the following: A) if participants were correct at 12h, they chose the correct answer again at 36h (Trough: 20%, Peak: 14%); B) following an incorrect answer at 12h, participants switched to another object category at 36h (Trough: 72%, Peak: 67%); C) if participants switched the object category following an incorrect answer at 12h, they switched to the correct category more often in the Trough than in the Peak condition (Trough: 56%, Peak: 53%). Hence, the data support the reviewer’s assumption: items that were correct after 12 hours remained correct after 36 hours, while other new correct classifications were generated at 36h owing to the additional consolidation happening between the two tests. We added this finding to the manuscript (lines 191-200, Figure S6):

      Author response image 1.
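      A minimal Python sketch of this kind of item-based analysis is given here for illustration. It assumes that each participant's responses are stored as three parallel lists (category chosen at 12 h, category chosen at 36 h, and the correct category, with three categories); the function name and the exact definitions of the proportions are illustrative assumptions and may differ from how the percentages reported above were computed.

```python
def item_consistency(cat_12h, cat_36h, correct):
    """Item-level comparison of category choices at the 12 h and 36 h tests.

    Each argument is a list with one category label per pseudoword
    (hypothetical data layout; three categories assumed).
    """
    n = len(correct)
    items = list(zip(cat_12h, cat_36h, correct))
    # Same category chosen at both tests; chance is 1/3 under random guessing
    consistency = sum(a == b for a, b, _ in items) / n
    # Proportion of all items that were correct at both tests
    both_correct = sum(a == t and b == t for a, b, t in items) / n
    # Items answered incorrectly at 12 h, and those switched at 36 h
    wrong_12h = [(a, b, t) for a, b, t in items if a != t]
    switched = [(a, b, t) for a, b, t in wrong_12h if b != a]
    switch_after_wrong = len(switched) / len(wrong_12h) if wrong_12h else float("nan")
    # After a switch only two categories remain, so chance here is 1/2
    correct_after_switch = (
        sum(b == t for _, b, t in switched) / len(switched) if switched else float("nan")
    )
    return {
        "consistency": consistency,
        "both_correct": both_correct,
        "switch_after_wrong": switch_after_wrong,
        "correct_after_switch": correct_after_switch,
    }
```

      With three categories, a consistency rate well above 1/3 indicates that choices at 36 h were not independent of choices at 12 h, and a switch-to-correct rate above 1/2 indicates that switches were biased toward the correct category.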

      As suggested, we re-analysed the ERP with respect to the subsequent memory effect. This time we computed four conditions according to the reviewer’s argument about consistently correctly classified pseudowords, presented in the figure below: ERP of trials that were correctly classified at 36h (blue), ERP of trials that were incorrectly classified at 36h (light blue), ERP of trials that were correctly classified twice (brown) and ERP of trials that were not correctly classified twice (orange, all trials that are not in brown). Please note that the two blue lines are reported in the manuscript and include all trials. The brown and the orange line take the consistency into account and together include as well all trials.

      Author response image 2.

      Excluding even more trials from the group of correct retrieval responses increases the noise level. Therefore, the difference between the twice-correct and the not-twice-correct trials is not significant (cluster-level Monte Carlo p > 0.27). Because the ERP of twice-correct trials seems very similar to the ERP of trials correctly classified at 36h at frontal electrodes, we assume that our ERP effect is not driven by a few extreme subjects. Similarly, not-twice-correct trials (orange) have a stronger frontal trough than trials incorrectly classified at 36h (light blue).

      (3) In a similar vein, a subject-based analysis would be highly interesting. First and foremost, readers would benefit from seeing the lines that connect individual dots across the two tests in figures 2B and 2C. It is reasonable to expect that only a subset of participants were successful learners in this experiment. Finding them and analyzing their results separately could be revealing.

      We added a Figure S1 to the supplementary material, providing the pairing between performance of the 12h and the 36h retrieval.

      It is an interesting idea to look at successful learners alone. We computed the ERP of the subsequent memory effect for those participants whose retrieval accuracy at 36h was above chance. The result shows a similar effect to that reported for all participants (frontal cluster ~0-0.3 s). The effect does not reach significance (p = 0.08) because only 9 of 15 participants exhibited above-chance retrieval performance at 36 hours.

      Author response image 3.

      ERP effect of correct (blue) vs incorrect (light blue) pseudoword category assignment of participants with a retrieval performance above chance at 36h (SD as shades):

      We prefer to not include this data in the manuscript, but are happy to provide it here.

      (4) I wondered why the authors informed subjects of the task in advance (that they will be presented associations when they slept)? I imagine this may boost learning as compared to completely naïve subjects. Whether this is the reason or not, I think an explanation of why this was done is warranted, and a statement whether authors believe the manipulation would work otherwise. Also, the reader is left wondering why subjects were informed only about test #1 and not about test #2 (and when were they told about test #2).

      Subjects were informed of all the tests upfront. We apologize for the inconsistency in the manuscript and revised the Methods section. There are two reasons why participants were informed: a) Participants had to sleep with in-ear headphones. We wanted to explain to participants why these were necessary and why they should not remove them. b) We hoped that participants would unconsciously expect sounds to be played during sleep, would process these sounds efficiently, and would remain deeply asleep (no arousals).

      (5) FoHH is a binary yes/no question, and so may not have been sensitive enough to demonstrate small differences in familiarity. For comparison, the Perceptual Awareness Scale (Ramsøy & Overgaard, 2004) that is typically used in studies of unconscious processing is of a 4-point scale, and this allows to capture more nuanced effects such as partial consciousness and larger response biases. Regardless, it would be informative to have the FoHH numbers obtained in this study, and not just their comparison between conditions. Also, was familiarity of EC and CC pseudowords compared? One may wonder whether hearing the pseudowords clearly vs. in one ear alongside a familiar word would make the word slightly more familiar.

      We apologize for having simplified this part too much in the manuscript. Indeed, the FoHH is comparable to the PAS. We used a 4-point scale, on which participants rated their feeling of having heard the pseudoword during previous sleep. In the revised manuscript, we report the complete results (lines 203-223). The FoHH did not differ between any of the suggested contrasts. Thus, for both the peak and the trough condition, the FoHH did not differ between sleep-played vs new; correct EC trials vs new; correct vs incorrect EC trials; EC vs CC trials. To illustrate the results, a figure of the FoHH has been added to the supplement (Figure S4).

      (6) Similarly, it would be good to report the numbers of the confidence ratings in the paper as well.

      In the revised manuscript, we extended the description of the confidence rating results. We added the descriptive statistics (lines 224-236) and included a corresponding figure in the supplement (Figure S5).

      Minor/aesthetic comments:

      We implemented all the following suggestions.

      (1) I suggest using "pseudoword" or "nonsense word" instead of "foreign word", because "foreign word" typically means a real word from a different language. It is quite confusing when starting to read the paper.

      After reconsidering, we think that pseudoword is the appropriate linguistic term and have revised the manuscript accordingly.

      (2) Lines 1000-1001: "The required sample size of N = 30 was determined based on a previous sleep-learning study". I was missing a description of what study you are referring to.

      (3) I am not sure I understood the claim nor the rationale made in lines 414-417. Is the claim that pairs did not form one integrated engram? How do we know that? And why would having one engram not enable extracting the meaning from a visual-auditory presentation of the cue? The sentence needs some rewording and/or unpacking.

      (4) Were categories counterbalanced (i.e., did each subjects' EC contain 9 animal words, 9 tool words and 9 place words)?

      (5) Asterisks indicating significant effects are missing from Figure 4 and S2.

      (6) Fig1 legend: "Participants were played with pairs" is ungrammatical.

      (7) Line 1093: no need for a comma.

      (8) Line 1336: missing opening parenthesis

      (9) Line 430: "observe" instead of "observed".

      (10) Line 466: two dots instead of one..

      Reviewer #3 (Recommendations For The Authors):

      Methods: two separate ANOVAs are performed (lines 160-185), but would it not make more sense to combine both into one? If they are kept separate, a correction for multiple comparisons might be needed (α/2 = 0.025).

      We computed an omnibus ANOVA. In the next step, we examined the effect in the significant targeting condition by computing a second ANOVA. For further explanation, see our response to comment R3.4.

      References

      Ameen, M. S., Heib, D. P. J., Blume, C., & Schabus, M. (2022). The Brain Selectively Tunes to Unfamiliar Voices during Sleep. Journal of Neuroscience, 42(9), 1791–1803. https://doi.org/10.1523/JNEUROSCI.2524-20.2021

      Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural Markers of Responsiveness to the Environment in Human Sleep. The Journal of Neuroscience, 36(24), Article 24. https://doi.org/10.1523/JNEUROSCI.0902-16.2016

      Arzi, A., Holtzman, Y., Samnon, P., Eshel, N., Harel, E., & Sobel, N. (2014). Olfactory Aversive Conditioning during Sleep Reduces Cigarette-Smoking Behavior. Journal of Neuroscience, 34(46), Article 46. https://doi.org/10.1523/JNEUROSCI.2291-14.2014

      Arzi, A., Shedlesky, L., Ben-Shaul, M., Nasser, K., Oksenberg, A., Hairston, I. S., & Sobel, N. (2012). Humans can learn new information during sleep. Nature Neuroscience, 15(10), Article 10. https://doi.org/10.1038/nn.3193

      Batterink, L. J., Creery, J. D., & Paller, K. A. (2016). Phase of Spontaneous Slow Oscillations during Sleep Influences Memory-Related Processing of Auditory Cues. Journal of Neuroscience, 36(4), 1401–1409. https://doi.org/10.1523/JNEUROSCI.3175-15.2016

      Belardi, A., Pedrett, S., Rothen, N., & Reber, T. P. (2021). Spacing, Feedback, and Testing Boost Vocabulary Learning in a Web Application. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.757262

      Bergmann, T. O. (2018). Brain State-Dependent Brain Stimulation. Frontiers in Psychology, 9, 2108. https://doi.org/10.3389/fpsyg.2018.02108

      Blume, C., del Giudice, R., Wislowska, M., Heib, D. P. J., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638–648. https://doi.org/10.1016/j.neuroimage.2018.05.056

      Brodbeck, C., & Simon, J. Z. (2022). Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Frontiers in Neuroscience, 16. https://www.frontiersin.org/articles/10.3389/fnins.2022.828546

      Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia, and the Hippocampal System. A Bradford Book.

      Daltrozzo, J., Claude, L., Tillmann, B., Bastuji, H., & Perrin, F. (2012). Working memory is partially preserved during sleep. PloS One, 7(12), Article 12.

      Dew, I. T. Z., & Cabeza, R. (2011). The porous boundaries between explicit and implicit memory: Behavioral and neural evidence. Annals of the New York Academy of Sciences, 1224(1), 174–190. https://doi.org/10.1111/j.1749-6632.2010.05946.x

      Esfahani, M. J., Farboud, S., Ngo, H.-V. V., Schneider, J., Weber, F. D., Talamini, L. M., & Dresler, M. (2023). Closed-loop auditory stimulation of sleep slow oscillations: Basic principles and best practices. Neuroscience & Biobehavioral Reviews, 153, 105379. https://doi.org/10.1016/j.neubiorev.2023.105379

      Frohlich, J., Chiang, J. N., Mediano, P. A. M., Nespeca, M., Saravanapandian, V., Toker, D., Dell’Italia, J., Hipp, J. F., Jeste, S. S., Chu, C. J., Bird, L. M., & Monti, M. M. (2022). Neural complexity is a common denominator of human consciousness across diverse regimes of cortical dynamics. Communications Biology, 5(1), Article 1. https://doi.org/10.1038/s42003-022-04331-7

      Gabrieli, J. D. E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 87–115.

      Garcia-Molina, G., Tsoneva, T., Jasko, J., Steele, B., Aquino, A., Baher, K., Pastoor, S., Pfundtner, S., Ostrowski, L., Miller, B., Papas, N., Riedner, B., Tononi, G., & White, D. P. (2018). Closed-loop system to enhance slow-wave activity. Journal of Neural Engineering, 15(6), 066018. https://doi.org/10.1088/1741-2552/aae18f

      Hannula, D. E., Minor, G. N., & Slabbekoorn, D. (2023). Conscious awareness and memory systems in the brain. WIREs Cognitive Science, 14(5), e1648. https://doi.org/10.1002/wcs.1648

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), Article 7. https://doi.org/10.1038/nrn2850

      Henke, K., Mondadori, C. R. A., Treyer, V., Nitsch, R. M., Buck, A., & Hock, C. (2003). Nonconscious formation and reactivation of semantic associations by way of the medial temporal lobe. Neuropsychologia, 41(8), Article 8. https://doi.org/10.1016/S0028-3932(03)00035-6

      Holeckova, I., Fischer, C., Giard, M.-H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject’s own name uttered by a familiar voice. Brain Research, 1082(1), 142–152. https://doi.org/10.1016/j.brainres.2006.01.089

      Karpicke, J. D., & Roediger, H. L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

      Koroma, M., Elbaz, M., Léger, D., & Kouider, S. (2022). Learning New Vocabulary Implicitly During Sleep Transfers With Cross-Modal Generalization Into Wakefulness. Frontiers in Neuroscience, 16, 801666. https://doi.org/10.3389/fnins.2022.801666

      Lee, Y., Lee, J., Hwang, S. J., Yang, E., & Choi, S. (2020). Neural Complexity Measures. Advances in Neural Information Processing Systems, 33, 9713–9724. https://proceedings.neurips.cc/paper/2020/hash/6e17a5fd135fcaf4b49f2860c2474c7c-Abstract.html

      Metcalfe, J. (2017). Learning from Errors. Annual Review of Psychology, 68(1), 465–489. https://doi.org/10.1146/annurev-psych-010416-044022

      Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 62, 62–79. https://doi.org/10.1037/1196-1961.62.1.62

      Moyne, M., Legendre, G., Arnal, L., Kumar, S., Sterpenich, V., Seeck, M., Grandjean, D., Schwartz, S., Vuilleumier, P., & Domínguez-Borràs, J. (2022). Brain reactivity to emotion persists in NREM sleep and is associated with individual dream recall. Cerebral Cortex Communications, 3(1), tgac003. https://doi.org/10.1093/texcom/tgac003

      Ngo, H.-V. V., Martinetz, T., Born, J., & Mölle, M. (2013). Auditory Closed-Loop Stimulation of the Sleep Slow Oscillation Enhances Memory. Neuron, 78(3), Article 3. https://doi.org/10.1016/j.neuron.2013.03.006

      O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketz, N. (2014). Complementary Learning Systems. Cognitive Science, 38(6), 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x

      O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10(4), 389–397. https://doi.org/10.1002/1098-1063(2000)10:4<389::AID-HIPO5>3.0.CO;2-P

      Rabinovich Orlandi, I., Fullio, C. L., Schroeder, M. N., Giurfa, M., Ballarini, F., & Moncada, D. (2020). Behavioral tagging underlies memory reconsolidation. Proceedings of the National Academy of Sciences, 117(30), 18029–18036. https://doi.org/10.1073/pnas.2009517117

      Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1), Article 1. https://doi.org/10.1037/a0013974

      Ruch, S., & Henke, K. (2020). Learning During Sleep: A Dream Comes True? Trends in Cognitive Sciences, 24(3), 170–172. https://doi.org/10.1016/j.tics.2019.12.007

      Ruch, S., Schmidig, F. J., Knüsel, L., & Henke, K. (2022). Closed-loop modulation of local slow oscillations in human NREM sleep. NeuroImage, 264, 119682. https://doi.org/10.1016/j.neuroimage.2022.119682

      Schacter, D. L. (1998). Memory and Awareness. Science, 280(5360), 59–60. https://doi.org/10.1126/science.280.5360.59

      Schneider, E., Züst, M. A., Wuethrich, S., Schmidig, F., Klöppel, S., Wiest, R., Ruch, S., & Henke, K. (2021). Larger capacity for unconscious versus conscious episodic memory. Current Biology, 31(16), 3551-3563.e9. https://doi.org/10.1016/j.cub.2021.06.012

      Shohamy, D., & Turk-Browne, N. B. (2013). Mechanisms for widespread hippocampal involvement in cognition. Journal of Experimental Psychology: General, 142(4), 1159–1170. https://doi.org/10.1037/a0034461

      Squire, L. R., & Dede, A. J. O. (2015). Conscious and Unconscious Memory Systems. Cold Spring Harbor Perspectives in Biology, 7(3), a021667. https://doi.org/10.1101/cshperspect.a021667

      Stickgold, R., Zadra, A., & Haar, A. J. H. (2021). Advertising in Dreams is Coming: Now What? Dream Engineering. https://dxe.pubpub.org/pub/dreamadvertising/release/1

      Tulving, E. (2002). Episodic Memory: From Mind to Brain. Annual Review of Psychology, 53(1), 1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114

      Wilhelm, I., Diekelmann, S., Molzow, I., Ayoub, A., Mölle, M., & Born, J. (2011). Sleep Selectively Enhances Memory Expected to Be of Future Relevance. Journal of Neuroscience, 31(5), 1563–1569. https://doi.org/10.1523/JNEUROSCI.3575-10.2011

      Wunderlin, M., Koenig, T., Zeller, C., Nissen, C., & Züst, M. A. (2022). Automatized online prediction of slow-wave peaks during non-rapid eye movement sleep in young and old individuals: Why we should not always rely on amplitude thresholds. Journal of Sleep Research, 31(6), e13584. https://doi.org/10.1111/jsr.13584

      Züst, M. A., Ruch, S., Wiest, R., & Henke, K. (2019). Implicit Vocabulary Learning during Sleep Is Bound to Slow-Wave Peaks. Current Biology, 29(4), 541-553.e7. https://doi.org/10.1016/j.cub.2018.12.038

    1. Author Response

      We thank both reviewers for the positive evaluation of our work and suggestions on how to improve it.

      We agree with Reviewer #1 that reporting uncertainties will both clarify and strengthen our arguments. Where applicable, uncertainties will be added in a revised version.

      Regarding Reviewer #2’s suggestion to include free energy calculations to estimate the free energies of hydrogen bond and hydrophobic interactions: current free energy methods can give accurate estimates of the relative binding free energies of similar ligands; however, accurate calculation of the absolute free energies of hydrogen bond and hydrophobic interactions is not yet feasible.

      Again, we thank the reviewers for their assessment and suggestions. We will update the manuscript as we have outlined above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public review

      Reviewer 1

      Zhang et al. tackle the important topic of primate-specific structural features of the brain and the link with functional specialization. The authors explore and compare gyral peaks of the human and macaque cortex through non-invasive neuroimaging, using convincing techniques that have been previously validated elsewhere. They show that nearly 60% of the macaque peaks are shared with humans, and use a multi-modal parcellation scheme to describe the spatial distribution of shared and unique gyral peaks in both species.

      We thank the reviewer for his/her summary and affirmation of our work.

      The claim is made that shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions, however, no systematic comparison is made. The authors then show that shared peaks are more consistently found across individuals than unique peaks, and show a positive but small and non-significant correlation between cross-individual counts of the shared peaks of the human and the macaque, i.e., the authors show a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more often found across macaques.

      Answer: We thank the reviewer for raising these questions. To provide a more systematic comparison supporting the conclusion that ‘shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions’, we conducted two additional analyses. Following the reviewers’ suggestions, we statistically analyzed the ratio of shared and unique peaks within different brain networks (Figure 2 (b)), and reported the numbers of the two types of peaks in lower- and higher-order brain networks (Table 1). Together with the original analysis, these three analyses provide a more systematic and comprehensive basis for the conclusion that ‘shared peaks are more distributed in lower-order networks, while unique peaks are more distributed in higher-order networks’.

      In order to identify if unique and shared peaks could be identified based on the structural features of the cortical regions containing them, the authors compared them with t-tests. A correction for multiple comparisons should be applied and t-values reported. Graph-theoretical measures were applied to functional connectivity datasets (resting-state fMRI) and compared between unique and shared peak regions for each species separately. Again the absence of multiple comparison correction and t-values make the results hard to interpret. The same comment applies to the analysis reporting that shared peaks are surrounded by a larger number of brain regions than unique peaks. Finally, the potentially extremely interesting results about differential human gene expression of shared and unique peaks regions are not systematically reported e.g. the 28 genes identified are not listed and the selection procedure of 7 genes is not fully reported.

      Answer: We thank the reviewer for the suggestions about the statistical analysis in our manuscript. First, we applied False Discovery Rate (FDR) correction to all experiments involving multiple comparisons throughout the manuscript, and the t-values and corrected p-values are reported (Tables 2-5 and A5-A6). Additionally, in response to the reviewer’s guidance regarding the gene analysis, we provide the list of the 28 genes selected by LASSO (Table A7), along with the p-values obtained from Welch’s t-test comparing their expression between the two types of peaks. The functions of the seven genes with final p-values below 0.05 are reported in Table 6.

      The paper is well written and the methods used for data processing are very compelling i.e. the peak cluster extraction pipeline and cross-species registration. However, the analysis and especially the reporting of statistics, as they stand now, constitutes the main weakness of the paper. Some aspects of the statistical analysis need to be clarified.

      Reviewer 2

      The authors compared the cortical folding of human brains with folding in macaque monkey brains to reveal shared and unique locations of gyral peaks. The shared gyral peaks were located in cortical regions that are functionally similar and less changed in humans from those in macaques, while the locations of unique peaks in humans are in regions that have changed or expanded functions. These findings are important in that they suggest where human brains have changed more than macaque brains in their subsequent evolution from a common ancestor. The massive analysis of comparative results provides evidence of where humans and macaques are similar or different in cortical markers, as well as noting some of the variations within each of the two primates.

      Answer: We are grateful to the reviewer for his/her summary and appreciation of our cross-species work.

      Strengths:

      The study includes massive detail.

      Weaknesses:

      The manuscript is too long and there is not enough focus on the main points.

      Answer: We thank the reviewer for pointing out these shortcomings. First, because the manuscript was too long, we have retained only the core experiments and relevant analyses in the main text. Relatively minor results have been moved to the Supplementary Information; for example, the original Table 1 (locations of all shared clusters) is now Table A1 in the Supplementary Information. Additionally, some non-essential passages of the original manuscript have been removed.

      Our experiments primarily revealed the existence of partially shared cortical landmarks, known as gyral peaks, in both humans and macaques. We found that shared and unique peaks are mainly distributed in low- and high-order brain networks, respectively. To emphasize this main point, we added two experiments to the existing ones to explain this conclusion more systematically. We statistically analyzed the ratio of shared and unique peaks within different brain networks (Figure 2 (b)), and also reported the numbers of the two types of peaks in low- and high-order brain networks (Table 1). Combined with the original manuscript’s statistics on the proportions of the two types of peaks in different brain networks, these two experiments make the conclusion that ‘shared and unique peaks are predominantly located in low-order and high-order brain networks’ more prominent.

      A brief listing of previous views on why fissures form and what factors are important would be helpful.

      Answer: In response to this suggestion from the reviewer, we have incorporated some previous views on why fissures form and what factors are important into the ‘Introduction’ section.

      ‘Cortical folds are important features of primate brains. The primary driver of cortical folding is the differential growth between cortical and subcortical layers. During gyrification, cortical areas with high-density, stiff axonal fiber bundles tend to become gyri. The folding patterns of the brain, formed through a series of complex processes, have been found to play a crucial role in various cognitive and behavioral processes, including perception, action, and cognition (Fornito et al. 2004; Cachia et al. 2018; Yang et al. 2019; Whittle et al. 2009).’

      Reviewer 1 (Recommendations For The Authors):

      (1) Figure 3b shows a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more found across macaques. In the discussion, lines 218-219, the fact that the correlation is not significant should be reported more clearly.

      Answers: We thank the reviewer for this question. We revised lines 218-219 (now lines 257-259) as follows: ‘2. Consistency: The inter-individual consistency of shared peaks within each species was greater than that of unique peaks. The consistency of shared peaks in the human and macaque brains exhibits a positive, although non-significant, correlation.’

      (2) It is not fully clear how much shared peaks are mostly distributed in the higher-order cortex, especially in the macaque. It is reported in the results lines 132-133 that ‘In the macaque brain, shared peak cluster centers most distributed in the V2, DMN, and CON (Figure.2 (d)), while unique peak cluster centers most distributed in the DMN, Language (Lan), and Dorsal-attention (DAN)’ but not further discussed. Please develop this point in the discussion. Further, the results presented in Figures 2 and A1 are actually quite different and this shall be better described in the results. Given that shared and unique peaks can be found in the same region, this analysis would gain importance by applying a comparison test for the selection of regions where the most shared or unique peaks are found. The sentence lines 306-308 should be accordingly revised.

      It is hard to understand what the 0-3% labels correspond to in Figures 2 and A1.

      Please also correct in both legends and in the text the labeling of panels and add in the legends a brief description of panel (c). In the legend of Figure 2, ‘shared peaks’ in the second sentence shall be replaced by ‘unique peaks’.

      Answers: We thank the reviewer for these questions and suggestions. Our responses to them are itemized as follows:

      A1: To clarify the distribution of shared and unique peaks in higher-order and lower-order networks, we divided the 12 brain networks of the Cole-Anticevic atlas into lower-order networks (visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective (OAN)) and higher-order networks (cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), and default mode (DMN)), based on previous research (Golesorkhi et al. 2022; Ito, Hearne, and Cole 2020). Based on this division, we report the number of shared and unique peaks in both species in Author response table 1. In both humans and macaques, shared peaks are more often located in lower-order networks, while unique peaks are more often located in higher-order networks. This pattern is particularly pronounced in humans.

      Author response table 1.

      The number of shared and unique peaks in lower- and higher-order brain networks of the two species. Lower-order networks include visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective networks (OAN), higher-order networks include cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), default-mode network (DMN).
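      The lower-/higher-order split above amounts to a lookup from each Cole-Anticevic network to its group followed by a tally. A minimal sketch, assuming each peak has already been assigned a peak type ("shared" or "unique") and a network label; the function name and the toy peaks are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import Counter

# Lower-/higher-order split of the 12 Cole-Anticevic networks, as listed above.
LOWER_ORDER = {"V1", "V2", "Aud", "SMN", "PMN", "VMN", "OAN"}
HIGHER_ORDER = {"CON", "DAN", "Lan", "FPN", "DMN"}


def tally_by_order(peaks):
    """Count peaks per (peak type, network order).

    `peaks` is an iterable of (peak_type, network) pairs, where peak_type is
    "shared" or "unique" and network is a Cole-Anticevic network label.
    """
    counts = Counter()
    for peak_type, network in peaks:
        if network in LOWER_ORDER:
            order = "lower"
        elif network in HIGHER_ORDER:
            order = "higher"
        else:
            raise ValueError(f"unknown network label: {network!r}")
        counts[(peak_type, order)] += 1
    return counts


# Hypothetical toy peaks, not data from the study.
toy_peaks = [("shared", "V1"), ("shared", "SMN"), ("shared", "DMN"),
             ("unique", "FPN"), ("unique", "DMN"), ("unique", "Aud")]
counts = tally_by_order(toy_peaks)
print(counts[("shared", "lower")], counts[("shared", "higher")])  # 2 1
print(counts[("unique", "lower")], counts[("unique", "higher")])  # 1 2
```

      An unknown label raises an error rather than being silently dropped, so every peak is guaranteed to land in exactly one of the two groups.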

      In the main text, Figure 2 (shown below as Author response image 1) illustrates the proportions of shared and unique peaks across the 12 brain networks in both species. In each pie chart, we have highlighted the three top-ranked networks. Although the pie charts generally support the above results, two brain networks deserve further discussion: the DMN and CON, two higher-order networks that rank high in shared peak count (second and third for macaque shared peaks; fourth and fifth for human shared peaks).

      The cingulo-opercular network (CON) is associated with action, goals, arousal, and pain. Notably, Gordon et al. (2023) identified three previously unrecognized areas of the primary motor cortex that exhibit strong functional connectivity with the CON, together forming a novel network termed the somato-cognitive action network (SCAN). The SCAN integrates body control (motor and autonomic) and action planning, consistent with the view that aspects of higher-level executive control might derive from movement coordination (Llinás 2002; Gordon et al. 2023). The CON may therefore be shared across the two species in the form of the SCAN, which could partly explain why shared peaks are relatively frequent in the CON in Author response image 1.

      Author response image 1.

      Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.

      The default-mode network (DMN) is an ensemble of brain regions that are active during passive tasks, including the anterior and posterior cingulate cortex, the medial and lateral parietal cortex, and the medial prefrontal cortex (Buckner, Andrews-Hanna, and Schacter 2008). Although the DMN is considered a higher-order brain network, numerous studies have provided evidence of its homologous presence in humans and macaques, confirming the similarity of human and macaque DMN regions from various perspectives, including cytoarchitecture (Parvizi et al. 2006; Buckner, Andrews-Hanna, and Schacter 2008; Caminiti et al. 2010) and anatomical tracing (Vincent et al. 2007). These studies all support the notion that some elements of the DMN may be conserved across primate species (Mantini et al. 2011). Overall, the higher occurrence of shared peaks within the DMN may be attributed to the partial conservation of the DMN between humans and macaques.

      These results have been added to Table 2 along with corresponding text and discussion section.

      A2: The difference between the results in Figure 2 and Figure A1 (now Figure A2) is whether the peak count is normalized by cortical area, which differs greatly across networks. For example, among the 12 brain networks, the three with the largest surface areas are the DMN, SMN and CON, and the three with the smallest are the OAN, PMN and VMN; the area difference between networks can be as large as 18-fold. Therefore, although the DMN ranks high in both shared and unique peak counts (Figure 2 (a)), its proportion is relatively small in Figure A2 after area normalization. In contrast, the VMN ranks low in peak count but accounts for a substantial proportion after area normalization (for example, 38% of macaque shared peaks are attributed to the VMN after normalization, although it actually contains only four peaks). Nevertheless, the two pie charts deliver the same message: there are more shared peaks in lower-order networks and more unique peaks in higher-order networks (except in macaques, where shared peaks are also substantially distributed in the DMN and CON).
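      How area normalization can reorder networks is easiest to see in a small sketch. It assumes that a network's normalized proportion is its peak density (peak count divided by network area) divided by the summed density over all networks. This is one plausible reading of the normalization, not necessarily the exact computation behind Figure A2, and the counts and areas below are hypothetical.

```python
def count_proportions(counts):
    """Share of all peaks falling in each network (raw counts)."""
    total = sum(counts.values())
    return {net: c / total for net, c in counts.items()}


def area_normalized_proportions(counts, areas):
    """Share of the summed peak density (peaks per unit area) in each network."""
    density = {net: counts[net] / areas[net] for net in counts}
    total = sum(density.values())
    return {net: d / total for net, d in density.items()}


# Hypothetical counts and areas (arbitrary units), not values from the study.
counts = {"DMN": 40, "SMN": 20, "VMN": 4}
areas = {"DMN": 18.0, "SMN": 9.0, "VMN": 1.0}

raw = count_proportions(counts)
norm = area_normalized_proportions(counts, areas)
for net in counts:
    print(f"{net}: raw {raw[net]:.3f}, area-normalized {norm[net]:.3f}")
```

      With these toy numbers, the small-area network holds the smallest share of raw peaks but the largest share after normalization. This mirrors the DMN/VMN contrast described above.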

      Following the reviewer’s suggestion, we adopted a new approach that presents the ratio between shared and unique peak counts for each network (see Author response image 2), so that the networks with the most shared or unique peaks are easy to identify. To mitigate imbalances caused by differences in the absolute numbers of peaks in each category (shared or unique), the proportions of peaks within their respective categories were used in the calculations. In Author response image 2, the pink and green bins represent the ratios of shared and unique peaks, respectively, and the dark blue dashed line marks the 50% reference level. In general, from left to right in the figure, the ratio of shared peaks gradually decreases while the ratio of unique peaks increases. This suggests that shared peaks are more frequent (>0.5, above the dashed line) in lower-order networks (orange font), whereas unique peaks are generally more frequent in higher-order networks (blue font). Specifically, in human brains the networks with a higher abundance of shared peaks are Aud, VMN, V1, SMN, and V2, whereas in macaques they are CON, VMN, V1, V2, FPN, and SMN. Again, in human brains the disparity between shared and unique peaks tends to be more pronounced (further from the reference line) in both lower-order and higher-order networks. In contrast, in macaque brains the disparity is less pronounced (closer to the reference line): the ratios of shared and unique peaks are both around 0.5 for 6 of all 10 networks (including both lower- and higher-order ones).
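      The within-category normalization described above can be sketched as follows. The sketch assumes that a network's shared proportion is its shared peak count divided by the total number of shared peaks (and likewise for unique peaks), and that the two proportions are then rescaled to sum to 1, consistent with the caption of Author response image 2. The function name and toy counts are hypothetical.

```python
def shared_unique_ratios(shared_counts, unique_counts):
    """Per-network (shared ratio, unique ratio), each pair summing to 1.

    Counts are first divided by the total of their own category, so a category
    with more peaks overall does not dominate every network.
    """
    total_shared = sum(shared_counts.values())
    total_unique = sum(unique_counts.values())
    ratios = {}
    for net in shared_counts.keys() | unique_counts.keys():
        p_shared = shared_counts.get(net, 0) / total_shared
        p_unique = unique_counts.get(net, 0) / total_unique
        denom = p_shared + p_unique
        if denom == 0:
            continue  # no peaks of either type in this network
        ratios[net] = (p_shared / denom, p_unique / denom)
    return ratios


# Hypothetical counts: 40 shared and 20 unique peaks in total.
shared = {"V1": 30, "DMN": 10}
unique = {"V1": 5, "DMN": 15}
ratios = shared_unique_ratios(shared, unique)
print(ratios["V1"])   # (0.75, 0.25)
print(ratios["DMN"])  # (0.25, 0.75)
```

      Without the within-category step, V1 would score 30/35 ≈ 0.86 shared simply because shared peaks are twice as numerous overall.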

      Author response image 2.

      The ratio of shared and unique peaks in each brain network in the Cole-Anticevic (CA) atlas. The pink and green color bins represent ratios of shared and unique peaks, respectively. The dark blue dashed line represents the 50% reference line. For each brain region, the sum of the ratios of shared and unique peaks is equal to 1.

      Based on these analyses, the sentence in lines 306-308 (now lines 368-370) has been revised as follows: ‘In the human brain, most shared peaks (about 65%) are located in lower-order brain regions, while unique peaks are mainly (about 74%) located in higher-order regions. However, this trend is less pronounced in the macaque brain.’

      These results have been added to Figure 2 (b) along with corresponding text and discussion section.

      A3: In response to the third suggestion from the reviewer, we have clearly labeled the brain region names corresponding to 0% to 3% in Figure 2 (now Figure 2 (a)) and Figure A1 (now Figure A2).

      Author response image 3.

      Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.

      A4: Finally, we would like to express our gratitude to the reviewer for pointing out our mistakes.

      We have made improvements to Figure 2 and revised the figure captions accordingly.

      (3) The conclusions regarding the spatial relationship between peaks and functional regions shall be revised (Lines 187-188, 228-229, and 329-330). In the macaque, the results are opposite in the two atlases used. Further, in the human, it is not clear how multiple comparison corrections will impact statistics and some atlases show opposite results, although conclusions hold true in the majority of human atlases.

      Answers: We thank the reviewer very much for this suggestion. We have added the results for the macaque Cole-Anticevic atlas to the main text. This atlas also shows shared > unique (Author response table 2, corresponding to Table 5 in the main text), i.e., there are more diverse brain regions around shared peaks than around unique peaks. Therefore, of the three commonly used macaque atlases, two (Markov91 and Cole-Anticevic) conform to this observation, while BA05 does not. We used false discovery rate (FDR) correction for multiple comparisons, and the corrected p-values are reported in the tables (in the revised main text and shown below). Results for atlases with multiple resolutions are reported in Author response table 4 (Table A6 in the Supplementary Information). The observation that there are more diverse brain regions around shared peaks than around unique peaks holds for the human atlases in Author response table 3 (Table 4 in the main text), whose resolutions range from 7 to 300 parcels, demonstrating the robustness of the conclusion. Note that the observation is not consistent for atlases with relatively low resolutions (e.g., BA05 for the macaque, n=30, and Yeo2011 for the human, n=7) or, in particular, high resolutions (e.g., Schaefer-500 and Vosdewael-400, n>300). This inconsistency is reasonable: if the parcellation is too coarse or too fine, its resolution largely determines the chance that a cortical region appears in a peak’s neighborhood. For example, in the two extreme cases of n=1 (the entire cortex is a single region) and n=30k (each vertex is a region), every peak has the same number of neighboring regions (one region per peak for n=1; around 30 vertices per peak for n=30k).

      In conclusion, we observed that there are more diverse brain regions around shared peaks than around unique peaks for multiple brain atlases with intermediate parcellation resolutions. These results have been added to Tables 4, 5, and A6 along with the corresponding text and discussion section.
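      The neighborhood measure can be sketched as a breadth-first search over the surface mesh, counting distinct parcel labels among the vertices within k edges of a peak. This assumes the mesh is available as a vertex adjacency list and that each vertex carries a parcel label. The helper names and the toy chain-shaped "mesh" are hypothetical stand-ins for a real cortical surface, not the authors' implementation. The two extreme parcellations discussed above (one region for the whole cortex, one region per vertex) fall out directly.

```python
from collections import deque


def k_ring(adjacency, seed, k):
    """Vertices within k edges of `seed` (its k-ring neighborhood), seed included."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond the k-th ring
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


def n_regions_near_peak(adjacency, labels, peak, k=3):
    """Number of distinct parcels among the vertices of the peak's k-ring."""
    return len({labels[v] for v in k_ring(adjacency, peak, k)})


# Hypothetical toy "mesh": a chain of 10 vertices (a real surface is a triangulated sheet).
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
coarse = [0] * 10                          # n = 1: whole cortex is one region
medium = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]    # intermediate resolution
finest = list(range(10))                   # one region per vertex
for name, labels in [("coarse", coarse), ("medium", medium), ("finest", finest)]:
    print(name, n_regions_near_peak(adjacency, labels, peak=4))
```

      At the coarsest parcellation every peak sees exactly one region. At the finest, every peak sees as many regions as its neighborhood has vertices, so the count stops depending on where the peak lies. Only intermediate resolutions can separate shared from unique peaks.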

      Author response table 2.

      The mean number (±SD) of brain regions that appeared within a 3-ring neighborhood of shared and unique peaks in 3 common macaque atlases. For both the Markov91 and Cole-Anticevic atlases, shared peaks have a greater variety of functional regions around them than unique peaks; for the BA05 atlas, the result is reversed. Bold font marks the larger of the shared and unique values. All p<0.001, false discovery rate (FDR) corrected.

      (4) For Tables 2-4, A4, and Figure 3a, please indicate in all the legends if values correspond to Mean plus minus Standard Deviation, report t-value, and n in the legend or in the text.

      Answers: We thank the reviewer very much for this suggestion. We added the ‘mean (±SD)’ in the notes of Tables 2-4, A4 (now A6), and Figure 3 (a). All the t and n values of t-test are reported in tables or in the main text.

      (5) Please create a statistical section in the Methods to describe more precisely the tests used e.g. for t-tests, if datasets follow a normal distribution with unknown variance. In the case of multiple comparisons like in e.g. Table 2-4, A4, please report what multiple comparisons correction was used to adjust the significance level.

      Author response table 3.

      The mean number (±SD) of brain regions that appeared within a 3-ring neighborhood of shared and unique peaks in 10 common human atlases. In every atlas in the table, shared peaks have a greater number of neighboring brain regions than unique peaks. All p<0.001, false discovery rate (FDR) corrected.

      Author response table 4.

      The mean number (±SD) of brain regions that appeared within a 3-ring neighborhood of shared and unique peaks in 21 common human atlases. The p-values were FDR corrected.

      Answers: We thank the reviewer for this suggestion. We added a ‘Statistical Analysis’ section to ‘Materials and Methods’:

      ‘All variables used in the two-sample t-tests passed a normality check, and all p-values were corrected for multiple comparisons using the false discovery rate (FDR) method. Moreover, to identify genes differentially expressed between shared and unique peaks, we employed Welch’s t-test, given the unequal sample sizes of shared and unique peaks. For all tests, a p-value <0.05 (FDR corrected) was considered significant.’
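      For reference, Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library as below. This is a minimal sketch of the textbook formulas, not the authors' code; in practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a two-sided p-value.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Unlike Student's t-test, no pooled variance is assumed; when the two groups happen to have equal sizes and equal sample variances, the degrees of freedom reduce to n1 + n2 - 2.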

      For analyses involving multiple comparisons, such as those in Tables 2-4 and A4 (now A6), we have added explanations in the main text: p-values were corrected for multiple comparisons using the false discovery rate (FDR) method, and p<0.05 (FDR corrected) was considered significant.
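      The response does not name the specific FDR procedure. Assuming the common Benjamini-Hochberg step-up method (the same procedure as R's `p.adjust(p, method = "BH")`), adjusted p-values can be sketched as follows; an adjusted value below 0.05 corresponds to significance at an FDR of 0.05.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```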

      (6) It would be of great interest to provide the full list of the 28 genes that significantly contributed to the classification of shared and unique peaks. Please provide a description of the Welch’s t-test results. Of the 7 genes selected, only two are discussed. Could the authors briefly describe the functions of the other genes, even though we understand that they are not associated with neuronal activity and brain function?

      Answers: We thank the reviewer for these suggestions. We have provided a complete list of the 28 genes selected by LASSO in Author response table 5. Additionally, Welch’s t-test was used to calculate p-values for the expression difference of each gene between shared and unique peak clusters, and these results are also reported in Author response table 5.

      Author response table 5.

      The 28 genes selected by LASSO and their corresponding p-values from Welch’s t-test.

      Seven genes showed significant differential expression between shared and unique peaks in Welch’s t-test. These genes were PECAM1, TLR1, SNAP29, DHRS4, BHMT2, PLBD1, KCNH5. Brief descriptions of their functions are listed in Author response table 6. All gene function descriptions were derived from the NCBI website (https://www.ncbi.nlm.nih.gov/).

      These results have been added to Tables 6 and A7 along with corresponding text.
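      How a LASSO penalty leaves only a subset of genes with non-zero weights can be sketched as below. The response does not state the loss (linear or logistic), the penalty value, or how it was tuned, so this sketch assumes squared-error loss on centered features, a fixed penalty `lam`, and cyclic coordinate descent with soft-thresholding; the toy data in any usage are hypothetical.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, n_iter: int = 100) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


def selected(b: List[float]) -> List[int]:
    """Indices of features with non-zero coefficients."""
    return [j for j, w in enumerate(b) if w != 0.0]
```

Larger values of `lam` shrink more coefficients exactly to zero, which is what makes LASSO usable for selecting a short gene list.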

      (6) For comparison, could the authors provide a supplementary figure of shared peak clusters like in Figure 1b but displayed on the surface of the macaque brain template?

      Answers: We thank the reviewer very much for this suggestion, and we have added a display of shared peak clusters on the macaque brain template surface (Author response image 4, which corresponds to Figure A1 of the Supplementary Information).

      (7) Could the authors expand on or rephrase the sentence in lines 69-72, which remains unclear?

      Answers: We appreciate the reviewer’s feedback and have revised this sentence for clarity. Lines 69 to 72 have been revised to ‘In the study of macaques, it has been observed that peaks consistently present across individuals are located on more curved gyri (S. Zhang, Chavoshnejad, et al. 2022). Similar conclusions have been drawn in human brain research (S. Zhang, T. Zhang, et al. 2023).’ This passage now corresponds to lines 74-77 in the main text.

      (8) Line 99: please indicate which section.

      Author response table 6.

      Seven of the genes selected using LASSO showed significant differential expression between shared and unique peaks.

      Answers: We thank the reviewer very much for this suggestion and we revised this sentence to ‘The definition of peaks and the method for extracting peak clusters within each species are described in the Materials and Methods section’.

      (9) In Figure 3b, please report R2 and p-value. A semi-log might be more appropriate given the overdispersion of Human Peak Counts.

      Answers: We thank the reviewer very much for this suggestion. Linear regression analysis was conducted on the average counts of all corresponding shared peak clusters of human and macaque. The horizontal and vertical axes of Author response image 5(b) represent the average count of shared peaks in the macaque and human brains, respectively. The Pearson correlation coefficients (PCC) of interspecies consistency for the left and right hemispheres are 0.20 and 0.26, respectively (p>0.05 for both). The linear regression shows a positive correlation in the inter-individual consistency of shared peaks between macaque and human brains, but it is not statistically significant (R2 = 0.07 and 0.01 for the left and right hemispheres, respectively).

      Author response image 4.

      Shared peak clusters of the macaque, shown on the macaque brain template.

      The goodness of fit (R2), the Pearson correlation coefficient (PCC), and their respective p-values are indicated in Author response image 5(b). To accommodate the overdispersion, the peak counts of the human brain are displayed on a semi-log scale.

      The updated Figure and results are presented in Figure 3 of the main text.
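      The quantities reported for Author response image 5(b) can be sketched as Pearson's r plus an ordinary least-squares fit with its R2, with the option of log-transforming the human counts for a semi-log analysis. The base-10 logarithm and the function names are our assumptions, not the authors' code. For a single predictor fitted to the same (transformed) data, R2 equals the square of Pearson's r.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Slope, intercept and R2 of y regressed on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    tss = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - rss / tss


def semi_log(counts: Sequence[float]) -> List[float]:
    """ASSUMED transform for the semi-log display: log10 of positive counts."""
    return [math.log10(c) for c in counts]
```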

      (10) Line 177: please indicate where in the Supplementary Information.

      Answers: Thank you for the reviewer’s reminder. We have incorporated the results of the human brain structural connectivity matrix into Table A5 in the Supplementary Information and added a corresponding reference to it in the main text.

      (11) Line 226: please correct ‘(except for betweeness [and efficiency] of the’.

      Answers: We thank the reviewer very much for this suggestion, and we added ‘and efficiency’ after ‘betweeness’ in original Lines 173 and 226 (now Lines 206 and 267).

      (12) The gene expression dataset used is from the Allen Human Brain Atlas (AHBA). A reference to Hawrylycz et al., 2012 Nature. 2012 Sep 20;489(7416):391-399. doi: 10.1038/nature11405 should be made, and the abbreviation defined at first use in the text.

      Answers: We added the full name ‘Allen Human Brain Atlas’ when AHBA is first mentioned, along with the reference suggested by the reviewer.

      Author response image 5.

      (a) Mean peak count (±SD) covered by shared and unique peak clusters in the two species. *** indicates p<0.001. The t-values for the t-tests in humans and macaques are 4.74 and 2.67, respectively. (b) Linear regression of the consistency of peak clusters shared between macaque and human brains. Pink and blue represent the left and right hemispheres, respectively, and the regression results are shown in the figure. Although a positive correlation was observed in the consistency of gyral peaks between macaque and human, the p-value of the fit exceeded the significance threshold of 0.05.

      (13) Line 17: remove ‘are’.

      Answers: We thank the reviewer very much for this suggestion and we removed ‘are’ in Line 17 (now Line 18).

      (14) Line 201: remove ‘is used’.

      Answers: We thank the reviewer very much for this suggestion and we removed ‘is used’ in Line 201 (now Line 237).

      References

      Buckner, Randy L, Jessica R Andrews-Hanna, and Daniel L Schacter (2008). “The brain’s default network: anatomy, function, and relevance to disease”. In: Annals of the New York Academy of Sciences 1124.1, pp. 1–38.

      Cachia, Arnaud et al. (2018). “How interindividual differences in brain anatomy shape reading accuracy”. In: Brain Structure and Function 223, pp. 701–712.

      Caminiti, Roberto et al. (2010). “Understanding the parietal lobe syndrome from a neurophysiological and evolutionary perspective”. In: European Journal of Neuroscience 31.12, pp. 2320–2340.

      Fornito, Alexander et al. (2004). “Individual differences in anterior cingulate/paracingulate morphology are related to executive functions in healthy males”. In: Cerebral Cortex 14.4, pp. 424–431.

      Golesorkhi, Mehrshad et al. (2022). “From temporal to spatial topography: hierarchy of neural dynamics in higher- and lower-order networks shapes their complexity”. In: Cerebral Cortex 32.24, pp. 5637–5653.

      Gordon, Evan M et al. (2023). “A somato-cognitive action network alternates with effector regions in motor cortex”. In: Nature, pp. 1–9.

      Ito, Takuya, Luke J Hearne, and Michael W Cole (2020). “A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales”. In: NeuroImage 221, p. 117141.

      Ji, Jie Lisa et al. (2019). “Mapping the human brain’s cortical-subcortical functional network organization”. In: NeuroImage 185, pp. 35–57.

      Llinás, Rodolfo R (2002). I of the vortex: From neurons to self. MIT Press.

      Mantini, Dante et al. (2011). “Default mode of brain function in monkeys”. In: Journal of Neuroscience 31.36, pp. 12954–12962.

      Parvizi, Josef et al. (2006). “Neural connections of the posteromedial cortex in the macaque”. In: Proceedings of the National Academy of Sciences 103.5, pp. 1563–1568.

      Vincent, Justin L et al. (2007). “Intrinsic functional architecture in the anaesthetized monkey brain”. In: Nature 447.7140, pp. 83–86.

      Whittle, Sarah et al. (2009). “Variations in cortical folding patterns are related to individual differences in temperament”. In: Psychiatry Research: Neuroimaging 172.1, pp. 68–74.

      Yang, Shimin et al. (2019). “Temporal variability of cortical gyral-sulcal resting state functional activity correlates with fluid intelligence”. In: Frontiers in Neural Circuits 13, p. 36.

      Zhang, Songyao, Poorya Chavoshnejad, et al. (2022). “Gyral peaks: Novel gyral landmarks in developing macaque brains”. In: Human Brain Mapping 43.15, pp. 4540–4555.

      Zhang, Songyao, Tuo Zhang, et al. (2023). “Gyral peaks and patterns in human brains”. In: Cerebral Cortex.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:

      The analysis pipeline is straightforward and largely follows what the authors previously used in Mazzetti et al. (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      We thank the reviewer for their interest in our study.

      Weaknesses:

      While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      We thank the reviewer for their constructive comment. We would like to highlight that this is the first study to show a relationship between the asymmetry of subcortical structures and attention-modulated alpha oscillations in a task that did not involve any reward associations, reward processing being the most studied role of the basal ganglia. We also believe there is value in having a second study linking the asymmetry in volume of subcortical structures to the modulation of alpha oscillations, as this surprising finding also has important clinical implications (see below). We edited the manuscript as follows to explain the advances made in this study:

      Introduction (Line 112): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 301): “It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders that are mainly known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activity and a decrease in higher-frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations is higher when they are dopamine-depleted than when they are on dopaminergic medication (46).”

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the Thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly higher than the beta estimates for the other three conditions, which has not been tested as far as I understand.

      We thank the reviewer for this constructive comment. We agree with the reviewer that we should compare the beta values across conditions. We therefore decided to better harness the multivariate nature of our analysis. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and found the following, which we have added to the manuscript:

      Results (Line 250): “To ascertain whether each predictor contributes to all conditions, we conducted statistical tests on the results of our MMR using the null hypothesis that a given regressor does not impact all dependent variables. We found that, with marginal significance, the caudate nucleus can predict variability across all four task conditions (F(4,26) = 2.82, p-value = 0.046), whereas the predictive relationships of the thalamus (F(4,26) = 2.43, p-value = 0.073) with condition 1 and of the globus pallidus (F(4,26) = 2.29, p-value = 0.087) with conditions 2 and 3 hold only for these conditions. In sum, this demonstrates that when the task is easiest (condition 1), the thalamus is related to alpha modulation. When the task is most difficult (condition 4), the caudate nucleus relates to the alpha modulation; however, its contributions are substantial enough to predict outcomes across all conditions. For the conditions of medium difficulty (conditions 2 and 3), the globus pallidus is related to the alpha band modulation.”

      Method (Line 599): “To examine the specificity of each regressor for lateralized alpha in each condition, we statistically assessed the results of the MMR against the null hypothesis that a particular predictor does not contribute to all dependent variables, employing a MANOVA test in RStudio (version 2022.02.2) (80).”

      Discussion (Line 337): “Thalamus, Globus Pallidus, and Caudate nucleus play varying roles across different load conditions.”

      Discussion (Line 361): “Although these findings highlight the varying contributions of different regions, they do not imply a lack of evidence for correlations between these subcortical structures and other load conditions.”

      Discussion (Line 379): “Additionally, we refrained from directly comparing the contributions of subcortical structures to different conditions due to low statistical power. […] In future studies, it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating alpha band activity. To ensure sufficient statistical power for doing so, each factor may need to be addressed in a separate experiment.”
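      The MANOVA-style test described above can be sketched as follows. For a single predictor in a multivariate regression with p dependent variables and residual degrees of freedom v (n - k - 1 for n participants and k predictors), Wilks' lambda = det(E) / det(E + H) converts exactly to an F statistic with (p, v - p + 1) degrees of freedom; for such a one-degree-of-freedom hypothesis, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root give the same exact F. With n = 33, k = 3 and p = 4, this yields F(4, 26). This is an illustrative sketch rather than the authors' R analysis; E (residual sums of squares and cross-products) and H (hypothesis sums of squares and cross-products) are assumed to be computed elsewhere, and any example matrices are toy values.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    n = len(a)
    sign, result = 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def wilks_lambda(E: Matrix, H: Matrix) -> float:
    """Wilks' lambda = det(E) / det(E + H)."""
    EH = [[e + h for e, h in zip(row_e, row_h)] for row_e, row_h in zip(E, H)]
    return det(E) / det(EH)


def wilks_exact_f(lam: float, p: int, df_res: int) -> Tuple[float, int, int]:
    """Exact F for a one-degree-of-freedom hypothesis (e.g., one predictor)."""
    df1, df2 = p, df_res - p + 1
    return (1.0 - lam) / lam * df2 / df1, df1, df2
```

For example, `wilks_exact_f(lam, p=4, df_res=33 - 3 - 1)` returns degrees of freedom (4, 26) for any lambda.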

      Reviewer #3 (Public Review):

      Summary:

      In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:

      The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      We appreciate the reviewer’s interest in our study.

      Weaknesses:

      The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:

      (i) a link between volumetric measurements and task activity;

      We thank the reviewer for this constructive feedback. We have now addressed this concern in the revised manuscript.

      Discussion (Line 293): “It has been demonstrated that extensive navigation experience enlarges the right hippocampus (40). Furthermore, in terms of neurological disorders, it is well established that shrinkage (atrophy) in specific regions is a predictor of a number of neurological and psychiatric conditions, including Parkinson’s disease, dementia, and Huntington’s disease. […] It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders that are mainly known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activity and a decrease in higher-frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations is higher when they are dopamine-depleted than when they are on dopaminergic medication (46).”

      (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);

      We thank the reviewer for this helpful comment, which resulted in a clarification of the manuscript. We addressed this issue in the revised manuscript and have also supplemented it with papers directly investigating the asymmetry of subcortical regions in relation to neurological disorders:

      Introduction (Line 102): “We utilized the hemispheric laterality of subcortical structures and alpha modulation to overcome issues related to individual variations in oscillatory power and head size.”

      Discussion (Line 314): “Employing hemispheric lateralization was motivated by the organizational characteristic of structural asymmetry in the healthy brain (47). Additionally, the effects of aging (48) and neurodegenerative disorders, such as Alzheimer's Disease (49), on brain symmetry influenced this approach. Furthermore, computing lateralization indices for individuals addresses the challenge of accommodating variations in both head size and the power of oscillatory activity.”

      Discussion (Line 374): “Furthermore, in this study, our emphasis has been on assessing the size of subcortical structures. Future investigations could explore subcortical white matter connectivities and hemispheric asymmetries. This approach has previously been conducted on superior longitudinal fasciculus (SLF) (61,62) and holds potential for examining cortico-subcortical connectivity in the context of oscillatory asymmetries.”

      (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);

      We thank the reviewer for this comment. We assessed the hippocampus and amygdala because they are automatically segmented by the FIRST algorithm. As our analysis showed that they were not related to the modulation of alpha oscillations, these regions also provide a useful control for our approach. Therefore, we included all subcortical structures in the model and evaluated their predictive impact. This is now addressed in the revised manuscript.

      Method (Line 477): “FIRST is an automated model-based tool that runs a two-stage affine transformation to MNI152 space, to achieve a robust pre-alignment of thalamus, caudate nucleus, putamen, globus pallidus, hippocampus, amygdala, and nucleus accumbens based on individual’s T1-weighted MR images.”

      Method (Line 576): “The absence of a relationship between modulations of alpha oscillations and the hippocampus and amygdala was expected as these regions typically are not associated with the allocation of spatial attention and thus add validity to our approach.”

      (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);

      We thank the reviewer for this comment, which helped us clarify the manuscript. The two-factor design was intended to investigate how the allocation of attentional resources relates specifically to mechanisms of excitability and suppression. For this reason, both the salience of the distractor (associated with suppression) and the perceptual load of the target (associated with excitability) had to be manipulated. We clarified the rationale in the revised version as follows:

      Introduction (Line 96): “We analyzed MEG and structural data from a previous study (27), in which spatial cues guided participants to covertly attend to one stimulus (target) and ignore the other (distractor). To investigate the relationship between the allocation of attentional resources and mechanisms of neural excitability and suppression, the target load and the visual saliency of the distractor were manipulated using a noise mask. This load/salience manipulation resulted in four conditions that affect the attentional demands of target and distractor.”

      (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);

      We thank the reviewer for this comment, which prompted a clarification. We addressed this question in the Introduction and Discussion as follows:

      Introduction (Line 107): “By examining their role in a task without explicit reward, we aim to elucidate the generalizability of the contributions of subcortical structures to spatial attention modulation. Such a finding would implicate a role for the basal ganglia in cognition beyond the well-studied realm of the estimation of choice values (33). Specifically, in a prior study (28), we observed that the contributions of the basal ganglia were most pronounced when the items in question were associated with a reward. Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 333): “This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations beyond reward valence and to the context of attention.”

      (vi) effects on rapid frequency tagging.

      We thank the reviewer for this constructive comment. We have now included this analysis and added the results to the revised manuscript.

      Results (Line 224): “It is worth noting that neither the behavioural nor the rapid invisible frequency tagging (RIFT) measures showed significant relationships with LVs and HLM() (Supplementary material, Figure 1 and Table 3).”

      Discussion (Line 396): “We did not find any association between the power of the RIFT signal and the size asymmetry of subcortical structures. Since the Bayes factors were less than 0.1, we conclude that our RIFT null findings are robust, suggesting a dissociation between how alpha oscillations and neuronal excitability indexed by RIFT relate to subcortical structures.”

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by contrasting the average signal power in sensors on the right when attention was directed to the right with that when attention was directed to the left. This calculation was also performed for sensors on the left. We then identified the 5 sensors on each side with the highest MI as the Region of Interest (ROI). Using the sensors within the ROI, we computed the hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more detailed description, see reference (24).”

      Supplementary Materials (Line 839): “Figure 1. Lateralization volume of thalamus, caudate nucleus and globus pallidus in relation to hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) on the right and behavioural asymmetry on the left. A and E, The beta coefficients for the best model (having three regressors) associated with a generalized linear model (GLM) where lateralization volume (LV) values were defined as explanatory variables for HLM(RIFT) (A) and behavioural asymmetry (E). Error bars indicate the standard error of the mean (SEM). B and F, Partial regression plot showing the association between LVTh and HLM(RIFT) (B, p-value = 0.59) and behavioural asymmetry (F, p-value = 0.38) while controlling for LVGP and LVCN. C and G, Partial regression plot showing the association between LVGP and HLM(RIFT) (C, p-value = 0.16) and behavioural asymmetry (G, p-value = 0.80) while controlling for LVTh and LVCN. D and H, Partial regression plot showing the association between LVCN and HLM(RIFT) (D, p-value = 0.53) and behavioural asymmetry (H, p-value = 0.74) while controlling for LVTh and LVGP. Negative (or positive) LV indices denote greater left (or right) volume for a given substructure; similarly, negative HLM(RIFT) values indicate stronger modulation of RIFT power in the left compared with the right hemisphere, and vice versa; a positive behavioural asymmetry value indicates higher accuracy when the target was on the right than when it was on the left, and vice versa for negative values. The dotted curves in B, C, D, F, G, and H indicate 95% confidence bounds for the regression line fitted on the plot in red.”
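      The HLM(RIFT) computation described in the Methods excerpt above can be sketched as below. The response does not give the exact MI formula or the sign convention used for left-hemisphere sensors (those are in reference (24)), so this sketch assumes a normalized contrast MI = (P_attR - P_attL) / (P_attR + P_attL) per sensor, takes the k sensors with the highest MI on each side as the ROI, and sums the two ROI means as the text states; sensor names are hypothetical.

```python
from statistics import mean
from typing import Dict


def modulation_index(p_att_right: float, p_att_left: float) -> float:
    """ASSUMED normalized contrast of attend-right vs attend-left power."""
    return (p_att_right - p_att_left) / (p_att_right + p_att_left)


def top_k_mean(mi_by_sensor: Dict[str, float], k: int = 5) -> float:
    """Mean MI over the k sensors with the highest MI (the ROI)."""
    return mean(sorted(mi_by_sensor.values(), reverse=True)[:k])


def hlm(mi_right: Dict[str, float], mi_left: Dict[str, float], k: int = 5) -> float:
    """HLM = ROI-average MI of right sensors + ROI-average MI of left sensors."""
    return top_k_mean(mi_right, k) + top_k_mean(mi_left, k)
```

One HLM(RIFT) value per participant would then be obtained by calling `hlm` on that participant's per-sensor MI values.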

      Author response image 1.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      We thank the reviewer for this constructive suggestion. In the revised manuscript, we incorporated the model space, model comparisons, BIC values from the models, behavioral and rapid frequency tagging analysis methods, and their respective results. Additionally, we computed Bayes factors for our null findings to enhance the interpretability of our results.

      Results (Line 199): “This model predicted the HLM(α) values significantly in the GLM (F(3,29) = 7.4824, p = 0.0007, adjusted R2 = 0.376) as compared with an intercept-only null model (Figure 4A).”

      Although the beta estimate of LVGP only showed a positive trend, removing it from the regression resulted in worse models (see the AIC and BIC tables in the Supplementary Material).
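      Enumerating the model space can be sketched as fitting an ordinary least-squares model for every subset of candidate regressors and scoring each fit with AIC and BIC. This pure-Python sketch assumes Gaussian errors and drops the additive constant of the log-likelihood, so absolute values will differ from those reported by statistical packages, while the ranking of models fitted to the same data does not change; regressor names and data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(columns: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an OLS fit with an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))


def all_subsets(regressors: Dict[str, List[float]], y: List[float]) -> Dict[Tuple[str, ...], Tuple[float, float]]:
    """(AIC, BIC) for every subset of regressors, up to a shared constant."""
    n = len(y)
    names = sorted(regressors)
    scores: Dict[Tuple[str, ...], Tuple[float, float]] = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            rss = ols_rss([regressors[s] for s in subset], y)
            k = size + 1  # slopes plus intercept
            base = n * math.log(rss / n)
            scores[subset] = (base + 2 * k, base + k * math.log(n))
    return scores
```

The selected model is the key with the smallest AIC (first element); with seven candidate structures, this enumerates 2^7 = 128 models.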

      Supplementary materials (Line 827): “Table 1. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for all possible combinations of regressors (lateralized volume of subcortical structures). The selected model, with the lowest AIC, is marked in green.”

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Bayes factors for the correlation between the hemispheric laterality of subcortical structures and hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)), and between the former and behavioural asymmetry (BA). The Pearson correlation of each subcortical structure with HLM(RIFT) and with behavioural asymmetry was calculated. The likelihood of the data under the alternative hypothesis (presence of a correlation) was then compared with the likelihood under the null hypothesis (absence of correlation). As shown in the table, all Bayes factors were below or very close to 1, indicating evidence for the null hypothesis.
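      The response does not specify how these Bayes factors were computed. One simple option, shown here purely as an illustrative assumption, is the BIC approximation for a Pearson correlation with n observations: BF01 ≈ sqrt(n) · (1 - r^2)^(n/2), where BF01 > 1 (equivalently BF10 < 1) favors the null hypothesis of no correlation.

```python
import math


def bf01_bic(r: float, n: int) -> float:
    """BIC-approximated Bayes factor for H0 (rho = 0) over H1 (rho != 0)."""
    # BIC(H1) - BIC(H0): the fit gain is n*ln(1 - r^2); one extra parameter costs ln(n).
    delta_bic = n * math.log(1.0 - r ** 2) + math.log(n)
    return math.exp(delta_bic / 2.0)


def bf10_bic(r: float, n: int) -> float:
    """BIC-approximated Bayes factor for H1 over H0."""
    return 1.0 / bf01_bic(r, n)
```

For example, with n = 33 and r = 0, this approximation gives BF01 = sqrt(33) ≈ 5.74.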

      For the results of frequency tagging signal, we have now included this analysis and added the results to the revised manuscript. We refer the reviewer to our response to the weakness (vi) from reviewer #3.

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The reviewer is correct that we did not directly compare task conditions when we assessed the predictive relationship between basal ganglia lateralization and alpha lateralization. We opted for the multivariate regression approach because it allowed us to model the predictive relationship between our continuous predictors and HLM(α) in each condition simultaneously, making the most efficient use of our statistical power (N = 33). Directly comparing task conditions within one model would require 16 regressors (1 intercept + (4 − 1) to model the differences between conditions + 3 to model the regressors + 3 × 3 to model each region × task condition interaction). This approach would be underpowered given our sample size, and the ensuing results would likely be unreliable.
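      The regressor count above can be checked by building the corresponding design matrix explicitly. The sketch below is illustrative only. It assumes treatment (dummy) coding of the four conditions with the first condition as the reference, three hypothetical lateralized-volume predictors, and a long-format layout with one row per participant and condition. It ignores the repeated-measures structure that a real analysis would also need to model.

```python
import numpy as np


def region_by_condition_design(volumes: np.ndarray, condition: np.ndarray,
                               n_conditions: int = 4) -> np.ndarray:
    """Intercept + condition dummies + region main effects + region x condition terms."""
    n_rows = volumes.shape[0]
    # Treatment coding: condition 0 is the reference, so n_conditions - 1 dummies.
    dummies = np.column_stack(
        [(condition == c).astype(float) for c in range(1, n_conditions)])
    # One interaction column per (region, condition dummy) pair.
    interactions = np.column_stack(
        [volumes[:, r] * dummies[:, c]
         for r in range(volumes.shape[1]) for c in range(dummies.shape[1])])
    return np.column_stack([np.ones(n_rows), dummies, volumes, interactions])


# Long format: 33 participants x 4 conditions, 3 hypothetical laterality values.
rng = np.random.default_rng(1)
subject_volumes = rng.standard_normal((33, 3))
volumes_long = np.repeat(subject_volumes, 4, axis=0)
condition_long = np.tile(np.arange(4), 33)

design = region_by_condition_design(volumes_long, condition_long)
print(design.shape)  # 1 + 3 + 3 + 9 = 16 columns
```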

      However, we did test our regression results statistically. Multivariate regression analysis allows one to test the null hypothesis that a given predictor contributes to none of the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all four task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using MANOVA and report the findings in our response to weakness two from reviewer #1.
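      A minimal sketch of such a global test for a single predictor is shown below. It assumes a multivariate linear model with the four condition-wise HLM(α) values as dependent variables and three lateralized volumes as predictors; all data here are synthetic. It computes Wilks' lambda as the ratio of the determinants of the error SSCP matrices of the full and reduced models. Because the hypothesis has one degree of freedom, lambda converts to an exact F statistic. Which MANOVA statistic the authors used is not stated here; for a one-df hypothesis, Wilks' lambda, Pillai's trace and the other standard statistics lead to the same F test.

```python
import numpy as np


def residual_sscp(design: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Residual sums of squares and cross-products of a multivariate OLS fit."""
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)
    R = Y - design @ B
    return R.T @ R


def wilks_test_for_predictor(X: np.ndarray, Y: np.ndarray, j: int):
    """Wilks' lambda for H0: predictor column j has zero coefficients for every column of Y.

    Returns (lambda, F, df1, df2); with a one-df hypothesis the F is exact.
    """
    n, k = X.shape
    m = Y.shape[1]
    ones = np.ones((n, 1))
    E_full = residual_sscp(np.hstack([ones, X]), Y)
    E_reduced = residual_sscp(np.hstack([ones, np.delete(X, j, axis=1)]), Y)
    lam = np.linalg.det(E_full) / np.linalg.det(E_reduced)  # E_reduced = E + H
    df1 = m
    df2 = (n - k - 1) - m + 1
    F = (1.0 - lam) / lam * df2 / df1
    return float(lam), float(F), df1, df2


# Synthetic example: 33 participants, 3 predictors, 4 condition-wise outcomes.
rng = np.random.default_rng(2)
X = rng.standard_normal((33, 3))
Y = np.outer(X[:, 0], [1.0, 0.8, 0.6, 0.4]) + 0.5 * rng.standard_normal((33, 4))
lam, F, df1, df2 = wilks_test_for_predictor(X, Y, j=0)
print(f"Wilks' lambda = {lam:.3f}, F({df1}, {df2}) = {F:.2f}")
```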

      Discussion (Line 384): “In future studies, it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor salience and target load in modulating alpha band activity. To ensure sufficient statistical power, each factor may need to be addressed in a separate experiment.”

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or to alpha lateralization)?

      We thank the reviewer for this feedback. These are indeed all good points, and we hope that our findings will inspire further research to address these issues. In the revised manuscript we now write:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus, the inhibitory nature of the globus pallidus projections to the thalamus could explain why they are related to the alpha modulation in different manners (57).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      Discussion (Line 388): “Moreover, our failure to identify a relationship between the lateralized volume of subcortical structures and behavioural measures should be addressed in studies that are better designed to capture performance asymmetries (63). Individual preferences toward one hemifield, which were not addressed in the current study design, could potentially strengthen the power to detect correlations between structural variations in the subcortical structures and behavioural measures.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comment:

      Between-subject correlation/regression analyses always rely on the assumption that the underlying dependent measures are reliable. While the reliability of asymmetries of subcortical structures can be assumed, the reliability of lateralized alpha oscillations during spatial attention can be questioned. It would be helpful if the authors could test the reliability of alpha lateralization, for instance by calculating HLM(α) in the first and second half of the experiment and correlating the resulting HLM(α) values (split-half reliability).

      We thank the reviewer for this insightful comment. We acknowledge that the between-subject regression relies on the reliability of alpha lateralization. Nonetheless, a previous study has demonstrated consistent results regarding HLM(α). We have elaborated further on this point in the Discussion section:

      Discussion (Line 328): “Furthermore, our regression analysis outcomes align with the findings of Mazzetti et al. (28) underscoring the significant predictive influence exerted by the lateralized volume of globus pallidus on the modulation of hemispheric lateralization in alpha oscillations during spatial attention tasks. This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations within the context of attention.”

      Reviewer #3 (Recommendations For The Authors):

      We recommend that a revised version of the manuscript

      • Clarifies the theoretical basis for the 6 key design & analysis choices that we have outlined above;

      We thank the reviewer for their precision. We addressed the concerns outlined above in the previous section.

      • Also clarifies the task description (perhaps referring to target and distractor salience instead of target load versus distractor salience might help);

      Thank you for this constructive comment. We used the term ‘load’ for targets and ‘salience’ for distractors because the noise manipulation of the faces reduces the salience of the image, which makes distractors less distracting (easier) but increases the perceptual load of targets (harder). These terms are now explained in the revised manuscript.

      Method (Line 447): “Over trials, the perceptual load of targets was manipulated using a noise mask; noisy targets are harder to detect than clear targets and therefore incur greater perceptual load in their detection. The saliency of distractor stimuli was also manipulated using a noise mask; noisy distractor stimuli are less salient than clear distractors and therefore less disruptive to performance on the detection task. The noise mask was created by randomly swapping 50% of the stimulus pixels (Figure 1B). This manipulation resulted in four target-load/distractor-saliency conditions: (1) target: low load, distractor: low saliency (i.e., clear target, noisy distractor), (2) target: high load, distractor: low saliency (i.e., noisy target, noisy distractor), (3) target: low load, distractor: high saliency (i.e., clear target, clear distractor), (4) target: high load, distractor: high saliency (i.e., noisy target, clear distractor) (Figure 1B and C).”
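      The noise mask described above can be sketched as below. The exact swapping procedure is an assumption. Here, a random 50% of pixel positions is selected and their values are permuted among those positions. This preserves the image's pixel-value histogram while degrading its structure. A 2-D grayscale image is assumed, and the function name `swap_pixels` is hypothetical.

```python
from typing import Optional

import numpy as np


def swap_pixels(image: np.ndarray, fraction: float = 0.5,
                rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return a noisy copy of a 2-D grayscale image.

    A random `fraction` of pixel positions is chosen and the values at those
    positions are shuffled among themselves; all other pixels are unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = image.reshape(-1).copy()
    n_swap = int(round(fraction * flat.size))
    positions = rng.choice(flat.size, size=n_swap, replace=False)
    flat[positions] = flat[rng.permutation(positions)]
    return flat.reshape(image.shape)


face = np.arange(100, dtype=np.uint8).reshape(10, 10)  # stand-in for a face image
noisy = swap_pixels(face, fraction=0.5, rng=np.random.default_rng(3))
print(int((noisy != face).sum()), "of", face.size, "pixels changed")
```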

      • Fully reports all the data, including those of the model comparisons, the behavioural results, and the rapid frequency tagging results;

      We thank the reviewer for this constructive comment. We refer the reviewer to our response to the second comment and to comment (vi) from reviewer #3.

      • Reports interaction effects to directly test the modulating role of task demands in the link between volume and alpha, and break down the alpha lateralization indices into their simple effects on the ipsilateral and contralateral hemispheres;

      Task demands have been addressed in our response to weakness two from reviewer #1.

      Regarding the second part of the comment: to compare the lateralized modulation of alpha oscillations between the right and left hemispheres, we computed the hemispheric lateralization modulation. This involved dividing trials into attention-right and attention-left trials and then calculating the lateralization index separately for sensors on the right and on the left. Specifically, this entailed computing ipsilateral – contralateral for sensors on the right and contralateral – ipsilateral for sensors on the left side of the brain. We addressed this concern in the Methods section as follows:

      Method (Line 537): “As MI(α) consistently represents power of alpha in attention right versus attention left conditions, it entails the comparison between ipsilateral and contralateral alpha modulation power for sensors located on the right side of the head. The same comparison applies inversely for sensors situated on the left side of the brain.”
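      The sign convention described above can be illustrated with a short sketch. It assumes a per-sensor normalized modulation index, MI = (P_attR − P_attL) / (P_attR + P_attL). It also assumes HLM(α) = mean MI over the right-hemisphere ROI minus mean MI over the left-hemisphere ROI, with 5 sensors per hemisphere. The ROI indices and power values are hypothetical. Because alpha power increases ipsilaterally to the attended side, MI is positive over the right ROI and negative over the left ROI, so stronger lateralization yields a larger HLM(α).

```python
import numpy as np


def modulation_index(p_att_right: np.ndarray, p_att_left: np.ndarray) -> np.ndarray:
    """Per-sensor MI: (attend-right power - attend-left power) / their sum."""
    return (p_att_right - p_att_left) / (p_att_right + p_att_left)


def hlm(mi: np.ndarray, right_roi, left_roi) -> float:
    """HLM: mean MI over the right-hemisphere ROI minus mean MI over the left ROI."""
    return float(np.mean(mi[right_roi]) - np.mean(mi[left_roi]))


# Hypothetical alpha power for 10 sensors (indices 0-4: right ROI, 5-9: left ROI).
p_att_right = np.array([1.2, 1.3, 1.1, 1.25, 1.15, 0.9, 0.85, 0.95, 0.8, 0.9])
p_att_left = np.array([0.8, 0.9, 0.9, 0.85, 0.95, 1.1, 1.15, 1.05, 1.2, 1.1])
mi = modulation_index(p_att_right, p_att_left)
right_roi, left_roi = np.arange(0, 5), np.arange(5, 10)
print(f"HLM(alpha) = {hlm(mi, right_roi, left_roi):.3f}")
```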

      • Clarifies in the discussion section the specific implications of the results for our understanding of the link between distinct subcortical structures and distinct component processes of spatial attention.

      We thank the reviewer for their constructive comment. This point is addressed in response to the fourth concern of reviewer #3.

      More detailed specific recommendations are provided below:

      • Line 40ff: In this paragraph, the theoretical framework concerning the function of the subcortical regions of interest is described. Here, the authors jump back and forth between the role of the basal ganglia and the role of the thalamus. For clarity, we would advise describing the functions of these two structures one after the other, and including a justification for assessing the hippocampus and the amygdala.

      We appreciate the reviewer’s precision in this comment. In the revised manuscript, we now describe these structures one after the other:

      Introduction (Line 44): “For instance, it has been shown that the pulvinar plays an important role in the modulation of neocortical alpha oscillations associated with the allocation of attention (9). Studies in rats and non-human primates have shown that both the thalamus and the superior colliculus are involved in the control of spatial attention by contributing to the regulation of neocortical activity (9-11). Notably, when the largest nucleus of the thalamus, the pulvinar, was inactivated by muscimol infusion, the monkey’s ability to detect colour changes in attended stimuli was reduced. This behavioral deficit occurred when the target was in the receptive field of V4 neurons connected to the inactivated pulvinar (12). The basal ganglia play a role in different aspects of cognitive control, encompassing attention (13,14), behavioural output (15), and conscious perception (16). Moreover, the basal ganglia contribute to visuospatial attention by linking with cortical regions such as the prefrontal cortex via the thalamus.”

      Justification for assessing the hippocampus and the amygdala has been addressed in response to weakness (iii) from reviewer #3.

      • The authors mention they defined symmetric clusters of 5 sensors in each hemisphere that showed the highest modulation, but it is not clear how this number of sensors was determined a priori.

      We thank the reviewer for their comment. We edited the revised manuscript as below:

      Method (Line 536): “Ten sensors were selected to ensure sufficient coverage of the region exhibiting alpha modulation as judged from prior work (62).”

      • In line 141, the abbreviation HLM is first mentioned but the concept of "hemispheric lateralization modulation of alpha power" is only mentioned in the following section. For the ease of the reader, the abbreviation could be mentioned together with this concept at the beginning of this paragraph.

      We thank the reviewer for their attention to detail. In the revised manuscript, HLM(α) is now introduced together with its definition.

      Results (Line 153): “Next, we computed the hemispheric lateralization modulation of alpha power (HLM(α)) in each individual.”

      • In line 188 of the results section, it is mentioned that the table including the AIC values for model comparisons is in the supplementary material, however, we could not locate this table.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded as a separate file and may not have been available to the reviewers. We have now added the supplementary materials to the end of the manuscript for convenience.

      • Figure 4 is missing the panel headers A, B, C, and D.

      We thank the reviewer for their precision. This figure is now fixed.

      Author response image 2.

      • In lines 205 and 206, behavioral and rapid frequency tagging analyses are mentioned. For the behavioral analysis, the method is described, but no results are provided. For the rapid frequency tagging, neither the methods nor the results are described. To evaluate the strength of this (non-)evidence, we would advise elaborating on these analysis steps and reporting the results in the supplementary material.

      We thank the reviewer for this constructive comment. A brief explanation of the analysis method of rapid frequency tagging signal is added to the revised manuscript.

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by comparing the average signal power in sensors on the right when attention was directed to the right with that when attention was directed to the left. This calculation was also performed for sensors on the left. We then identified the top 5 sensors on each side with the highest MI as the region of interest (ROI). Using the sensors within the ROI, we computed the hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).” For a more detailed answer, we refer the reviewer to our response to the second comment from reviewer #3.

      • For the paragraph starting at line 209, we would recommend referring to Figure 1.

      We thank the reviewer for their suggestion. This paragraph is now referring to Figure 1.

      Results (Line 229): “To relate the load and salience conditions of the task to the relationship between subcortical structures and alpha activity, we combined low-load or high-load targets with high-saliency or low-saliency distractors to manipulate the perceptual load assigned to each trial (Method section, Figure 1).”

      • Figure 5 as well as the report of the beta weights in this section shows a difference in the direction of the effect for the thalamus compared to the globus pallidus and caudate nucleus which is not discussed in this section.

      We thank the reviewer for bringing this important point to our attention. We addressed this comment in the discussion section as below:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (54).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      • Comment 2 on line 80 is addressed in the paragraph following 264 by describing volumetric changes in basal ganglia in neurodegenerative disorders such as PD or Huntington's. Still, the link of how a decrease in volume in this region could be causally linked to changes in alpha-band power could be better supported.

      We thank the reviewer for their constructive feedback. Here we highlight the significant correlation between subcortical structures and changes in attention-modulated alpha oscillations. We added several references to the Discussion supporting the relationship between size and function in neurological disorders, and edited the manuscript to make this point clearer:

      Introduction (Line 113): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, independent of any reward or value associations.”

      Discussion (Line 305): “Changes in neocortical oscillatory activity have also been observed in neurological disorders that are known to mainly affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activity and a decrease in higher-frequency oscillations (42). Moreover, in patients with Parkinson’s Disease, beta power is higher when they are dopamine-depleted than when they are on dopaminergic medication (43).”

      • Related to the previous comment on behavioral and rapid frequency tagging results, these are difficult to evaluate without mention of the methods and/or results.

      We thank the reviewer for this comment. We refer the reviewer to our response to the second comment from reviewer #3.

      • The authors show differential effects of target load and distractor saliency; however, we missed the description of how these two variables differ conceptually as they are both described as contributing to task difficulty and it is not described why we would expect differential effects for these concepts (or in other words, how the authors explain the differential effects).

      We thank the reviewer for their comment. Directly comparing task conditions within one model would require 16 regressors (1 intercept + (4 − 1) to model the differences between conditions + 3 to model the regressors + 3 × 3 to model each region × task condition interaction). Given our sample size, this study is underpowered to directly compare alpha lateralisation in contralateral versus ipsilateral conditions. For a more detailed answer, please refer to our response to weakness two from reviewer #1.

      • Line 364ff: Based on the description of the experimental design, it is not clear to us whether participants only had to report on the change in gaze for the stimulus in the cued hemifield.

      We thank the reviewer for this comment, which prompted us to clarify the experimental design as below:

      Method (Line 440): “Then followed a 1000 ms response interval where participants were asked to respond with their right or left index finger whether the gaze direction of the cued face shifted left or right.”

      • Line 47ff: As mentioned above, the AIC table is not included. Further, as it is mentioned that BIC values led to similar results (indicating that they are not identical), it would be valuable to report both AIC and BIC values.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded as a separate file and may not have been available to the reviewers. We have now added the BIC values and attached the supplementary materials to the end of the manuscript for convenience.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      Songbirds provide a tractable system to examine neural mechanisms of sequence generation and variability. In past work, the projection from LMAN to RA (output of the anterior forebrain pathway) was shown to be critical for driving vocal variability during babbling, learning, and adulthood. LMAN is immediately adjacent to MMAN, which projects to HVC. MMAN is less well understood but, anatomically, appears to resemble LMAN in that it is the cortical output of a BG-thalamocortical loop. Because it projects to HVC, a major sequence generator for both syllable phonology and sequence, a strong prediction would be that MMAN drives sequence variability in the same way that LMAN drives phonological variability. This hypothesis predicts that MMAN lesions in a Bengalese finch would reduce sequence variability. Here, the authors test this hypothesis. They provide a surprising and important result that is well motivated and well analyzed: MMAN lesions increase sequence variability - this is exactly the opposite result from what would be predicted based on the functions of LMAN.

      Strengths:

      (1) A very important and surprising result shows that lesions of a frontal projection from MMAN to HVC, a sequence generator for birdsong, increase syntactical variability.

      (2) The choice of Bengalese finches, which have complex transition structures, to examine the mechanisms of sequence generation, enabled this important discovery.

      (3) The idea that frontal outputs of BG-cortical loops can generate vocal variability comes from lesions/inactivations of a parallel pathway from LMAN to RA. The difference between MMAN and LMAN functions is striking and important.

      Weaknesses:

      (1) If more attention were paid to how syllable phonology was (or was not) affected by MMAN lesions, the claims around the specific effects on sequence could be stronger.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates the neural substrates of syntax variation in Bengalese finch songs. Here, the authors tested the effects of bilateral lesions of mMAN, a brain area with inputs to HVC, a premotor area required for song production. Lesions in mMAN induce variability in syntactic elements of song specifically through increased transition entropy, variability within stereotyped song elements known as chunks, and increases in the repeat number of individual syllables. These results suggest that mMAN projections to HVC contribute to multiple aspects of song syntax in the Bengalese finch. Overall the experiments are well-designed, the analysis excellent, and the results are of high interest.

      Strengths:

      The study identifies a novel role for mMAN, the medial magnocellular nucleus of the anterior nidopallium, in the control of syntactic variation within adult Bengalese finch song. This is of particular interest as multiple studies previously demonstrated that mMAN lesions do not affect song structure in zebra finches. The study undertakes a thorough analysis to characterise specific aspects of variability within the song of lesioned animals. The conclusions are well supported by the data.

      Weaknesses:

      The study would benefit from additional mechanistic information. A more fine-grained or reversible manipulation, such as brain cooling, might allow additional insights into how mMAN influences specific aspects of syntax structure. Are repeat number increases and transition entropy resulting from shared mechanisms within mMAN, or perhaps arising from differential output to downstream pathways (i.e. projections to HVC)? Similarly, unilateral manipulations would allow the authors to further test the hypothesis that mMAN is involved in inter-hemispheric synchronization.

      We thank the reviewers and editor for their encouraging and helpful comments and suggestions. We have revised the previous submission with new analyses and discussion to address points raised by the reviewers.

      Following the suggestion of Reviewer 1 we have added an analysis of the effects of mMAN lesions on syllable phonology, using a variety of measures. We have included 3 new Figure Supplements that detail our analyses and elaborate on these points.

      We agree with Reviewer 2 that reversible and unilateral manipulations would be interesting and potentially enable additional insights into the mechanisms by which mMAN influences song sequencing, and we are planning to perform such experiments in future studies.

      We made additional minor changes throughout the manuscript to address other points raised by the reviewers, and we thank them again for their time and effort in providing constructive feedback to improve our study.

      A complete point by point detailing of these changes is included below, interspersed with the reviewer comments.

      Reviewer #1 (Recommendations For The Authors):

      The opposite result from what would be predicted based on the functions of LMAN.

      Shoring up the paper's claims and ruling out alternative interpretations will require attention to the following issues:

      Major comments

      (1) Acoustic structure of syllables

      Line 294 & Sup. Figure 2, in some birds the syllable acoustic structures seem to be significantly different between the pre- and post-lesion condition, e.g. 'w' in Bird 1, 'g' in Bird 2, 'blm' in Bird 6. This observation seems to contradict the claim that acoustic structures are not affected by MMAN lesions.

      Related to the previous point, a more detailed analysis is needed to quantify the extent of acoustic changes caused by MMAN lesions. For example, do these pre- and post- lesion syllables form distinct clusters if embedded in a UMAP? Do more standard measures of syllable phonology (e.g. SAP similarity scores or feature distributions) show differences in pre- and post-MMAN lesion?

      We agree with the reviewer that there were individual syllables as illustrated in the average spectrograms of Figure 2 – figure supplement 1 that qualitatively differed between pre- and post-lesion recordings. We have followed the reviewer’s suggestion to quantify changes to syllable phonology using both similarity scores by Sound Analysis Pro (SAP) and a variety of identified acoustic features.

      In brief, these measures largely corroborate the conclusion that for most birds and syllables there was little or no difference in phonology between pre- and post-lesion songs, but that in a minority of cases syllables were altered noticeably (further detail below). In those cases where syllable phonology was altered, changes were not consistent across birds, and we cannot rule out off-target effects due to damage to structures or fibers of passage neighboring mMAN, so that it is unclear whether some subtle changes to syllable phonology can be attributed to mMAN lesions versus other causes. Future studies could more specifically examine whether damage to mMAN alone is sufficient in some cases to degrade syllable structure by using viral or other approaches that enable the more specific disruption of mMAN projection neurons.

      In practice, almost all syllables were identifiable in post-lesion songs so that we could unambiguously assign identity for purposes of evaluating effects of lesions on sequencing. Moreover, in any individual cases where there was ambiguity in syllable identity, we used the sequential context to assign the most likely label. Thus, any errors in assignment in such cases would have tended to reduce rather than accentuate the magnitude of reported sequencing effects. Lastly, each of the reported effects of mMAN lesions on sequencing were observed in multiple birds for which we detected no significant changes to syllable similarity.

      Further details of the analyses of syllable structure are detailed below, and have been added as new figure supplements:

      (1) Syllable similarity scores calculated using SAP (Sound Analysis Pro) (new Figure 2 – figure supplement 2). We compared pre-post lesion similarity scores for each syllable with self-similarity measures for the same syllables taken from separate control recordings before lesions. For comparison, we also included a cross-similarity score for syllables of different types. These measures confirmed the qualitative impression from spectrograms that for most birds there were no greater changes to syllable structure following lesions than were present across control recordings. For one bird, pre-post changes were significantly larger than changes across control recordings, but pre-post similarity remained higher than cross-similarity.

      (2) Analysis of fundamental frequency and coefficient of variation (CV) of fundamental frequency of select syllables for each bird before and after mMAN lesions (new Figure 2 – figure supplement 3). This analysis is directly comparable with the same analysis performed on LMAN lesions in Sakata, Hampton, Brainard (2008). We carried out this analysis in part to address changes to syllable structure that might have inadvertently arisen due to damage to LMAN, which sits immediately lateral to mMAN. In the Bengalese finch and zebra finch, lesions of LMAN cause little change to the mean fundamental frequency of individual syllables but cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008; Andalman, Fee 2009; Warren et al. 2011). We therefore reasoned that unintended damage to LMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead, we saw a modest increase in the CV of fundamental frequency (mean across birds of +20%; range -19 to +43%). These data suggest that off-target effects on LMAN were largely absent in our experiments (consistent with histology, e.g. Figure 1 – figure supplement 1).

      (3) Comparison of the entropy of the spectral envelope (entS), the temporal centroid of the temporal envelope (meanT), and the first, second and third formants (F1, F2, F3) before and after lesions, calculated using the Python SoundSig toolbox (Elie and Theunissen 2016) (new Figure 2 – figure supplement 4). Acoustic features generally showed little change between pre- and post-lesion songs. The relative outliers are the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.

      Author response image 1.

      Syllable similarity calculated using Sound Analysis Pro (SAP). ‘Self Similarity’ = similarity of syllables before mMAN lesions to syllables of the same type, taken from two separate control recordings before the lesions; ‘Pre vs Post’ = similarity of the same syllable types before and after mMAN lesions; ‘Cross Similarity’ = similarity of each syllable type to other syllable types. For Birds 1-2 and 4-7, ‘Self Similarity’ was not significantly different from ‘Pre vs Post’ similarity (p > 0.05, Wilcoxon signed-rank test), while for Bird 3 there was a significant difference (p = 0.03, Wilcoxon signed-rank test). For all birds, ‘Pre vs Post’ was significantly different from ‘Cross Similarity’ (p < 0.05, Wilcoxon signed-rank test). On average, ‘Pre vs Post’ was 4.8% less than ‘Self Similarity’ (range 0.2%-14%), while ‘Cross Similarity’ was 40% less than ‘Self Similarity’ (range 20.2%-56.3%). These measures confirm the qualitative impression from Figure 2 – figure supplement 1 that for most birds and syllables there were no greater changes to syllable structure following lesions than were present across control recordings, and that pre-post similarity remained higher than cross-similarity, i.e. syllables remained clearly identifiable.
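      The paired comparisons in this caption (for example, ‘Self Similarity’ versus ‘Pre vs Post’ across the syllables of one bird) use a Wilcoxon signed-rank test. For small numbers of syllables, the exact test can be written in a few lines of standard-library Python by enumerating every sign pattern. The sketch below does this with average ranks for ties and zero differences dropped. It is illustrative only and assumes one paired score per syllable type; `scipy.stats.wilcoxon` provides a library implementation.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of `values`, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples (small n only)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    # Under H0 each difference is equally likely to be positive or negative.
    null = [sum(r for r, pos in zip(ranks, signs) if pos)
            for signs in product((False, True), repeat=len(d))]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Hypothetical per-syllable similarity scores (percent) for one bird.
self_sim = [92, 88, 95, 90, 85, 93]
pre_post = [90, 87, 91, 86, 84, 88]
print(wilcoxon_signed_rank_exact(self_sim, pre_post))
```

      With six syllables and all differences in the same direction, the smallest attainable two-sided p-value is 2/64 = 0.03125.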

      Author response image 2.

      (A) CV of fundamental frequency (FF) of select syllables before and after mMAN lesions. In the Bengalese finch and zebra finch, lesions of lMAN, which sits immediately lateral to mMAN, cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008, Andalman, Fee 2009, Warren et al. 2011). We therefore supposed that unintended damage to lMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead, we saw a modest increase in the CV of fundamental frequency (p<0.05, Wilcoxon signed-rank test; mean across birds of +20%; range -19 to +43%). These data suggest that changes to syllable structure are unlikely to have arisen from accidental damage to lMAN. (B) Percent change in mean fundamental frequency after mMAN lesions vs mean fundamental frequency before mMAN lesions.

      Author response image 3.

      Selected acoustic features for all syllables in all birds before and after mMAN lesions. Different colors represent different syllable types per bird. ‘entS’ = Entropy of spectral envelope, ‘meanT’ = Temporal centroid for temporal envelope, ‘F1’ = First formant, ‘F2’= Second formant, ‘F3’ = Third formant. Acoustic features generally showed little change between pre and post lesion songs. They highlight as relative outliers the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.

      (2) Shoring up claims of increased transitional variability

      Line 301 & Sup. Figure 1, in several birds (1, 2, 5, 6), seems that there is a downward trend for postlesion, i.e. the transition entropy gradually decreases with time. How to exclude the possibility that the increased variability is a transient effect, e.g. caused by surgery side effects or destabilization of circuits, which may eventually recover to normal?

      Transition entropy remained elevated for as long as the birds were followed in this study. Although the effects we observed persisted longer than transient effects such as those following NIf lesions in zebra finches (~2 days; Otchy et al., 2015), we cannot rule out either recovery or further deterioration on much longer time scales, such as those reported by Kubikova et al. (2007) after Area X lesions (6 months). We have now added data points for 4 birds for which we had songs from later timepoints following lesions; for three of these birds, transition entropy remained elevated above the baseline values for 14 and 33 days, respectively (Figure 1 - figure supplement 2).
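      Transition entropy is referred to throughout without a formula. One common formulation, sketched below under that assumption, computes for each syllable the Shannon entropy (in bits) of the distribution of syllables that immediately follow it. The study's exact definition (for example, whether values are weighted by syllable frequency or restricted to branchpoints) is not given in these responses, and the example sequence is hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_entropy(sequence):
    """Return {syllable: entropy in bits of its next-syllable distribution}.

    H(a) = sum_b p(b | a) * log2(1 / p(b | a)), where p(b | a) is the
    empirical probability that syllable b immediately follows syllable a.
    """
    followers = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        followers[current][nxt] += 1

    entropies = {}
    for syllable, counts in followers.items():
        total = sum(counts.values())
        probabilities = [n / total for n in counts.values()]
        entropies[syllable] = sum(p * math.log2(1 / p) for p in probabilities)
    return entropies


# Hypothetical song: 'a' is always followed by 'b'; 'b' branches to 'c' or 'd'
song = "abcabdabcabd"
for syllable, h in sorted(transition_entropy(song).items()):
    print(syllable, round(h, 3))
```

      A fully stereotyped transition gives 0 bits, and an even two-way branchpoint gives 1 bit; an increase after lesions corresponds to next-syllable choices becoming less predictable.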

      Line 313 & Sup. Figure 4, the claim that "transitions that had low history dependence tended to show larger changes after mMAN lesions" needs better statistical support, because in Sup. Figure 4, the correlation is not significant.

      We apologize for the phrasing. We have changed the sentence to: “Consistent with the first possibility, we observed that there was a nonsignificant trend toward larger changes after mMAN lesion for transitions with low history dependence.”

      Figure 4C-D, only data from 5 out of 7 birds was included, did the other two birds not have repeats? If so, the authors need to be explicit on data exclusion.

      The reviewer’s inference is correct that in our dataset only 5 out of 7 birds had songs which contained repeat phrases. We have added the following sentence to state that explicitly: “In our dataset of 7 birds, only 5 birds had songs which contained repeat phrases.”

      Minor comments

      Sup. Figure 3, to help readers understand, 1) add symbols and arrows to point to the structures; 2) indicate the orientation of the slide, e.g. which direction is medial/lateral; 3) a negative control without lesion needs to be shown for comparison.

      We have made the suggested changes and updated Figure 1- figure supplement 1 accordingly.

      Author response image 4.

      Images of calcitonin gene-related peptide (CGRP)-stained frontal sections from (left) a control bird and (right) bird 5. CGRP labels cells in both lMAN (seen in black to the left of the lesion) and mMAN (blue, intact; red, completely destroyed).

      A statistical test is needed for Sup. Figure 5B.

      We have modified the Figure legend for Figure 3 – figure supplement 1 as follows:

      “Change in transition entropy was not significantly different for transitions within chunks and at branchpoints (p > 0.05, Wilcoxon rank sum test)”

      Line 363, these can be moved to the Introduction, so readers have a better sense of what's already known about MMAN lesion.

      We have moved the sentence to Introduction.

      Fig 1e. RA also projects to DLM.

      Our intention was to focus on the connections involving mMAN; we have now added the connection in Figure 1E.

      Reviewer #2 (Recommendations For The Authors):

      Please address this issue in the discussion (no new experiments required): It would be interesting to consider how social context modulates the variability of the song. In these experiments, Bengalese finches were singing in isolation. How might changes in syntax be modulated by the presence of a female in directed song and in other social contexts?

      Thank you for your suggestion. One study (Jarvis et al., 1998) showed that ZENK expression in mMAN after singing does not differ between female-directed singing, undirected singing and singing in the presence of a male conspecific. This suggests that activity in mMAN might not be modulated by social context. However, we agree that it would be interesting to test how a change in social context (which typically leads to reduced transition entropy) interacts with the increased variability we see after mMAN lesions. We have added the following sentences to the discussion:

      “In our study, we only recorded song sequencing of male Bengalese finches singing in isolation. Social context, such as female-directed song, can also change song sequencing (Hampton, Sakata and Brainard, 2009; Chen, Matheson and Sakata, 2016). It would be interesting to test whether mMAN plays a role in the social context-modulated changes in sequencing (Jarvis et al., 1998), similar to how lMAN contributes to social context-modulated changes in syllable structure (Sakata, Hampton and Brainard, 2008).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review

      [...] A particular strength of the present study is the structural characterization of human PURA, which is a challenging target for structural biology approaches. The molecular dynamics simulations are state-of-the-art, allowing a statistically meaningful assessment of the differences between wild-type and mutant proteins. The functional consequences of PURA mutations at the cellular level are fascinating, particularly the differential compartmentalization of wild-type and mutant PURA variants into certain subcellular condensates.

      Weaknesses that warrant rectification relate to (i) The interpretation of statistically non-significant effects seen in the molecular dynamic simulations.

      We removed the sentence that interpreted statistically non-significant effects from the manuscript, so this point has been resolved.

      (ii) The statistical analysis of the differential compartmentalization of PURA variants into processing bodies vs. stress granules, and

      We re-analyzed all cell-biological data and adjusted the statistical analysis of P-body and stress-granule intensities. The new, improved statistics have replaced the original analyses in the corresponding figures (Figs. 1C and 2B).

      (iii) Insufficient documentation of protein expression levels and knock-down efficiencies.

      Quantification of protein expression levels by Western blotting is shown in Appendix Figure S1, and quantification of knock-down efficiencies by Western blotting is shown in Appendix Figure S3.

      Recommendations for the authors: Reviewer #1

      Concerns and Suggested Changes

      (a) I have only one concern about the computational part and that is about statements such as "There are also large differences in the residue surrounding the mutation spot (residues 90 to 100), where the K97E mutant also shows much greater fluctuation. However, these differences are not significant due to the large standard deviations." If the differences are not statistically significant, then I would suggest either removing such a statement or increasing the statistics.

      We agree with the Reviewer’s comment. We removed this sentence from the text.

      Recommendations for the authors: Reviewer #2

      General Comments

      This is a challenging structural target and the authors have made considerable efforts to determine the effect of several mutations on the structure and function. Many of the constructs, however, could not be expressed and/or purified in bacteria. However, it is not clear to what extent other expression systems (e.g. Drosophila or human) were considered and if this would have been beneficial.

      We did not use other expression systems because the wild-type protein is well behaved when expressed in E. coli. If a mutant variant cannot be expressed or does not behave well in E. coli, this is a clear indication that the mutation impairs the protein’s integrity. Thus, by using E. coli as a reference system for all PURA variants, we could assess the influence of the mutations on structural integrity and solubility. Only for the variants that showed no impairment in E. coli expression did we go on to assess in more detail why they are nevertheless functionally impaired and cause PURA syndrome.

      Concerns and Suggested Changes

      (a) The schematic in Figure 3A would have been helpful for interpreting the mutations discussed in Figures 1 and 2. I would suggest moving it earlier in the text.

      We changed the figure according to the Reviewer’s suggestion.

      (b) I believe the RNA used for binding studies in Figures 3C and D was (CGG)8. Are the two "free" RNA bands a monomer and a dimer (duplex?)?

      Although we do not know for certain, it is indeed likely that the two free RNA bands represent either different secondary structures of the free RNA or a duplex of two molecules. Of note, PURA binds to both “free” RNA bands, indicating that it either does not discriminate between them or melts double-stranded RNA in these EMSAs.

      There also seems to be considerable cooperativity in the binding, so I wonder if a shorter RNA oligonucleotide might facilitate the measurement of Kds.

      The length of the used RNA was selected based on the estimated elongated size of the full-length PURA and the presence of 3 PUR repeats. Assuming that one PUR repeat interacts with about 6-7 bases (data from the co-structure of Drosophila PURA with DNA; PDB-ID: 5FGP) and that full-length PURA forms a dimer consisting of three PUR repeats, the full-length protein in its extended form should cover a nucleic-acid stretch of about 24 bases.

      Also, it is not clear how the affinities were measured particularly for hsPURA III since free band is never fully bound at the highest protein concentration.

      It was not our goal to measure Kds for the interaction of PURA variants with RNA. The EMSA experiments were conducted to detect relative differences in the interaction between PURA variants and RNA. To estimate these differences, we measured the total intensities of the bound (shifted) and unbound RNA bands on the scanned EMSA gels, quantified with Fiji (ImageJ), and then calculated and normalized the percentage of shifted RNA. The hsPURA III fragment shows much lower affinity and therefore does not fully shift the RNA even at the highest protein concentration, in contrast to full-length PURA and PURA I-II.
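      The EMSA quantification described above can be sketched as follows: for each lane, the shifted fraction is the bound-band intensity divided by the total (bound plus unbound) intensity. The normalization step is not specified in the response, so scaling each titration series to its own maximum is a hypothetical choice here, as are the band intensities.

```python
def percent_shifted(bound, unbound):
    """Return the shifted (bound) RNA as a percentage of total lane signal."""
    return 100.0 * bound / (bound + unbound)


def normalize_to_max(values):
    """Hypothetical normalization: scale a titration series so its maximum is 100."""
    peak = max(values)
    return [100.0 * v / peak for v in values]


# Hypothetical (bound, unbound) band intensities for increasing protein concentrations
lanes = [(0.0, 1000.0), (250.0, 750.0), (600.0, 400.0), (900.0, 100.0)]

shifted = [percent_shifted(b, u) for b, u in lanes]
print(shifted)
print([round(v, 1) for v in normalize_to_max(shifted)])
```

      Because the shifted fraction is a ratio within each lane, it compares variants at matched protein concentrations without requiring a fitted binding model, which suits the stated goal of relative rather than absolute affinities.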

      (c) Do the human PURA I+II and dmPURA I+ II crystallize in the same space group and have similar packing? Can the observed structural flexibility be due to crystal contacts?

      hsPURA I+II and dmPURA I+II crystallize in different space groups with different crystal packing. In both cases, the asymmetric unit contains 4 independent molecules, with the flexible part of the structure, composed of strands β4 and β8 (the β ridge), exposed to solvent. In the Drosophila structure, we do not observe flexibility in either of these β-strands. In contrast, in the human PURA structure the β ridge is highly flexible and adopts a different conformation in each of the 4 molecules of the asymmetric unit. We observe similar flexibility of β4 and β8 (the β ridge) in the structure of the K97E mutant, which contains 2 molecules in the asymmetric unit. We would also note that crystal contacts would be expected to stabilize rather than destabilize domains.

      Similarly, can the conformations observed for the K97E mutant be partially explained by packing?

      Regarding the sequence shift observed for strands β5 and β6 in the hsPURA I+II K97E variant: although the shifted β5 strand contacts the β5 strand of a symmetry-related molecule, we do not consider this interaction to be the source of the shift. To confirm that the shift is not forced by crystallization, we performed NMR measurements, which showed a pronounced change in the β-strands between WT and the K97E mutant in solution. This is an unambiguous indication that the structural changes observed in the crystal structure also occur in solution. In addition, the MD simulations provide further support for our interpretation that K97E destabilizes the corresponding PUR domain. Taken together, we provide evidence from three different angles that the observed differences indeed affect the integrity, and hence the function, of the protein.

      (d) Perhaps, it is my misunderstanding, but I find the NMR data on the Arg sidechains for the K97E confusing. If they are visible for K97E and not WT, doesn't this indicate that there is an exchange between two conformations or more dynamics in the WT structure? This does not seem to be the opposite of the expectation if K97E is thought to have more conformational flexibility.

      Due to a technical issue (peak contour level), arginine side-chain resonances were not clearly visible in the WT spectrum. We have updated Figure 5F, in which these resonances are now visible and correspond to those seen in the mutant spectrum. However, to prevent any confusion or over-interpretation, we removed the sentence regarding arginine side chains: "Intriguingly, arginine side chain resonances Nε-Hε were only visible in the K97E variant, while they were broadened out in the wild-type spectrum."

      (e) The most speculative part of the paper is the interpretation of SG and PB localization of PURA in Fig 1 and 2. There is an important issue with the statistics that must be clarified because it would appear that statistical significance was determined using each SG or PB as an independent measurement. This is incorrect and significance should be measured by only using the means of three biological replicates. This is well described here. It is not clear at this time if the reported P values will be confirmed upon reanalysis, and this may require reinterpretation of the data.

      We are grateful for this clarifying comment and agree that the statistical analysis of P-body and stress granule was misleading. Of note, while the figures depicted all the values independent of the biological repeats, the statistical analyses were done on the mean value of each replicate of each cell line and not all raw data points.

      We prepared new plots showing only the mean value of each replicate and re-calculated the P-values. The P-values changed only slightly in this new analysis; the changes arise because we now also included the previously labeled outliers (red points), to better demonstrate that significance persists even when they are considered.
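      The replicate-level analysis can be illustrated with a short sketch: per-granule measurements are first collapsed to one mean per biological replicate, and only those means (n = 3 per condition here) enter the statistical test, for example scipy.stats.ttest_ind or whichever test was actually used. The replicate labels and intensities below are hypothetical.

```python
import statistics
from collections import defaultdict


def replicate_means(measurements):
    """Collapse per-granule values into one mean per biological replicate.

    measurements: iterable of (replicate_id, value) pairs, one pair per granule.
    Returns {replicate_id: mean of that replicate's values}.
    """
    grouped = defaultdict(list)
    for replicate_id, value in measurements:
        grouped[replicate_id].append(value)
    return {rep: statistics.mean(values) for rep, values in grouped.items()}


# Hypothetical per-granule intensities for one cell line, three biological replicates
wild_type = [("rep1", 1.0), ("rep1", 3.0), ("rep2", 2.0), ("rep2", 4.0), ("rep3", 5.0)]

means = replicate_means(wild_type)
print(means)
# Only these replicate means (n = 3), not the five granule values, enter the test.
```

      Treating each granule as an independent observation would inflate the sample size and understate the P-values; averaging within replicates first keeps the unit of analysis at the biological replicate.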

      In the new analysis of stress-granule association, only the value for the K97E mutant lost its significance, indicating that its association with stress granules is not significantly reduced. We therefore adjusted the following sentences in the manuscript.

      Results:

      Original: "While quantification showed a reduced association of hsPURA K97E mutant with G3BP1-positive granules (Fig 1B), the two other mutants, I206F and F233del, showed the same co-localization to stress granules as the wild type control."

      Corrected: "In all the patient-related mutations, no significant reduction in stress granule association was seen when compared to the wild type control (Fig 1C)."

      Original: "The observation that only one of the patient-related mutations of hsPURA, K97E, showed reduced stress granule association indicates that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      Corrected: "As we did not observe significant changes in the association of patient-related mutations of hsPURA to stress granules, it is suggested that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      (f) A western blot showing the level of overexpression of the PURA proteins should be shown in Figure 1 as well as the KD of endogenous PURA for Figure S2?

      As requested, a Western blot showing the level of overexpression of the different PURA proteins has been added as Appendix Figure S1.

      A Western blot of the siRNA-mediated knock-down experiments of PURA and their corresponding control has been added to Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (g) While I appreciate that rewriting is time-consuming, I would recommend considering restructuring the manuscript because I think that it would aid the overall clarity. I think the foundation of the work is the structural characterization and would suggest beginning the paper with this data and the biochemical characterization. The co-localization with SGs and PBs and how this may be relevant to disease is much more speculative and is therefore better to present later. While I appreciate that the structural interpretation of why some mutants localize to PBs differently is not entirely clear, I do think that this would provide some context for the discussion.

      In the initial version of the manuscript, we first presented the structural characterization of PURA and afterwards the co-localization with SGs and PBs. As the reviewer noted in (e), we also recognized that the SG and PB interpretation is the most speculative part of this manuscript, and we felt that placing it at the end of the Results section would weaken the manuscript. In contrast, we consider the structural interpretation of the mutations to be much stronger and of greater impact for future research. After long discussion, we therefore decided to swap the order, leaving the most important results for the end of the manuscript.

      Recommendations for the authors: Reviewer #3

      Concerns and Suggested Changes:

      (a) For the characterization of G3BP1-positive stress granules in HeLa cells upon depletion of PURA, it remains unclear what is the efficiency of siRNA? The authors should provide a western blot to indicate how much the endogenous levels were reduced.

      We completely agree with the stated concern and addressed it accordingly. We had performed this experiment prior to submission but for some unknown reason it was not included in the manuscript.

      The Western blot of siRNA-mediated knock-down experiments of PURA and their corresponding control is now shown in Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (b) How does knocking down PURA affect DCP1A-positive structures in HeLa cells? Would P bodies be formed even in the absence (or reduction) of total PURA?

      Indeed, the stated question is very interesting. In fact, we have already shown in our recent publication (Molitor et al., 2023) that a knock down of PURA in HeLa and NHDF cells leads to a significant reduction of P-bodies. We actually referred to this finding on page 6:

      "Since hsPURA was recently shown to be required for P-body formation in HeLa cells and fibroblasts (Molitor et al. 2023), PURA-dependent liquid phase separation could potentially also directly contribute to the formation of these granules."

      On the same page, we also refer to the underlying molecular mechanism:

      "However, when putting this observation in perspective with previous reports, it seems unlikely that P-body formation directly depends on phase separation by hsPURA, but rather on its recently reported function as gene regulator of the essential P-body core factors LSM14a and DDX6 (Molitor et al., 2023)."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      These ingenious and thoughtful studies present important findings concerning how people represent and generalise abstract patterns of sensory data. The issue of generalisation is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception, learning, and cognitive science. The findings have the potential to provide compelling support for the outlined account, but there appear other possible explanations, too, that may affect the scope of the findings but could be considered in a revision.

      Thank you for sending the feedback from the three peer reviewers regarding our paper. Please find below our detailed responses addressing the reviewers' comments. We have incorporated these suggestions into the paper and provided explanations for the modifications made.

      We have specifically addressed the point of uncertainty highlighted in eLife's editorial assessment, which concerned alternative explanations for the reported effect. In response to Reviewer #1, we have clarified how Exp. 2c and Exp. 3c address the potential alternative explanation related to "attention to dimensions." Further, we present a supplementary analysis to account for differences in asymptotic learning, as noted by Reviewer #2. We have also clarified how our control experiments address effects associated with general cognitive engagement in the task. Lastly, we have further clarified the conceptual foundation of our paper, addressing concerns raised by Reviewers #2 and #3.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports a series of experiments examining category learning and subsequent generalization of stimulus representations across spatial and nonspatial domains. In Experiment 1, participants were first trained to make category judgments about sequences of stimuli presented either in nonspatial auditory or visual modalities (with feature values drawn from a two-dimensional feature manifold, e.g., pitch vs timbre), or in a spatial modality (with feature values defined by positions in physical space, e.g., Cartesian x and y coordinates). A subsequent test phase assessed category judgments for 'rotated' exemplars of these stimuli: i.e., versions in which the transition vectors are rotated in the same feature space used during training (near transfer) or in a different feature space belonging to the same domain (far transfer). Findings demonstrate clearly that representations developed for the spatial domain allow for representational generalization, whereas this pattern is not observed for the nonspatial domains that are tested. Subsequent experiments demonstrate that if participants are first pre-trained to map nonspatial auditory/visual features to spatial locations, then rotational generalization is facilitated even for these nonspatial domains. It is argued that these findings are consistent with the idea that spatial representations form a generalized substrate for cognition: that space can act as a scaffold for learning abstract nonspatial concepts.

      Strengths:

      I enjoyed reading this manuscript, which is extremely well-written and well-presented. The writing is clear and concise throughout, and the figures do a great job of highlighting the key concepts. The issue of generalization is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception and cognitive science. It's also excellent to see that the hypotheses, methods, and analyses were pre-registered.

      The experiments that have been run are ingenious and thoughtful; I particularly liked the use of stimulus structures that allow for disentangling of one-dimensional and two-dimensional response patterns. The studies are also well-powered for detecting the effects of interest. The model-based statistical analyses are thorough and appropriate throughout (and it's good to see model recovery analysis too). The findings themselves are clear-cut: I have little doubt about the robustness and replicability of these data.

      Weaknesses:

      I have only one significant concern regarding this manuscript, which relates to the interpretation of the findings. The findings are taken to suggest that "space may serve as a 'scaffold', allowing people to visualize and manipulate nonspatial concepts" (p13). However, I think the data may be amenable to an alternative possibility. I wonder if it's possible that, for the visual and auditory stimuli, participants naturally tended to attend to one feature dimension and ignore the other - i.e., there may have been a (potentially idiosyncratic) difference in salience between the feature dimensions that led to participants learning the feature sequence in a one-dimensional way (akin to the 'overshadowing' effect in associative learning: e.g., see Mackintosh, 1976, "Overshadowing and stimulus intensity", Animal Learning and Behaviour). By contrast, we are very used to thinking about space as a multidimensional domain, in particular with regard to two-dimensional vertical and horizontal displacements. As a result, one would naturally expect to see more evidence of two-dimensional representation (allowing for rotational generalization) for spatial than nonspatial domains.

      In this view, the impact of spatial pre-training and (particularly) mapping is simply to highlight to participants that the auditory/visual stimuli comprise two separable (and independent) dimensions. Once they understand this, during subsequent training, they can learn about sequences on both dimensions, which will allow for a 2D representation and hence rotational generalization - as observed in Experiments 2 and 3. This account also anticipates that mapping alone (as in Experiment 4) could be sufficient to promote a 2D strategy for auditory and visual domains.

      This "attention to dimensions" account has some similarities to the "spatial scaffolding" idea put forward in the article, in arguing that experience of how auditory/visual feature manifolds can be translated into a spatial representation helps people to see those domains in a way that allows for rotational generalization. Where it differs is that it does not propose that space provides a scaffold for the development of the nonspatial representations, i.e., that people represent/learn the nonspatial information in a spatial format, and this is what allows them to manipulate nonspatial concepts. Instead, the "attention to dimensions" account anticipates that ANY manipulation that highlights to participants the separable-dimension nature of auditory/visual stimuli could facilitate 2D representation and hence rotational generalization. For example, explicit instruction on how the stimuli are constructed may be sufficient, or pre-training of some form with each dimension separately, before they are combined to form the 2D stimuli.

      I'd be interested to hear the authors' thoughts on this account - whether they see it as an alternative to their own interpretation, and whether it can be ruled out on the basis of their existing data.

      We thank the Reviewer for their comments. We agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are incompatible with this alternative explanation.

      In Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is thus necessary to pay attention to both auditory dimensions and both visual dimensions to perform the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.

      This effect was replicated in Exp. 3c. Here, 33/48 participants reached an accuracy > 50% in the multimodal association task, meaning that we know for sure that at least 68% of the participants paid attention to both dimensions of the stimuli. Again, the frequency of 1D vs 2D models differed clearly between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pre-training in the visual or auditory domains (where multiple stimulus dimensions are perhaps less often relevant in natural settings) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) that could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25%, and chance performance is 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”
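      The accuracy thresholds cited in this response follow from simple arithmetic, assuming, as stated above, four possible values on each of the two stimulus dimensions, so that each trial offers 4 × 4 = 16 response options. The sketch below reproduces the chance level, the single-dimension ceiling, and the proportions of participants above 50% accuracy in Exp. 2c and Exp. 3c.

```python
from fractions import Fraction

OPTIONS_PER_DIMENSION = 4  # four possible values on each stimulus dimension
N_DIMENSIONS = 2

# Pure guessing: one correct option among 4 ** 2 = 16 combinations
chance = Fraction(1, OPTIONS_PER_DIMENSION ** N_DIMENSIONS)

# Perfect on the attended dimension (p = 1), guessing on the other (p = 1/4)
one_dimension_ceiling = Fraction(1) * Fraction(1, OPTIONS_PER_DIMENSION)

print(f"Chance level:          {float(chance):.2%}")                 # 6.25%
print(f"One-dimension ceiling: {float(one_dimension_ceiling):.0%}")  # 25%

# Participants above 50% accuracy, who must have attended to both dimensions
counts = [("Exp. 2c", 30, 50), ("Exp. 3c", 33, 48)]
for experiment, above, total in counts:
    print(f"{experiment}: {above}/{total} = {above / total:.1%}")

pooled_above = sum(above for _, above, _ in counts)
pooled_total = sum(total for _, _, total in counts)
print(f"Pooled: {pooled_above}/{pooled_total} = {pooled_above / pooled_total:.1%}")
```

      Any accuracy above 25% therefore requires information about both dimensions, so the 50% criterion is a conservative one; pooled across the two experiments, about 64% of participants exceed it, in line with the "around 65%" in the added Discussion paragraph.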

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, L&S investigate the important general question of how humans achieve invariant behavior over stimuli belonging to one category given the widely varying input representation of those stimuli and, more specifically, how they do that in arbitrary abstract domains. The authors start with the hypothesis that this is achieved by invariance transformations that observers use for interpreting different entries and, furthermore, that these transformations in an arbitrary domain emerge with the help of the transformations (e.g. translation, rotation) within the spatial domain by using those as "scaffolding" during transformation learning. To provide the missing evidence for this hypothesis, L&S used behavioral category learning studies within and across the spatial, auditory, and visual domains, where rotated and translated 4-element token sequences had to be learned to categorize and then the learned transformation had to be applied in new feature dimensions within the given domain. Through single- and multiple-day supervised training and unsupervised tests, L&S demonstrated by standard computational analyses that in such setups, space and spatial transformations can, indeed, help with developing and using appropriate rotational mapping, whereas the visual domain cannot fulfill such a scaffolding role.

      Strengths:

      The overall problem definition and the context of spatial mapping-driven solution to the problem is timely. The general design of testing the scaffolding effect across different domains is more advanced than any previous attempt to clarify the relevance of spatial coding to any other type of representational code. Once the formulation of the general problem in a specific scientific framework is done, the following steps are clearly and logically defined and executed. The obtained results are well interpretable, and they could serve as a good stepping stone for deeper investigations. The analytical tools used for the interpretations are adequate. The paper is relatively clearly written.

      Weaknesses:

      Some additional effort to clarify the exact contribution of the paper, the link between analyses and the claims of the paper, and its link to previous proposals would be necessary to better assess the significance of the results and the true nature of the proposed mechanism of abstract generalization.

      (1) Insufficient conceptual setup: The original theoretical proposal (the Tolman-Eichenbaum-Machine, Whittington et al., Cell 2020) that L&S relate their work to proposes that, just as in the case of memory for spatial navigation, humans and animals create their flexible relational memory system of any abstract representation by a conjunction code that combines, on the one hand, sensory representation and, on the other hand, a general structural representation or relational transformation. The TEM also suggests that the structural representation could contain any graph-interpretable spatial relations, albeit in their demonstration 2D neighbor relations were used. The goal of L&S's paper is to provide behavioral evidence for this suggestion by showing that humans use representational codes that are invariant to relational transformations of non-spatial abstract stimuli and, moreover, that humans obtain these invariances by developing invariance transformers with the help of available spatial transformers. To obtain such evidence, L&S use the rotational transformation. However, the procedure they use actually addresses a different task: instead of interrogating how humans develop generalizations in abstract spaces, they demonstrate that if one defines rotation in an abstract feature space embedded in a visual or auditory modality that is similar to 2D space (i.e. has two independent dimensions that are clearly segregable and continuous), humans cannot learn to apply rotation of 4-piece temporal sequences in those spaces while they can do so in 2D space, and that an appropriate shaping procedure, which co-associates locations in those feature spaces one-to-one with locations in 2D space, leads to the successful application of rotation in the given task (and in some other feature spaces in the given domain).
      While this is an interesting and challenging demonstration, it does not shed light on how humans learn and generalize, only that humans CAN do learning and generalization in this highly constrained scenario. This result is a demonstration of how a stepwise learning regimen can make use of one structure for mapping a complex input into a desired output. The results clarify neither how generalizations would develop in abstract spaces nor whether this generalization uses transformations developed in the abstract space. The specific training procedure ensures success in the presented experiments, but the availability and feasibility of an equivalent procedure in a natural setting is a crucial part of validating the original claim, and that has not been done in the paper.

      We thank the Reviewer for their detailed comments on our manuscript. We reply to the three main points in turn.

      First, concerning the conceptual grounding of our work, we would point out that the TEM model (Whittington et al., 2020), however interesting, is not our theoretical starting point. Rather, as we hope the text and references make clear, we ground our work in theoretical work from the 1990s and 2000s proposing that space acts as a scaffold for navigating abstract spaces (such as Gärdenfors, 2000). We acknowledge that the TEM model and other experimental work on the involvement of the hippocampus, the entorhinal cortex and the parietal cortex in relational transformations of nonspatial stimuli provide evidence for this general theory. However, our work is designed to test a more basic question: whether there is behavioural evidence that space scaffolds learning in the first place. Behavioural experiments with a causal manipulation (spatial pre-training vs no spatial pre-training), such as those we perform, have the potential to provide such direct evidence. This is why we claim that:

      “This theory is backed up by proof-of-concept computational simulations [13], and by findings that brain regions thought to be critical for spatial cognition in mammals (such as the hippocampal-entorhinal complex and parietal cortex) exhibit neural codes that are invariant to relational transformations of nonspatial stimuli. However, whilst promising, this theory lacks direct empirical evidence. Here, we set out to provide a strong test of the idea that learning about physical space scaffolds conceptual generalisation.“

      Second, we agree with the Reviewer that we do not provide an explicit model of how generalisation occurs, or of how precisely space acts as a scaffold for building representations and/or applying the relevant transformations to non-spatial stimuli to solve our task. Rather, in Exp. 2-4 we investigate which aspects of the training are necessary for rotational generalisation to happen (and conclude that simple training with the multimodal association task is sufficient for ~20% of participants). We now acknowledge in the discussion that we do not provide an explicit model, and leave that for future work:

      “We acknowledge that our study does not provide a mechanistic model of spatial scaffolding but rather delineates which aspects of the training are necessary for generalisation to happen.”

      Finally, we also agree with the Reviewer that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for control and for the absence of prior knowledge in the participants. We chose to minimise participants' prior knowledge as far as possible, to make sure that our task involved learning something completely new and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments, so we feel confident about them, but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      (2) Missing controls: The asymptotic performance in experiment 1 after training in the three tasks was quite different in the three tasks (intercepts 2.9, 1.9, 1.6 for spatial, visual, and auditory, respectively; p. 5. para. 1, Fig 2BFJ). It seems that the statement "However, our main question was how participants would generalise learning to novel, rotated exemplars of the same concept." assumes that learning and generalization are independent. Wouldn't it be possible, though, that the level of generalization depends on the level of acquiring a good representation of the "concept" and after obtaining an adequate level of this knowledge, generalization would kick in without scaffolding? If so, a missing control is to equate the levels of asymptotic learning and see whether there is a significant difference in generalization. A related issue is that we have no information on what kind of learning in the three different domains was performed, albeit we probably suspect that in space the 2D representation was dominant while in the auditory and visual domains not so much. Thus, a second missing piece of evidence is the model-fitting results of the ⦰ condition that would show which way the original sequences were encoded (similar to Fig 2 CGK and DHL). If the reason for lower performance is not individual stimulus difficulty but the natural tendency to encode the given stimulus type by a combo of random + 1D strategy that would clarify that the result of the cross-training is, indeed, transferring the 2D-mapping strategy.

      We agree with the Reviewer that a good further control is to equate performance during training. Thus, we have run a complementary analysis in which we select only the participants who reach > 90% accuracy in the last block of training, in order to equate asymptotic performance after training in Exp. 1. The results (see Author response image 1) replicate those that we report in the main text: there is a large difference between groups (relative likelihood of 1D vs. 2D models, all BF > 100 in favour of a difference between the auditory and the spatial modalities, and between the visual and the spatial modalities, in both near and far transfer; “decisive” evidence). We prefer not to include this figure in the paper for clarity, and because we believe this result is expected given that 0/50 participants in the auditory condition and 0/50 in the visual condition used a 2D strategy; thus, selecting subgroups of these participants cannot change our conclusions.

      Author response image 1.

      Results of Exp. 1 when selecting participants that reached > 90% accuracy in the last block of training. Captions are the same as Figure 2 of the main text.

      Second, the Reviewer suggested that we run the model fitting analysis only on the ⦰ condition (training) in Exp. 1 to reveal whether participants use a 1D or a 2D strategy already during training. Unfortunately, we cannot provide the model fits only in the ⦰ condition in Exp. 1 because all models make the same predictions for this condition (see Fig S4). However, note that this is done by design: participants were free to apply whatever strategy they want during training; we then used the generalisation phase with the rotated stimuli precisely to reveal this strategy. Further, we do believe that the strategy used by the participants during training and the strategy during transfer are the same, partly because – starting from block #4 – participants have no idea whether the current trial is a training trial or a transfer trial, as both trial types are randomly interleaved with no cue signalling the trial type. We have made this clear in the methods:

      “They subsequently performed 105 training trials (with trialwise feedback) and 105 transfer trials including rotated and far transfer quadruplets (without trialwise feedback), which were presented in mixed blocks of 30 trials. Training and transfer trials were randomly interleaved, and no cue indicated whether participants were currently on a training trial or a transfer trial before feedback (or absence of feedback in the case of a transfer trial).”
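      The uncued interleaving described in this methods passage can be sketched as follows. The sketch assumes, beyond what the text states, that the 210 mixed trials are split evenly, so that each block of 30 holds 15 training and 15 transfer trials; the function and field names are hypothetical, and only the mixed phase is modelled.

```python
from __future__ import annotations

import random


def make_mixed_blocks(n_training: int = 105, n_transfer: int = 105,
                      block_size: int = 30, seed: int | None = None) -> list[list[dict]]:
    """Split training and transfer trials into randomly interleaved mixed blocks.

    Assumption (not stated in the source): every block holds the same number
    of training and transfer trials.
    """
    total = n_training + n_transfer
    if total % block_size:
        raise ValueError("trials do not divide evenly into blocks")
    n_blocks = total // block_size
    if n_training % n_blocks or n_transfer % n_blocks:
        raise ValueError("trial types do not split evenly across blocks")

    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        # Training trials receive feedback; transfer trials do not
        block = [{"type": "training", "feedback": True}
                 for _ in range(n_training // n_blocks)]
        block += [{"type": "transfer", "feedback": False}
                  for _ in range(n_transfer // n_blocks)]
        rng.shuffle(block)  # random order; nothing cues the trial type in advance
        blocks.append(block)
    return blocks


blocks = make_mixed_blocks(seed=1)
print(len(blocks), [len(b) for b in blocks])  # 7 [30, 30, 30, 30, 30, 30, 30]
```

      Because the only difference between trial types is whether feedback follows the response, a participant cannot switch strategy in anticipation of a transfer trial, which is the point made in the response above.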

      Reviewer #3 (Public Review):

      Summary:

      Pesnot Lerousseau and Summerfield aimed to explore how humans generalize abstract patterns of sensory data (concepts), focusing on whether and how spatial representations may facilitate the generalization of abstract concepts (rotational invariance). Specifically, the authors investigated whether people can recognize rotated sequences of stimuli in both spatial and nonspatial domains and whether spatial pre-training and multi-modal mapping aid in this process.

      Strengths:

      The study innovatively examines a relatively underexplored but interesting area of cognitive science, the potential role of spatial scaffolding in generalizing sequences. The experimental design is clever and covers different modalities (auditory, visual, spatial), utilizing a two-dimensional feature manifold. The findings are backed by strong empirical data, good data analysis, and excellent transparency (including preregistration) adding weight to the proposition that spatial cognition can aid abstract concept generalization.

      Weaknesses:

      The examples used to motivate the study (such as "tree" = oak tree, family tree, taxonomic tree) may not effectively represent the phenomena being studied, possibly confusing linguistic labels with abstract concepts. This potential confusion may also extend to doubts about the real-life applicability of the generalizations observed in the study and raises questions about the nature of the underlying mechanism being proposed.

      We thank the Reviewer for their comments. We agree that we could have explained more clearly how these examples motivate our study. The similarity between “oak tree” and “family tree” is not just the verbal label. Rather, it is the arrangement of the parts (nodes and branches) in a nested hierarchy: oak trees and family trees share the same relational structure. The reason that invariance is relevant here is that the similarity in relational structure is retained under rigid body transformations such as rotation or translation. For example, an upside-down tree can still be recognised as a tree, just as a family tree can be plotted with the oldest ancestors at either top or bottom. Similarly, in our study, the quadruplets are defined by the relations between stimuli: all quadruplets use the same basic stimuli, but the categories are defined by the relations between successive stimuli. In our task, generalising means recognising that relations between stimuli are the same despite changes in the surface properties (for example in far transfer). We have clarified this in the introduction:

      “For example, the concept of a “tree” implies an entity whose structure is defined by a nested hierarchy, whether this is a physical object whose parts are arranged in space (such as an oak tree in a forest) or a more abstract data structure (such as a family tree or taxonomic tree). [...] Despite great changes in the surface properties of oak trees, family trees and taxonomic trees, humans perceive them as different instances of a more abstract concept defined by the same relational structure.”
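      To make "same relational structure under rotation or translation" concrete, the sketch below treats a quadruplet as four points in a two-dimensional feature space and compares the two transformations. It is purely illustrative: the coordinates are hypothetical, the feature space is assumed to be centred on the origin, and the invariants used (step lengths and turn angles between successive elements) are one possible formalisation of "same relations", not necessarily the one used in the manuscript.

```python
import math


def rotate(seq, degrees):
    """Rotate each (x, y) point about the origin, assumed to be the centre of the feature space."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in seq]


def translate(seq, dx, dy):
    """Shift every point by the same offset."""
    return [(x + dx, y + dy) for x, y in seq]


def steps(seq):
    """Difference vectors between successive elements of the sequence."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(seq, seq[1:])]


def step_lengths(seq):
    """Length of each step; unchanged by rotation and translation."""
    return [math.hypot(dx, dy) for dx, dy in steps(seq)]


def turn_angles(seq):
    """Signed angle in degrees between consecutive steps; unchanged by rotation and translation."""
    s = steps(seq)
    return [
        math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
        for (ax, ay), (bx, by) in zip(s, s[1:])
    ]


# HYPOTHETICAL quadruplet: four (x, y) feature values, centred on the origin
quad = [(-1.5, -1.5), (1.5, -1.5), (1.5, 1.5), (-0.5, 0.5)]
shifted = translate(quad, 1.0, 0.0)
rotated = rotate(quad, 90)

print(steps(quad) == steps(shifted))                  # True: translation keeps the raw steps
print(steps(quad), steps(rotated))                    # rotation changes the steps themselves ...
print(step_lengths(quad), step_lengths(rotated))      # ... but preserves their lengths
print(turn_angles(quad), turn_angles(rotated))        # ... and the turns between them
print([x for x, _ in quad], [x for x, _ in rotated])  # an x-only (one-dimension) code changes
```

      Under translation the raw steps between elements are unchanged, whereas rotation changes them while preserving their lengths and the angles between them; a code that tracks a single feature dimension does not survive rotation.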

      Next, the study does not explore whether scaffolding effects could be observed with other well-learned domains, leaving open the question of whether spatial representations are uniquely effective or simply one instance of a familiar 2D space, again questioning the underlying mechanism.

      We would like to mention that Reviewer #2 had a similar comment. We agree with both Reviewers that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for control and for the absence of prior knowledge in the participants. We chose to minimise participants' prior knowledge as far as possible, to make sure that our task involved learning something completely new and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments, so we feel confident about them, but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      Further doubt on the underlying mechanism is cast by the possibility that the observed correlation between mapping task performance and the adoption of a 2D strategy may reflect general cognitive engagement rather than the spatial nature of the task. Similarly, the surprising finding that a significant number of participants benefited from spatial scaffolding without seeing spatial modalities may further raise questions about the interpretation of the scaffolding effect, pointing towards potential alternative interpretations, such as shifts in attention during learning induced by pre-training without changing underlying abstract conceptual representations.

      The Reviewer is concerned that the spatial pre-training could benefit the participants by increasing global cognitive engagement rather than by providing a scaffold for learning invariances. It is correct that the participants in the control group in Exp. 2c perform worse on average than the participants who benefited from the spatial pre-training in Exp. 2a and 2b. The better performance of the participants in Exp. 2a and 2b could be due either to the spatial nature of the pre-training (as we claim) or to a difference in general cognitive engagement.

      However, if we look closely at the results of Exp. 3, we can see that the general cognitive engagement hypothesis is not well supported by the data. Indeed, the participants in the control condition (Exp. 3c) perform similarly to the other groups during training. Rather, the difference lies in the strategy they use, as revealed by the transfer condition: the majority of them use a 1D strategy, unlike the participants who benefited from a spatial pre-training (Exp. 3a and 3b). We have included a sentence in the results:

      “Further, the results show that participants who did not experience spatial pre-training were still engaged in the task, but were not using the same strategy as the participants who experienced spatial pre-training (1D rather than 2D). Thus, the benefit of the spatial pre-training is not simply to increase the cognitive engagement of the participants. Rather, spatial pre-training provides a scaffold to learn rotation-invariant representation of auditory and visual concepts even when rotation is never explicitly shown during pre-training.”

      Finally, Reviewer #1 had a related concern about a potential alternative explanation that involved a shift in attention. We reproduce our response here: we agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting (and potentially concerning) alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are not compatible with this alternative explanation.

      Indeed, in Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is necessary to pay attention to both auditory dimensions and both visual dimensions to perform well in the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, they can reach at most 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants actually paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants who received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality, compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.

      This effect was replicated in Exp. 3c: 33/48 participants reached an accuracy > 50% in the multimodal association task, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pre-training in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25%, and chance performance is 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”

      Conclusions:

      The authors successfully demonstrate that spatial training can enhance the ability to generalize in nonspatial domains, particularly in recognizing rotated sequences. The results for the most part support their conclusions, showing that spatial representations can act as a scaffold for learning more abstract conceptual invariances. However, the study leaves room for further investigation into whether the observed effects are unique to spatial cognition or could be replicated with other forms of well-established knowledge, as well as further clarifications of the underlying mechanisms.

      Impact:

      The study's findings are likely to have a valuable impact on cognitive science, particularly in understanding how abstract concepts are learned and generalized. The methods and data can be useful for further research, especially in exploring the relationship between spatial cognition and abstract conceptualization. The insights could also be valuable for AI research, particularly in improving models that involve abstract pattern recognition and conceptual generalization.

      In summary, the paper contributes valuable insights into the role of spatial cognition in learning abstract concepts, though it invites further research to explore the boundaries and specifics of this scaffolding effect.

      Reviewer #1 (Recommendations For The Authors):

      Minor issues / typos:

      P6: I think the example of the "signed" mapping here should be "e.g., ABAB maps to one category and BABA maps to another", rather than "ABBA maps to another" (since ABBA would always map to another category, whether the mapping is signed or unsigned).

      Done.

      P11: "Next, we asked whether pre-training and mapping were systematically associated with 2Dness...". I'd recommend changing to: "Next, we asked whether accuracy during pre-training and mapping were systematically associated with 2Dness...", just to clarify what the analyzed variables are.

      Done.

      P13, paragraph 1: "only if the features were themselves are physical spatial locations" either "were" or "are" should be removed.

      Done.

      P13, paragraph 1: should be "neural representations of space form a critical substrate" (not "for").

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors use the phrases "learn invariances" (Abstract), "formation of invariances" (p. 2, para. 1), etc. in multiple places in the manuscript. It might be just me, but this feels a bit like 'sloppy' wording: we do not learn or form invariances; rather, we learn or form representations or transformations by which we can perform tasks that require invariance over particular features or transformations of the input, as in the case of object recognition and size-, translation- or lighting-invariance. We do not form size invariance; we have representations of objects and/or size transformations allowing the recognition of objects of different sizes. The authors might change this way of referring to the phenomenon.

      We respectfully disagree with this comment. An invariance occurs when neurons make the same response under different stimulation patterns. The objects or features to which a neuron responds are shaped by its inputs. Those inputs are in turn determined by experience-dependent plasticity. This process is often called “representation learning”. We think that our language here is consistent with this status quo view in the field.

      Reviewer #3 (Recommendations For The Authors):

      • I understand that the objective of the present experiment is to study our ability to generalize abstract patterns of sensory data (concepts). In the introduction, the authors present examples like the concept of a "tree" (encompassing a family tree, an oak tree, and a taxonomic tree) and "ring" to illustrate the idea. However, I am sceptical as to whether these examples effectively represent the phenomena being studied. From my perspective, these different instances of "tree" do not seem to relate to the same abstract concept that is translated or rotated but rather appear to share only a linguistic label. For instance, the conceptual substance of a family tree is markedly different from that of an oak tree, lacking significant overlap in meaning or structure. Thus, to me, these examples do not demonstrate invariance to transformations such as rotations.

      To elaborate further, typically, generalization involves recognizing the same object or concept through transformations. In the case of abstract concepts, this would imply a shared abstract representation rather than a mere linguistic category. While I understand the objective of the experiments and acknowledge their potential significance, I find myself wondering about the real-world applicability and relevance of such generalizations in everyday cognitive functioning. This, in turn, casts some doubt on the broader relevance of the study's results. A more fitting example, or an explanation that addresses my concerns about the suitability of the current examples, would be beneficial to further clarify the study's intent and scope.

      Response in the public review.

      • Relatedly, the manuscript could benefit from greater clarity in defining key concepts and elucidating the proposed mechanism behind the observed effects. Is it plausible that the changes observed are primarily due to shifts in attention induced by the spatial pre-training, rather than a change in the process of learning abstract conceptual invariances (i.e., modifications to the abstract representations themselves)? While the authors conclude that spatial pre-training acts as a scaffold for enhancing the learning of conceptual invariances, it raises the question: does this imply participants simply became more focused on spatial relationships during learning, or might this shift in attention represent a distinct strategy, and an alternative explanation? A more precise definition of these concepts and a clearer explanation of the authors' perspective on the mechanism underlying these effects would reduce any ambiguity in this regard.

      Response in the public review.

      • I am wondering whether the effectiveness of spatial representations in generalizing abstract concepts stems from their special nature or simply because they are a familiar 2D space for participants. It is well-established that memory benefits from linking items to familiar locations, a technique used in memory training (method of loci). This raises the question: Are we observing a similar effect here, where spatial dimensions are the only tested familiar 2D spaces, while the other 2 spaces are simply unfamiliar, as also suggested by the lower performance during training (Fig.2)? Would the results be replicable with another well-learned, robustly encoded domain, such as auditory dimensions for professional musicians, or is there something inherently unique about spatial representations that aids in bootstrapping abstract representations?

      On the other side of the same coin, are spatial representations qualitatively different, or simply more efficient because they are learned more quickly and readily? This leads to the consideration that if visual pre-training and visual-to-auditory mapping were continued until a similar proficiency level as in spatial training is achieved, we might observe comparable performance in aiding generalization. Thus, the conclusion that spatial representations are a special scaffold for abstract concepts may not be exclusively due to their inherent spatial nature, but rather to the general characteristic of well-established representations. This hypothesis could be further explored by either identifying alternative 2D representations that are equally well-learned or by extending training in visual or auditory representations before proceeding with the mapping task. At the very least I believe this potential explanation should be explored in the discussion section.

      Response in the public review.

      I had some difficulty in following an important section of the introduction: "... whether participants can learn rotationally invariant concepts in nonspatial domains, i.e., those that are defined by sequences of visual and auditory features (rather than by locations in physical space, defined in Cartesian or polar coordinates) is not known." This was initially puzzling to me as the paragraph preceding it mentions: "There is already good evidence that nonspatial concepts are represented in a translation invariant format." While I now understand that the essential distinction here is between translation and rotation, this was not immediately apparent upon first reading. This crucial distinction, especially in the context of conceptual spaces, was not clearly established before this point in the manuscript. For better clarity, it would be beneficial to explicitly contrast and define translation versus rotation in this particular section and stress that the present study concerns rotations in abstract spaces.

      Done.

      • The multimodal association is crucial for the study; however, to my knowledge, it is not depicted or well explained in the main text or figures (Results section). In my opinion, the details of this task should be explained and illustrated before the associated results are discussed.

      We have included an illustration of a multimodal association trial in Fig. S3B.

      Author response image 2.

      • The observed correlation between the mapping task performance and the adoption of a 2D strategy is logical. However, this correlation might not exclusively indicate the proposed underlying mechanism of spatial scaffolding. Could it also be reflective of more general factors like overall performance, attention levels, or the effort exerted by participants? This alternative explanation suggests that the correlation might arise from broader cognitive engagement rather than specifically from the spatial nature of the task. Addressing this possibility could strengthen the argument for the unique role of spatial representations in learning abstract concepts, or at least this alternative interpretation should be mentioned.

      Response in the public review.

      • To me, the finding that ~30% of participants benefited from the spatial scaffolding effect for example in the auditory condition merely through exposure to the mapping (Fig 4D), without needing to see the quadruplets in the spatial modality, was somewhat surprising. This is particularly noteworthy considering that only ~60% of participants adopted the 2D strategy with exposure to rotated contingencies in Experiment 3 (Fig 3D). How do the authors interpret this outcome? It would be interesting to understand their perspective on why such a significant effect emerged from mere exposure to the mapping task.

      • I appreciate the clarity Fig. 1 provides in explaining a challenging experimental setup. Is it possible to provide example trials, including an illustration that shows which rotations produce the trial and an intuitive explanation of how each response maps onto the 1D vs 2D strategies, to aid the reader in better understanding this core manipulation?

      • I like that the authors provide transparency by depicting individual subjects' data points in their results figures (e.g. Figs. 2B, F, J). However, with n ≈ 50 per condition, it becomes difficult to intuit the distribution, especially for conditions with higher variance (e.g., Auditory). The figures might be more easily interpretable with alternative methods of displaying variance, such as violin plots per data point, conventional error shading using 95% CIs, etc.

      • Why do the authors not report exact BFs in the Results section, at least for the most important contrasts?

      • While I understand why the authors report the frequencies for the best model fits, this may become difficult to interpret in some sections, given the large number of reported values. Alternatives or additional summary statistics supporting inference could be beneficial.

      As the Reviewer states, there is a large number of values that we could report in this study. We have chosen to keep this number to a minimum to be as clear as possible. To summarize the distribution of individual data points, we have opted to display only the group mean and standard error (the standard errors are included, but the large number of participants per condition provides precise estimates, so the error bars can be smaller than the mean marker). This decision stems from our concern that including additional details could lead to a cluttered representation with unnecessary complexity. Finally, we report in the main text the BFs that we believe are critical for the reader's comprehension, and we apply a cutoff of 100 when BFs are high (corresponding to the label “decisive” evidence; some BFs are larger than 10^12). All exact BFs are reported in the Supplementary Material for interested readers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. We undertook the revisions not with the expectation of acceptance, but because we sincerely believe that they will enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy (conserved peripheral representations within the central brain) within the innervation of these neurons. In the process, they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near-complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling, but cluster numbers seem to have been chosen arbitrarily rather than by using an informed metric.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if the authors revealed some of the important downstream pathways that could explain the optogenetic behavioral phenotypes and the previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data are analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy-focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in their current form, e.g., calling "nodding + backward walking" an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that, due to purely biomechanical constraints, could lead to a couple of backward steps as seen in the example videos. Moreover, since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis cannot yet distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotation of the grooming and backward motions performed by flies is based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, as described in the Materials and methods. We have added text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head back in its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.

      We do not make any firm conclusions about the head movements (nodding) and backward motions. We use nodding as a descriptive term to help the reader understand what the behavior looks like. We make no firm conclusions about any behavioral role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, we do not use the term backward walking; instead, we refer to backward motions. We are only reporting our annotations of these movements, which do occur and are significantly different from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is a neuroanatomy resource that will be important for the community. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but the evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate its strength. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that the authors mention are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small differences at the projection level are what determine the postsynaptic connectivity of a given BMN cluster or its functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (i.e. Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (InOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of the central circuits that subserve grooming.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. The reviewers also raised a few overlapping concerns, which are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed that bristle mechanosensory neurons in insects are somatotopically organized, but these studies were not comprehensive descriptions of complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive and synaptic resolution somatotopic map of a head for any animal. This sets the stage for the complete definition of the interface between somatotopically organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies of aimed grooming and of mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures that provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion in the previous manuscript version stemmed from our not explicitly defining the somatotopic organization that we observed. There seemed to be confusion that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections of the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant proximities (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BMN types based on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the following Discussion sections titled “A synaptic resolution somatotopic map of the head BMNs (fourth paragraph)”, and “Parallel circuit architecture underlying the grooming sequence (second paragraph)”.

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.

      Light EM Mapping: A better description of the methods by which this mapping was done would be helpful. Perhaps the authors could provide a few example parallel representations of the EM and light images in a main figure; this would help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are labeled LM and the rows for EM reconstructed BMN images are labeled EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above-mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community: namely, the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these aims, which proved difficult. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 - figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes its two main features: 1) a parallel circuit architecture that ensures that all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer # 1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), Results section titled “Somatotopically-organized parallel BMN pathways (first paragraph)”, and last Discussion section titled “Parallel circuit architecture underlying the grooming sequence (second and third paragraphs)”.

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator to orient the reader with respect to the dye-filled or stochastically labeled neurons. The images show the entire SEZ in the ventral brain, and in some panels the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to see more clearly which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparison between the EM and the fills/clones is not obvious. And particularly because they are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper where I found myself going back and forth between the methods/supplement and the text was when the authors discuss the clustering. I think it would help the reader if a few sentences about the cosine similarity measure used for connectivity-based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if some informed metrics could be used for defining cluster numbers (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy-based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.
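      To make the cut-height behavior described above concrete, the sketch below clusters BMNs by the cosine similarity of their postsynaptic connectivity and then cuts the resulting tree at a chosen height. It is a minimal, hypothetical illustration, not the authors' pipeline: it assumes each BMN is represented by a vector of synapse counts onto its postsynaptic partners, uses average linkage on cosine distance (1 minus similarity), and uses invented toy data and labels (`typeA-1`, etc.). Because the distance scale here runs from 0 to 1, its cut heights are not comparable to the cut height of 4.5 used in the manuscript.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two synapse-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_at_cut_height(vectors, cut_height):
    """Average-linkage agglomerative clustering on cosine distance (1 - similarity).

    Clusters are merged while the smallest average inter-cluster distance is
    at or below cut_height; the groups left over are the flat clusters.
    """
    n = len(vectors)
    dist = [[1.0 - cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cut_height:
            break  # every remaining merge lies above the cut
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical synapse counts: rows are BMNs, columns are postsynaptic partners.
# Types A and B are imagined as neighboring bristle populations; type C is distant.
bmns = ["typeA-1", "typeA-2", "typeB-1", "typeB-2", "typeC-1", "typeC-2"]
conn = [
    [10, 8, 0, 0],
    [9, 10, 1, 0],
    [6, 2, 7, 0],
    [5, 1, 8, 0],
    [0, 0, 1, 12],
    [0, 0, 0, 10],
]

for h in (0.1, 0.5):
    groups = cluster_at_cut_height(conn, h)
    print(h, [[bmns[i] for i in g] for g in groups])
```

With these toy vectors, the low cut (0.1) keeps only same-type BMNs together, whereas the higher cut (0.5) also merges the two neighboring types A and B while leaving type C separate, mirroring the same-type versus neighboring-type trend described in the response. Average linkage is used because its merge heights never decrease, so stopping at the first merge above the cut is equivalent to cutting the dendrogram at that height.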

      The NBLAST clustering shows similar results to the connectivity-based clustering with respect to neighboring and distant BMN types. As the number of clusters increases, BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphology similarity (and proximity) with each other, and low similarity with distant BMN types.

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.

      Along the same lines, it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory-connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel et al., 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) The authors find that optogenetic activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here, i.e. is the targeted area cleaned first and then the neighboring area? I wonder if the answer to this is something as simple as common postsynaptic targets, which would essentially reduce the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).

      (2) At the end of the fourth paragraph of the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs”, we added sentences that allude to further discussion of this topic. These sentences read: This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” mentions the possibility of mechanosensory convergence onto common postsynaptic circuits to promote grooming of the stimulated area, along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence”, we further discuss the issue of the BMN “sensory map.” This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If the authors were to include a summary table that shows all known attributes of each BMN type as columns, it could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class", "split-Gal4 line or known enhancer", etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review. We regret that this table was not available to the reviewers. It is now provided as Supplementary file 3. This table provides the following information for each reconstructed BMN: BMN name, bristle type, nerve, flywire ID, flywire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank Reviewer #2 for this suggestion. We have now added lines to the panels in Figure 4C-V to indicate the approximate location of the midline. We also added lines to the Figure 4 – figure supplements as described above.

      I think this Fig reference is wrong "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." should be Fig 8B,C

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the results section entitled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? Would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section titled “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts to indicate an expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights included mostly BMNs of the same type innervating the same bristle populations, and smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but they also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results implied greater reliance on the NBLAST clustering than was actually the case, misrepresenting how we matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the importance of the NBLAST. We also indicated that the readers can find the resulting clusters at different cut heights, referring to Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for InOm. Why do you think there is such a large discrepancy there?

      We believe that there is a discrepancy between the number of reconstructed BM-InOm neurons and the number expected based on InOm bristle counts because these bristle counts were based on a small number of flies and appear to be variable. We did not further investigate the numbers of InOm bristles in this manuscript because we only needed an estimate of their numbers, given that there is over an order of magnitude difference between the eye bristles and any other head bristle population. Therefore, we could conclude with relative ease which head BMNs were associated with the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N, please describe these panels better. Main text says the upper image is from InOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that now makes the contents of panel 2N clear to the reader. Further, we now indicate in the figure legend for each panel, the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. That said, the BMN numbers generally agree between the two sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      We apologize for not making this specific change; however, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide publicly available FlyWire.ai links that enable readers to view the BMN projections in 3D. They can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2 that includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Rows in the spreadsheet correspond to each image stack. Columns provide information about each stack including: figure panels that each image stack contributed to, image stack title, DOI for each stack (link provides metadata for each stack and file download link), image stack file name, genotype of imaged fly, and information about image stack. References to this file have been made at different locations throughout the text and Figure legends. We also added a section on the BIL data in the Materials and methods entitled “Light microscopy image stack storage and availability”. Old Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      • Is the coronal slice in Figure 2 the corresponding mid-coronal plane to compute Dice scores? If so, the authors could mention it so that readers have an idea where the selected slice is.

      This is indeed a good point. The coronal slice in Figure 2 is not part of the set of slices that we used to compute Dice scores. Showing such a slice is important, so we have added a small figure to the appendix with one of these slices, along with the corresponding automated segmentations.

      • SIFT descriptors were adopted to detect fiducials only. Maybe it could also be applied to align stacked photographs of brain slices.

      While SIFT is robust against changes in pose (e.g., object rotation), perspective, and lighting, it is not robust against changes in the object itself, such as the changes from one slice to the next in our work. We have added a sentence to the methods section clarifying this issue.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Weaknesses:

      Start site fidelity in purified reconstituted systems can be dramatically altered in different buffer conditions. Interpretation of the observed changes to start site selection in mRNAs in the absence or presence of Ded1 using only the one buffer condition used is therefore limited.

      This is an excellent point and is something we could explore in future studies using the Rec-Seq system. We have added this caveat to the Discussion on lines 797-809. We have previously studied the fidelity of start codon recognition in the reconstituted system (Kolitz et al., [2009] RNA, 15:138-152) and found that under our standard buffer conditions the codon specificity generally reflects what we observed in vivo using a dual-luciferase reporter assay, with the most stable 48S complexes forming on AUG codons, followed by first-position mismatches (GUG, UUG, CUG), with second- and third-position mismatches leading to significantly less stable complexes. However, as the reviewer notes, there are some deviations: ACG and AUA are poor codons in the in vitro system under the buffer conditions used but allowed relatively strong expression in our in vivo reporter assay. It should also be noted that the hierarchy of near-cognate start codon usage in vivo in yeast differs according to the study and the reporter used, making it difficult to establish a “ground truth” for start codon fidelity.

      I have some specific comments to strengthen the manuscript and address some minor issues.

      It is not clear to me whether the authors refold the purified mRNA after phenol/chloroform extraction? Have the authors observed different results if the mRNA is refolded or not? This is appropriate since the authors compare their Rec-Seq data to PARS scores that were generated from refolded mRNAs. One assumes that the total mRNA used is refolded in the same way as the PARS score study, but this is not clearly stated. The authors should make this point clear in the text and methods.

      This is an excellent point. We did not use the final refolding protocol that Kertesz et al. used when they developed their PARS scores and now clarify this in the Methods section (lines 962-967). It is possible that we would have seen stronger correlations in the analyses using PARS scores had we followed the renaturation protocol, although the fact that we observed significant correlations (e.g., Fig. 3E-H) suggests the structures in the Kertesz et al. mRNAs were similar to those in our mRNAs.

      It is not clear how the authors determine the concentration of total mRNA that is used in the assay (reported as 60 nM). Are the authors assuming a molecular weight of an average mRNA to determine the concentration? The authors should provide more detail for how they quantify their mRNA concentration and its stoichiometry compared to 43S PICs.

      We thank the reviewer for pointing out this oversight and have now included this information on lines 849-855 of the Methods section.

      Comments regarding start site fidelity in the reconstituted system:

      The authors use in vitro transcribed tRNAi-Met. Since tRNA modifications may play a role in start site fidelity, the authors should perhaps mention in the discussion that this will need to be investigated in a future study.

      This is a good point and we now note it as a caveat in the Discussion on lines 806-809.

      The authors state that Ded1 promotes leaky scanning regardless of the mAUG start site context (page 24; lines 533-534). The authors then state on page 25 that the level of iAUG initiation relative to mAUG initiation does depend on the mAUG context (lines 545-546). This seems contradictory unless I am not understanding this correctly? It would certainly be surprising that mAUG context didn't regulate leaky scanning in the reconstituted system given the fact that initiation codon context regulates selection in cells (when Ded1 is present).

      These statements are correct as written. As shown in Figure 5O, the frequency of leaky scanning (as measured by the relative ribosome occupancy, RRO: the occupancy of the internal region of the ORF, not including the main start codon, relative to that of the whole ORF, including the main start codon) decreases as the context score around the start codon gets stronger (green and purple lines). The RRO is increased to the same extent when 500 nM Ded1 is added, regardless of the strength of the start codon context, indicating that Ded1 enhances leaky scanning equally (compare the slope of the green line without Ded1 to that of the purple line with Ded1). Because of this, the effect of Ded1 on RRO (ΔRRO) is constant across context score bins (orange line). There is no discrepancy between our two conclusions, that leaky scanning of the mAUG increases as context score decreases and that Ded1 increases leaky scanning equally for good and bad mAUG contexts; together, they indicate that Ded1 does not inspect the mAUG context and simply decreases the dwell time equally at all contexts.
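      The RRO measure and the effect of Ded1 on it, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes footprint reads have already been assigned to ORF codons with codon 0 as the main AUG, excludes only that codon from the internal region by default (the `start_window` parameter is hypothetical), and treats the Ded1 effect as a simple difference between the +Ded1 and -Ded1 RRO values. The read counts are invented for illustration.

```python
def relative_ribosome_occupancy(counts, start_window=1):
    """Return RRO: interior footprints divided by footprints over the whole ORF.

    counts[i] is the number of footprint reads assigned to codon i of the ORF,
    with codon 0 being the main AUG. The first `start_window` codons are
    excluded from the interior (the window size is an assumption).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("ORF has no footprint reads")
    return sum(counts[start_window:]) / total


def delta_rro(counts_without_ded1, counts_with_ded1, start_window=1):
    """Change in RRO on adding Ded1, assumed here to be a simple difference."""
    return (relative_ribosome_occupancy(counts_with_ded1, start_window)
            - relative_ribosome_occupancy(counts_without_ded1, start_window))


# Invented counts for two ORFs whose main AUGs sit in weak vs strong contexts.
weak_minus, weak_plus = [60, 10, 10, 10, 10], [40, 15, 15, 15, 15]
strong_minus, strong_plus = [80, 5, 5, 5, 5], [60, 10, 10, 10, 10]

for name, minus, plus in [("weak", weak_minus, weak_plus),
                          ("strong", strong_minus, strong_plus)]:
    print(name,
          round(relative_ribosome_occupancy(minus), 2),
          round(relative_ribosome_occupancy(plus), 2),
          round(delta_rro(minus, plus), 2))
```

In this toy case the weak-context ORF has the higher RRO (more leaky scanning of its main AUG), yet both ORFs gain the same 0.2 in RRO when Ded1 is added, which is the pattern of a Ded1 effect that is constant across context score bins.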

      Further to the start site context question. It is possible that the fidelity of the reconstituted system (i.e. buffer conditions) is not fully reflecting in vivo-like start site selection. A rigorous characterization of commercially available reticulocyte lysate systems identified buffer conditions that provided similar start site fidelity to that observed in live cells (Kozak. Nucleic Acids Res. 1990 May 11;18(9):2828). While I feel that it is beyond the context of the current work to undertake a similar rigorous buffer characterization, one must be careful about interpreting the results about leaky scanning and upstream initiation sites in the current work. Perhaps one would observe similar results to Guenther et al. if the fidelity (buffer conditions) of the reconstituted system were different? I appreciate that the authors state that their results only apply to their reconstituted system and do not necessarily suggest that previous data are incorrect, but with only one buffer condition being tested in the current study it may be appropriate to further soften the interpretation of the current results when compared to published data in live cells.

      This point is well taken. As noted above, we have added a caveat about possible effects of buffer conditions on start codon fidelity to the Discussion (lines 797-809). Regarding the possibility that upstream initiation is more frequent in vivo than we observe in the in vitro Rec-Seq system, we previously studied 5’UTR translation in vivo using ribosome profiling (Kulkarni et al. [2019] BMC Biol., 17:101). The ratio of RPFs in 5’UTRs to those in coding sequences in that study was 0.0027, comparable to the value measured in the in vitro Rec-Seq system in the presence of Ded1 (0.0016-0.0017). Thus, the frequency of upstream initiation does not seem to be dramatically higher in vivo than in our in vitro system. We have now noted this point in the Results (lines 594-598). Guenther et al. employed a ribosome profiling protocol in which cycloheximide was added to cells prior to lysis, a practice that has been shown to create significant artifacts, particularly in 5’UTR translation (e.g., Gerashchenko and Gladyshev [2014] Nucleic Acids Res., 42:e134). Nevertheless, as suggested by the reviewer, we have modified the text in the Results and Discussion to soften the interpretation somewhat (lines 582-583, 616-618 and 761-763).

      Reviewer #2

      Weaknesses:

      Several findings in this report are quite surprising and may require additional work to interpret fully. Primary among these is the finding that Ded1p stimulates accumulation of PICs at internal sites in mRNA coding sequences at an incidence of up to ~50%. The physiological relevance of this is unclear.

      We agree with the reviewer that the physiological significance, if any, of the apparent leaky scanning of main AUG start codons induced by Ded1 is an open question that will require additional studies. It is possible that rapid 60S subunit joining and formation of the 80S initiation complex after start codon recognition on most mRNAs reduces the leaky scanning effect in vivo. We now raise this possibility in the Discussion section (lines 804-809). However, as noted in lines 568-580, mRNAs that display significantly decreased mRPFs at 500 nM Ded1 in the Rec-Seq system also tend to have TEs that are increased in the ded1-cs mutant relative to WT yeast in in vivo ribosome profiling experiments, suggesting that Ded1 activity also diminishes initiation on mAUG codons in these mRNAs in vivo.

      A limitation of the methodology is that, as an endpoint assay, Rec-Seq does not readily decouple effects of Ded1p on PIC-mRNA loading from those on the subsequent scanning step, in which the PIC locates the start codon. Considering that Ded1p activity may influence each of these initiation steps through distinct mechanisms (i.e., binding to the mRNA cap-recognition factor eIF4F, or direct mRNA interaction outside eIF4F), additional studies may be needed to gain deeper mechanistic insights.

      We agree that this is a limitation of the Rec-Seq assay and now mention this point in the Discussion section (lines 810-817). Future work using cross-linking agents to stabilize 43S complexes bound near the cap and scanning the 5’UTR, similar to the methodology used in 40S ribosome profiling, could enable us or others to disentangle these steps from one another.

      As the authors note, the Ded1p concentrations achievable in Rec-Seq may mask potential effects of Ded1p-based granule formation on translation initiation. Additional factors present in the cell could also potentially promote this mechanism. Consequently, the results do not fully rule out granule formation as a potential parallel Ded1p-mediated translation-inhibitory mechanism in cells.

      We agree. As stated in the Discussion section (lines 735-741): “It is possible that at higher concentrations of Ded1 than were achievable in these in vitro experiments or in the presence of additional factors that modify Ded1’s ATPase or RNA binding activities the factor could directly inhibit a subset of mRNAs, by acting as an mRNA clamp that impedes scanning by the PIC, or by sequestering the mRNAs in insoluble condensates. It might be interesting in the future to test candidate factors in Rec-Seq to determine if they switch Ded1 from being a stimulatory helicase to an inhibitory mRNA clamp that removes transcripts from the soluble phase.”

      It is certainly clear why the 15-minute timepoint was chosen for these assays. However, I wondered whether data from an earlier timepoint would provide useful information. The description on line 210 of the compiled PDF suggests that data from different timepoints may be available; if so, in my view they could be a useful addition. More generally, including language about the single-turnover nature of these reactions may be helpful for a broad audience.

      In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide. As one might imagine, this is a challenging experiment and requires additional work before we would feel comfortable publishing it. We very much agree with the reviewer that resolving the kinetics of these events will provide important additional information. As suggested, we have added caveats about the endpoint and single-turnover nature of the assay to the Discussion (lines 821-828).

      I wondered whether it might be useful to present additional information on the mRNAs not detected in the assay. For example, are these the least abundant mRNAs, which may not have had time to recruit the 43S PIC?

      Of the mRNAs not observed in the Rec-Seq analysis, 75% (2719 of 3640) had densities below the median (2.3 reads per nucleotide). We now mention this in the Methods section (lines 855-856).

      The Rec-Seq recruitment reactions were carried out at 22 °C. Considering that remodeling of RNA structure by helicase enzymes is a focal point of the study, linking the results to the recruitment landscape at a temperature closer to physiological may bolster the conclusions.

      In the future, it would be interesting to test the effects of temperature on 48S PIC formation using the Rec-Seq system. As the reviewer suggests, the interplay between temperature and mRNA structure could reveal interesting phenomena. It is worth noting, however, that there is no clear “physiological” temperature for S. cerevisiae. For consistency and convenience, lab yeast is usually grown at 30 °C, but in the wild yeast live at a wide range of temperatures, which generally change throughout the day. From this standpoint, 22 °C seems reasonably physiological.

      Results from Rec-Seq experiments conducted at 15 °C might be more directly comparable to in vivo Ribo-seq data with the ded1-cs mutant. However, ~90% of the Ded1-hyperdependent mRNAs identified by Ribo-seq analysis of that mutant were already identified here as Ded1-stimulated mRNAs in Rec-Seq experiments at 22 °C. The Ribo-seq experiments of Guenther et al. were conducted on the ded1-ts mutant at 37 °C; thus, any structures that confer Ded1-dependent leaky scanning through uORFs detected in that study should also have been stable in our Rec-Seq experiments at the lower temperature of 22 °C.

      The Introduction provides an important, detailed exposition of the state of the field with respect to Ded1p activity. Nevertheless, in my view, it is quite lengthy and could be streamlined for clarity. As just one example, the proposed function of Ded1p in the nucleus seems like a detail that could be dispensed with for the present work.

      We have attempted to shorten the Introduction, as suggested. However, we did not remove the short section describing Ded1’s possible roles in the nucleus and ribosome biogenesis, because we felt it was important to emphasize that one of the strengths of the Rec-Seq system is that it allows us to isolate the early steps of translation initiation from later steps and from other cellular processes. In addition, at the suggestion of Reviewer #3, we added a brief explanation of Ded1’s possible role in the subunit joining step of translation.

      Reviewer #3

      Weaknesses:

      The slow nature of the biochemical experiments could bias results.

      We agree that the 15-minute time point used could mask effects that are manifested at a purely kinetic level. It should be noted that we have measured the observed rate constants for 48S formation on a variety of mRNAs in the in vitro reconstituted system in the presence of saturating Ded1 (Gupta et al. [2018] eLife, https://elifesciences.org/articles/38892) and found that they generally fall within the range of estimated rate constants for translation initiation in vivo in yeast (~1-10 min⁻¹; e.g., Siwiak and Zielenkiewicz [2010], PLOS Comput. Biol., 6: e100865). In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide in the absence of Ded1 and found that the mean observed rate constant (~2 min⁻¹) is also within the range of estimates of the rate of translation initiation in vivo in yeast. We hope to publish this analysis in a future manuscript.

      It has been suggested that Ded1 and its human homolog DDX3X could play a role in subunit joining post-scanning (Wang et al. 2022, Cell; Geissler et al. 2012, Nucleic Acids Res). Could the authors potentially investigate this by adding GTP, eIF5B and 60S subunits to the reaction mixture and isolating 80S complexes?

      This is a very interesting suggestion. One of our plans with the Rec-Seq system is to see if we can also observe 80S formation with it and distinguish 80S from 48S complexes. Although we have not yet tried this and there might be technical obstacles, if it works we would like to examine the potential effects of Ded1, as suggested. We now mention this possibility in the Discussion section (lines 709-716 and 810-817).

      An incubation time of 15 minutes is quite long on the timescale of translation initiation. Presumably, the competition for 40S among mRNAs is partially kinetically controlled, so it would be interesting if the authors could do a time series on the incubation time. Does Ded1 increase initiation on more structured UTRs even at shorter incubations, or are those effects only observed with longer incubations?

      We agree. See the response to the question about kinetics above.

      Does GDPNP lead to off-pathway events? What happens when GTP is used in the TC? Presumably in the absence of eIF5B the 48S PIC should remain stalled at the start codon.

      In previous experiments in the reconstituted system, we showed that using GTP instead of GDPNP resulted in 48S complexes that were less stable than those stalled prior to GTP hydrolysis (e.g., Algire et al. [2002] RNA 8:382-397). This is presumably because eIF2•GDP and eIF5 are released from the complex and the Met-tRNAi can dissociate in the absence of subunit joining. Although we have not tried it in the Rec-Seq system, we suspect that the resulting PICs would fall apart during sucrose gradient sedimentation.

      The authors use assembly of a 48S PIC at the start codon as evidence of scanning but could use more evidence to back this claim up. Does removing the cap structure on the two luciferase mRNA controls disrupt initiation using this approach? That would be direct evidence of 5'-end 40S loading and scanning to the start codon.

      In previous work using the reconstituted system, we studied the effect of the 5’-cap on 48S PIC formation (Mitchell et al. [2010] Mol. Cell 39:950-962; Yourik et al. [2017] eLife https://elifesciences.org/articles/31476). We found that stable 48S PIC formation is strongly dependent on the presence of the 5’-cap. In addition, the cap prevents off-pathway events and enforces a requirement for the full set of initiation factors to achieve efficient 48S PIC formation. As the reviewer indicates, the cap dependence of the system supports the conclusion that 5’-end loading and scanning take place. We have now added this information and the relevant citations to the Introduction (lines 147-153). We thank the reviewer for pointing out this oversight. It should also be noted that the mRNAs in which 5’UTR translation is increased by addition of Ded1 support the conclusion that the factor promotes attachment of the PIC to the 5’ ends of mRNAs and subsequent 5’ to 3’ scanning, as noted in lines 608-618.

      The authors state that "The correlation between CDS length and RE could be indirect because CDS length also correlates with 5'UTR length". Could the authors bin the transcripts into different 5' UTR length ranges and then probe the effect of CDS length on RE within each 5' UTR length bin? This could be useful to truly parse the mechanism by which CDS length influences RE.

      This was an excellent suggestion. We now include this analysis in a new supplementary figure, Figure 3-S2. Corresponding text was added in lines 380-387:

      “Importantly, correlations between Ded1 stimulation and 5’ UTR lengths are evident for all three groups of mRNAs containing distinct ranges of CDS lengths (Fig. 3-S2A-C). In contrast, a marked correlation between Ded1 stimulation and CDS length was detected only for the group of mRNAs with longest 5’UTRs (Fig. 3-S2D-F), and only the latter group showed a clear correlation between 5’UTR length and CDS length (Fig. 3-S2G-I). Thus, the correlation between Ded1 stimulation and CDS length appears to be indirect, driven by the tendency for the mRNAs with the longest 5’UTRs to also have correspondingly longer CDSs.”

      We thank the reviewer for this very useful idea.
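      The stratified analysis suggested by the reviewer can be sketched as follows. This is an illustrative sketch only, not the code behind Figure 3-S2: splitting transcripts into roughly equal-size groups, the use of Spearman rank correlation, and all function and argument names (`stratified_spearman`, `strata_key`, etc.) are hypothetical choices made for the example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def stratified_spearman(strata_key, x, y, n_bins=3):
    """Split transcripts into n_bins roughly equal-size groups by strata_key
    (hypothetically, 5'UTR length) and return Spearman rho between x (e.g. CDS
    length) and y (e.g. Ded1 stimulation) within each group, lowest bin first."""
    order = sorted(range(len(strata_key)), key=lambda i: strata_key[i])
    n = len(order)
    rhos = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        rhos.append(spearman([x[i] for i in members], [y[i] for i in members]))
    return rhos


if __name__ == "__main__":
    # Invented toy data: stimulation tracks CDS length only in the
    # group of transcripts with the longest 5'UTRs.
    utr_len = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    cds_len = [1, 2, 3, 1, 2, 3, 1, 2, 3]
    stimulation = [1, 2, 1, 1, 2, 1, 1, 2, 3]
    print(stratified_spearman(utr_len, cds_len, stimulation))  # [0.0, 0.0, 1.0]
```

      In the toy example, a correlation with CDS length appears only in the longest-5’UTR group, mirroring the pattern described for Fig. 3-S2D-F; the complementary analysis (binning by CDS length and correlating with 5’UTR length) is the same call with the first two arguments swapped.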

      In Figure 3I, why does RE dip for the middle bins of CDS length in both the 100 nM and 500 nM conditions, and then rise back up for the later bins? In other words, why do the shortest and longest CDSs have the best RE in the presence of Ded1?

      We do not know the reason for this dip and now say this in the Results on lines 377-378.

      The Discussion section would be well served by discussing proposed roles of Ded1 post-scanning and how those fit, if at all, with the data presented throughout the manuscript.

      We have now added this to the Discussion (lines 709-716 and 810-817). We thank the reviewer for pointing out this oversight.

      Minor comments:

      • Define bins on figures rather than using bin numbers for axis labels. For example, the x-axis labels in Figure 3A-D could indicate the length range of each bin.

      Thank you for the suggestion. We have made this change.

      • Figure 3I: the data seem to indicate that the shortest CDSs have a Ded1 dependency similar to that of the longest CDSs. This result seems inconsistent with the stated relationship between UTR length, structure, and CDS length. Please clarify.

      See the answer to this question above.

      • Replace qualitative statements, such as "substantially smaller reductions", with percent changes, numbers, etc.

      We have tried to replace qualitative statements with quantitative ones, where possible.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies the mitotic localization mechanism for Aurora B and INCENP (parts of the chromosomal passenger complex, CPC) in Trypanosoma brucei. The mechanism is different from that in the more commonly studied opisthokonts and there is solid support from RNAi and imaging experiments, targeted mutations, immunoprecipitations with crosslinking/mass spec, and AlphaFold interaction predictions. The results could be strengthened by biochemically testing proposed direct interactions and demonstrating that the targeting protein KIN-A is a motor. The findings will be of interest to parasitology researchers as well as cell biologists working on mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. Please note that the conserved glycine residue in the Switch II helix in KIN-A was mistakenly labelled as G209 in the original manuscript. We have now corrected it to G210 in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The CPC plays multiple essential roles in mitosis such as kinetochore-microtubule attachment regulation, kinetochore assembly, spindle assembly checkpoint activation, anaphase spindle stabilization, cytokinesis, and nuclear envelope formation, as it dynamically changes its mitotic localization: it is enriched at inner centromeres from prophase to metaphase but it is relocalized at the spindle midzone in anaphase. The business end of the CPC is Aurora B and its allosteric activation module IN-box, which is located at the C-terminal part of INCENP. In most well-studied eukaryotic species, Aurora B activity is locally controlled by the localization module of the CPC, Survivin, Borealin, and the N-terminal portion of INCENP. Survivin and Borealin, which bind the N terminus of INCENP, recognize histone residues that are specifically phosphorylated in mitosis, while anaphase spindle midzone localization is supported by the direct microtubule-binding capacity of the SAH (single alpha helix) domain of INCENP and other microtubule-binding proteins that specifically interact with INCENP during anaphase, which are under the regulation of CDK activity. One of these examples includes the kinesin-like protein MKLP2 in vertebrates.

      Trypanosoma is an evolutionarily interesting species to study mitosis since its kinetochore and centromere proteins do not show any similarity to other major branches of eukaryotes, while orthologs of Aurora B and INCENP have been identified. Combining molecular genetics, imaging, biochemistry, cross-linking IP-MS (IP-CLMS), and structural modeling, this manuscript reveals that two orphan kinesin-like proteins KIN-A and KIN-B act as localization modules of the CPC in Trypanosoma brucei. The IP-CLMS, AlphaFold2 structural predictions, and domain deletion analysis support the idea that (1) KIN-A and KIN-B form a heterodimer via their coiled-coil domain, (2) Two alpha helices of INCENP interact with the coiled-coil of the KIN-A-KIN-B heterodimer, (3) the conserved KIN-A C-terminal CD1 interacts with the heterodimeric KKT9-KKT11 complex, which is a submodule of the KKT7-KKT8 kinetochore complex unique to Trypanosoma, (4) KIN-A and KIN-B coiled-coil domains and the KKT7-KKT8 complex are required for CPC localization at the centromere, (5) CD1 and CD2 domains of KIN-A support its centromere localization. The authors further show that the ATPase activity of KIN-A is critical for spindle midzone enrichment of the CPC. The imaging data of the KIN-A rigor mutant suggest that dynamic KIN-A-microtubule interaction is required for metaphase alignment of the kinetochores and proliferation. Overall, the study reveals novel pathways of CPC localization regulation via KIN-A and KIN-B by multiple complementary approaches.

      Strengths:

      The major conclusion is collectively supported by multiple approaches, combining site-specific genome engineering, epistasis analysis of cellular localization, AlphaFold2 structure prediction of protein complexes, IP-CLMS, and biochemical reconstitution (the complex of KKT8, KKT9, KKT11, and KKT12).

      We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      • The predictions of direct interactions (e.g. INCENP with KIN-A/KIN-B, or KIN-A with KKT9-KKT11) have not yet been confirmed experimentally, e.g. by domain mutagenesis and interaction studies.

      Thank you for this point. It is true that we do not have evidence for direct interactions between KIN-A and KKT9-KKT11. However, the interaction between INCENP and KIN-A/KIN-B is strongly supported by our cross-linking IP-MS of native complexes. Furthermore, we show that deletion of the INCENPCPC1 N-terminus predicted to interact with KIN-A:KIN-B abolishes kinetochore localization.

      • The criteria used to judge a failure of localization are not clearly explained (e.g., Figure 5F, G).

      As suggested by the reviewer in recommendation #14, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      • It remains to be shown that KIN-A has motor activity.

      We thank the reviewer for this important comment. Indeed, motor activity remains to be demonstrated using an in vitro system, which is beyond the scope of this study. What we show here is that the motor domain of KIN-A effectively co-sediments with microtubules and that spindle localization of KIN-A is abolished upon deletion of the motor domain. Moreover, mutation of a conserved glycine residue in the Switch II region (G210) to alanine (‘rigor mutation’, (Rice et al., 1999)) renders KIN-A incapable of translocating to the central spindle, suggesting that its ATPase activity is required for this process. To clarify this point in the manuscript, we have replaced all instances where we refer to ‘motor activity’ of KIN-A with ‘ATPase activity’ when referring to experiments performed using the KIN-A rigor mutant. In addition, we have included a multiple sequence alignment (MSA) of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9 in Figure 6A and S6A, showing the conservation of key motifs required for ATP coordination and tubulin interaction. In the corresponding paragraph in the main text, we describe these data as follows:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      • The authors imply that KIN-A, but not KIN-B, interacts with microtubules based on a microtubule pelleting assay (Fig. S6), but the substantial insoluble fractions of 6HIS-KIN-A and 6HIS-KIN-B make it difficult to interpret the data conclusively. It is possible that these two proteins are not stable unless they form a heterodimer.

      This is indeed a possibility. We are currently aiming at purifying full-length recombinant KIN-A and KIN-B (along with the other CPC components), which will allow us to perform in vitro interaction studies and to investigate biochemical properties of this complex (including the role of the motor domains of KIN-A and KIN-B) within the framework of an in-depth follow-up study. To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      • For broader context, some prior findings should be introduced, e.g. on the importance of the microtubule-binding capacity of the INCENP SAH domain and its regulation by mitotic phosphorylation (PMID 8408220, 26175154, 26166576, 28314740, 28314741, 21727193), since KIN-A and KIN-B may substitute for the function of the SAH domain.

      We have modified the introduction to include the following text and references mentioned by the reviewer: ‘The localization module comprises Borealin, Survivin and the N-terminus of INCENP, which are connected to one another via a three-helical bundle (Jeyaprakash et al., 2007, 2011; Klein et al., 2006). The two modules are linked by the central region of INCENP, composed of an intrinsically disordered domain and a single alpha helical (SAH) domain. INCENP harbours microtubule-binding domains within the N-terminus and the central SAH domain, which play key roles for CPC localization and function (Samejima et al., 2015; Kang et al., 2001; Noujaim et al., 2014; Cormier et al., 2013; Wheatley et al., 2001; Nakajima et al., 2011; Fink et al., 2017; Wheelock et al., 2017; van der Horst et al., 2015; Mackay et al., 1993).’

      Reviewer #2 (Public Review):

      How the chromosomal passenger complex (CPC) and its subunit Aurora B kinase regulate kinetochore-microtubule attachment, and how the CPC relocates from kinetochores to the spindle midzone as a cell transitions from metaphase to anaphase are questions of great interest. In this study, Ballmer and Akiyoshi take a deep dive into the CPC in T. brucei, a kinetoplastid parasite with a kinetochore composition that varies greatly from other organisms.

      Using a combination of approaches, most importantly in silico protein predictions using alphafold multimer and light microscopy in dividing T. brucei, the authors convincingly present and analyse the composition of the T. brucei CPC. This includes the identification of KIN-A and KIN-B, proteins of the kinesin family, as targeting subunits of the CPC. This is a clear advancement over earlier work, for example by Li and colleagues in 2008. The involvement of KIN-A and KIN-B is of particular interest, as it provides a clue for the (re)localization of the CPC during the cell cycle. The evolutionary perspective makes the paper potentially interesting for a wide audience of cell biologists, a point that the authors bring across properly in the title, the abstract, and their discussion.

      The evolutionary twist of the paper would be strengthened 'experimentally' by predictions of the structure of the CPC beyond T. brucei. Depending on how far the authors can extend their in-silico analysis, it would be of interest to discuss a) available/predicted CPC structures in well-studied organisms and b) structural predictions in other euglenozoa. What are the general structural properties of the CPC (e.g. flexible linkers, overall dimensions, structural differences when subunits are missing etc.)? How common is the involvement of kinesin-like proteins? In line with this, it would be good to display the figure currently shown as S1D (or similar) as a main panel.

      We thank the reviewer for her/his encouraging assessment of our manuscript and the appreciation on the extent of the evolutionary relevance of our work. As suggested, we have moved the phylogenetic tree previously shown in Fig. S1D to the main Fig. 1F. Our AF2 analysis of CPC proteins and (sub)complexes from other kinetoplastids failed to predict reliable interactions among CPC proteins except for that between Aurora B and the IN box. It therefore remains unclear whether CPC structures are conserved among kinetoplastids. Because components of CPC remain unknown in other euglenozoa (other than Aurora B and INCENP), we cannot perform structural predictions of CPC in diplonemids or euglenids.

      It remains unclear how common the involvement of kinesin-like proteins with the CPC is in other eukaryotes, partly because we could not identify an obvious homolog of KIN-A/KIN-B outside of kinetoplastids. Addressing this question would require experimental approaches in various eukaryotes (e.g. immunoprecipitation and mass spectrometry of Aurora B) as we carried out in this manuscript using Trypanosoma brucei.

      Reviewer #3 (Public Review):

      Summary:

      The protein kinase, Aurora B, is a critical regulator of mitosis and cytokinesis in eukaryotes, exhibiting a dynamic localisation. As part of the Chromosomal Passenger Complex (CPC), along with the Aurora B activator, INCENP, and the CPC localisation module comprised of Borealin and Survivin, Aurora B travels from the kinetochores at metaphase to the spindle midzone at anaphase, which ensures its substrates are phosphorylated in a time- and space-dependent manner. In the kinetoplastid parasite, T. brucei, the Aurora B orthologue (AUK1), along with an INCENP orthologue known as CPC1, and a kinetoplastid-specific protein CPC2, also displays a dynamic localisation, moving from the kinetochores at metaphase to the spindle midzone at anaphase, to the anterior end of the newly synthesised flagellum attachment zone (FAZ) at cytokinesis. However, the trypanosome CPC lacks orthologues of Borealin and Survivin, and T. brucei kinetochores also have a unique composition, being comprised of dozens of kinetoplastid-specific proteins (KKTs). Of particular importance for this study are KKT7 and the KKT8 complex (comprising KKT8, KKT9, KKT11, and KKT12). Here, Ballmer and Akiyoshi seek to understand how the CPC assembles and is targeted to its different locations during the cell cycle in T. brucei.

      Strengths & Weaknesses:

      Using immunoprecipitation and mass-spectrometry approaches, Ballmer and Akiyoshi show that AUK1, CPC1, and CPC2 associate with two orphan kinesins, KIN-A and KIN-B, and with the use of endogenously expressed fluorescent fusion proteins, demonstrate for the first time that KIN-A and KIN-B display a dynamic localisation pattern similar to other components of the CPC. Most of these data provide convincing evidence for KIN-A and KIN-B being bona fide CPC proteins, although the evidence that KIN-A and KIN-B translocate to the anterior end of the new FAZ at cytokinesis is weak - the KIN-A/B signals are very faint and difficult to see, and cell outlines/brightfield images are not presented to allow the reader to determine the cellular location of these faint signals (Fig S1B).

      We thank the reviewer for their thorough assessment of our manuscript and the insightful feedback to further improve our study. To address the point above, we have acquired new microscopy data for Fig. S1B and S1C, which now includes phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can be now distinguished more clearly.

      They then demonstrate, by using RNAi to deplete individual components, that the CPC proteins have hierarchical interdependencies for their localisation to the kinetochores at metaphase. These experiments appear to have been well performed, although only images of cell nuclei were shown (Fig 2A), meaning that the reader cannot properly assess whether CPC components have localised elsewhere in the cell, or if their abundance changes in response to depletion of another CPC protein.

      We chose to show close-ups of the nucleus to highlight the different localization patterns of CPC proteins under the different RNAi conditions. In none of these conditions did we observe mis-localization of CPC subunits to the cytoplasm. To clarify this point, we added the following sentence in the legend for Figure 2A:

      ‘A) Representative fluorescence micrographs showing the localization of YFP-tagged Aurora BAUK1, INCENPCPC1, KIN-A and KIN-B in 2K1N cells upon RNAi-mediated knockdown of indicated CPC subunits. Note that nuclear close-ups are shown here. CPC proteins were not detected in the cytoplasm. RNAi was induced with 1 μg/mL doxycycline for 24 h (KIN-B RNAi) or 16 h (all others). Cell lines: BAP3092, BAP2552, BAP2557, BAP3093, BAP2906, BAP2900, BAP2904, BAP3094, BAP2899, BAP2893, BAP2897, BAP3095, BAP3096, BAP2560, BAP2564, BAP3097. Scale bars, 2 μm.’

      Ballmer and Akiyoshi then go on to determine the kinetochore localisation domains of KIN-A and KIN-B. Using ectopically expressed GFP-tagged truncations, they show that coiled-coil domains within KIN-A and KIN-B, as well as a disordered C-terminal tail present only in KIN-A, but not the N-terminal motor domains of KIN-A or KIN-B, are required for kinetochore localisation. These data are strengthened by immunoprecipitating CPC complexes and crosslinking them prior to mass spectrometry analysis (IP-CLMS), a state-of-the-art approach, to determine the contacts between the CPC components. Structural predictions of the CPC structure are also made using AlphaFold2, suggesting that coiled coils form between KIN-A and KIN-B, and that KIN-A/B interact with the N termini of CPC1 and CPC2. Experimental results show that CPC1 and CPC2 are unable to localise to kinetochores if they lack their N-terminal domains consistent with these predictions. Altogether these data provide convincing evidence of the protein domains required for CPC kinetochore localisation and CPC protein interactions. However, the authors also conclude that KIN-B plays a minor role in localising the CPC to kinetochores compared to KIN-A. This conclusion is not particularly compelling as it stems from the observation that ectopically expressed GFP-NLS-KIN-A (full length or coiled-coil domain + tail) is also present at kinetochores during anaphase unlike endogenously expressed YFP-KIN-A. Not only is this localisation probably an artifact of the ectopic expression, but the KIN-B coiled-coil domain localises to kinetochores from S to metaphase and Fig S2G appears to show a portion of the expressed KIN-B coiled-coil domain colocalising with KKT2 at anaphase. It is unclear why KIN-B has been discounted here.

      As the reviewer points out, a small fraction of GFP-NLS-KIN-B317-624 is indeed detectable at kinetochores in anaphase, although most of the protein shows diffuse nuclear staining. There are various explanations for this phenomenon: It is conceivable that the KIN-B motor domain may contribute to microtubule binding and translocation of the CPC from kinetochores onto the spindle in anaphase. In our experiments, ectopically expressed KIN-B317-624 likely outcompetes a fraction of endogenous KIN-B for binding to KIN-A, which could interfere with this translocation process, leaving a population of CPC ‘stranded’ at kinetochores in anaphase. Another possibility, hinted at by the reviewer, is that the C-terminus of KIN-B interacts with receptors at the kinetochore/centromere. Although we do not discount this possibility, we nevertheless decided to focus on KIN-A in this study, because the anaphase kinetochore retention phenotype for both full-length GFP-NLS-KIN-A and -KIN-A310-862 is much stronger than for KIN-B317-624. Two additional reasons were that (i) KIN-A is highly conserved within kinetoplastids, whereas KIN-B orthologs are missing in some kinetoplastids, and (ii) no convincing interactions between KIN-B and kinetochore proteins were predicted by AF2.

      To address the reviewer’s point, we decided to include KIN-B in the title of this manuscript, which now reads: ‘Dynamic localization of the chromosomal passenger complex is controlled by the orphan kinesins KIN-A and KIN-B in the kinetoplastid parasite Trypanosoma brucei’.

      Moreover, we modified the corresponding paragraph in the results section as follows:

      ‘Intriguingly, unlike endogenously YFP-tagged KIN-A, ectopically expressed GFP fusions of both full-length KIN-A and KIN-A310-862 clearly localized at kinetochores even in anaphase (Figs. 2, F and H). Weak anaphase kinetochore signal was also detectable for KIN-B317-624 (Fig. S2F). GFP fusions of the central coiled-coil domain or the C-terminal disordered tail of KIN-A did not localize to kinetochores (data not shown). These results show that kinetochore localization of the CPC is mediated by KIN-A and KIN-B and requires both the central coiled-coil domain as well as the C-terminal disordered tail of KIN-A.’

      Next, using a mixture of RNAi depletion and LacI-LacO recruitment experiments, the authors show that kinetochore proteins KKT7 and KKT9 are required for AUK1 to localise to kinetochores (other KKT8 complex components were not tested here) and that all components of the KKT8 complex are required for KIN-A kinetochore localisation. Further, both KKT7 and KKT8 were able to recruit AUK1 to an ectopic locus in the S phase, and KKT7 recruited KKT8 complex proteins, which the authors suggest indicates it is upstream of KKT8. However, while these experiments have been performed well, the reciprocal experiment to show that KKT8 complex proteins cannot recruit KKT7, which could have confirmed this hierarchy, does not appear to have been performed. Further, since the LacI fusion proteins used in these experiments were ectopically expressed, they were retained (artificially) at kinetochores into anaphase; KKT8 and KIN-A were both able to recruit AUK1 to LacO foci in anaphase, while KKT7 was not. The authors conclude that this suggests the KKT8 complex is the main kinetochore receptor of the CPC - while very plausible, this conclusion is based on a likely artifact of ectopic expression, and for that reason, should be interpreted with a degree of caution.

      We previously showed that RNAi-mediated depletion of KKT7 disrupts kinetochore localization of KKT8 complex members, whereas kinetochore localization of KKT7 is unaffected by disruption of the KKT8 complex (Ishii and Akiyoshi, 2020). Moreover, in contrast to the KKT8 complex, KKT7 remains at kinetochores in anaphase (Akiyoshi and Gull, 2014). These data show that KKT7 is upstream of the KKT8 complex. In this context, the LacI-LacO tethering approach can be very useful to probe whether two proteins (or domains of proteins) could interact in vivo either directly or indirectly. However, a recruitment hierarchy cannot be inferred from such experiments, because the data only show whether X can recruit Y to an ectopic locus (not whether X is upstream of Y or vice versa). Regarding the retention of Aurora BAUK1 at kinetochores in anaphase upon ectopic expression of GFP-KKT8-LacI, we agree with the reviewer that these data need to be carefully interpreted. Nevertheless, the notion that the KKT7-KKT8 complex recruits the CPC to kinetochores is also strongly supported by IP-MS, RNAi experiments, and AF2 predictions. For clarification and to address the reviewer’s point, we re-formulated the corresponding paragraph in the main text:

      ‘We previously showed that KKT7 lies upstream of the KKT8 complex (Ishii and Akiyoshi, 2020). Indeed, GFP-KKT72-261-LacI recruited tdTomato-KKT8, -KKT9 and -KKT12 (Fig. S4E). Expression of both GFP-KKT72-261-LacI and GFP-KKT8-LacI resulted in robust recruitment of tdTomato-Aurora BAUK1 to LacO foci in S phase (Figs. 4, E and F). Intriguingly, we also noticed that, unlike endogenous KKT8 (which is not present in anaphase), ectopically expressed GFP-KKT8-LacI remained at kinetochores during anaphase (Fig. 4F). This resulted in a fraction of tdTomato-Aurora BAUK1 being trapped at kinetochores during anaphase instead of migrating to the central spindle (Fig. 4F). We observed a comparable situation upon ectopic expression of GFP-KIN-A, which is retained on anaphase kinetochores together with tdTomato-KKT8 (Fig. S4F). In contrast, Aurora BAUK1 was not recruited to LacO foci marked by GFP-KKT72-261-LacI in anaphase (Fig. 4E).’

      Further IP-CLMS experiments, in combination with recombinant protein pull-down assays and structural predictions, suggested that within the KKT8 complex, there are two subcomplexes of KKT8:KKT12 and KKT9:KKT11, and that KKT7 interacts with KKT9:KKT11 to recruit the remainder of the KKT8 complex. The authors also assess the interdependencies between KKT8 complex components for localisation and expression, showing that all four subunits are required for the assembly of a stable KKT8 complex, and present AlphaFold2 structural modelling data to support the two-subcomplex model. In general, these data are of high quality and convincing, with a few exceptions. The recombinant pulldown assay (Fig. 4H) is not particularly convincing as the 3rd eluate gel appears to show a band at the size of KKT11 (despite the labelling indicating no KKT11 was present in the input) but no pulldown of KKT9, which was present in the input according to the figure legend (although this may be mislabeled since not consistent with the text). The text also states that 6HIS-KKT8 was insoluble in the absence of KKT12, but this is not possible to assess from the data presented.

      We thank the reviewer for pointing out an error in the text: ‘Removal of both KKT9 and KKT11 did not impact formation of the KKT8:KKT12 subcomplex’ should read ‘Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex’. Regarding the very faint band perceived to be KKT11 in the 3rd eluate: This band runs slightly lower than KKT11 and likely represents a bacterial contaminant (which we have seen also in other preps in the past). We have made a note of this in the corresponding legend (new Fig. 4I). Moreover, we provide the estimated molecular weights for each subunit, as suggested by the reviewer in recommendation #14 (see below):

      ‘(I) Indicated combinations of 6HIS-tagged KKT8 (~46 kDa), KKT9 (~39 kDa), KKT11 (~29 kDa) and KKT12 (~23 kDa) were co-expressed in E. coli, followed by metal affinity chromatography and SDS-PAGE. The asterisk indicates a common contaminant.’

      The corresponding paragraph in the results section now reads:

      ‘To validate these findings, we co-expressed combinations of 6HIS-KKT8, KKT9, KKT11 and KKT12 in E. coli and performed metal affinity chromatography (Fig. 4I). 6HIS-KKT8 efficiently pulled down KKT9, KKT11 and KKT12, as shown previously (Ishii and Akiyoshi, 2020). In the absence of KKT9, 6HIS-KKT8 still pulled down KKT11 and KKT12. Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex. In contrast, 6HIS-KKT8 could not be recovered without KKT12, indicating that KKT12 is required for formation of the full KKT8 complex. These results support the idea that the KKT8 complex consists of KKT8:KKT12 and KKT9:KKT11 subcomplexes.’

      It is also surprising that data showing the effects of KKT8, KKT9, and KKT12 depletion on KKT11 localisation and abundance are not presented alongside the reciprocal experiments in Fig S4G-J.

      YFP-KKT11 is delocalized upon depletion of KKT8 and KKT9 (see below). Unfortunately, we were unsuccessful in our attempts at deriving the corresponding KKT12 RNAi cell line, rendering this set of data incomplete. Because these data are not of critical importance for this study, we decided not to invest more time in attempting further transfections.

      Author response image 1.

      The authors also convincingly show that AlphaFold2 predictions of interactions between KKT9:KKT11 and a conserved domain (CD1) in the C-terminal tail of KIN-A are likely correct, with CD1 and a second conserved domain, CD2, identified through sequence analysis, acting synergistically to promote KIN-A kinetochore localisation at metaphase, but not being required for KIN-A to move to the central spindle at anaphase. They then hypothesise that the kinesin motor domain of KIN-A (but not KIN-B, which is predicted to be inactive based on non-conservation of residues key for activity) determines its central spindle localisation at anaphase through binding to microtubules. In support of this hypothesis, the authors show that KIN-A, but not KIN-B, can bind microtubules in vitro and in vivo. However, ectopically expressed GFP-NLS fusions of full-length KIN-A or KIN-A motor domain did not localise to the central spindle at anaphase. The authors suggest this is due to the GFP fusion disrupting the ATPase activity of the motor domain, but they provide no evidence that this is the case. Instead, they replace endogenous KIN-A with a predicted ATPase-defective mutant (G209A), showing that while this still localises to kinetochores, the kinetochores were frequently misaligned at metaphase, and that it no longer concentrates at the central spindle (with concomitant mis-localisation of AUK1), causing cells to accumulate at anaphase. From these data, the authors conclude that KIN-A ATPase activity is required for chromosome congression to the metaphase plate and its central spindle localisation at anaphase. While potentially very interesting, these data are incomplete in the absence of any experimental data to show that KIN-A possesses ATPase activity or that this activity is abrogated by the G209A mutation, and the conclusions of this section are rather speculative.

      Thank you for this important comment, which relates to a similar point raised by Reviewer 1 (see above). Indeed, the ATPase and motor activity of KIN-A remain to be demonstrated biochemically using recombinant proteins, which is beyond the scope of this study. We generated multiple sequence alignments (MSAs) of KIN-A and KIN-B from different kinetoplastids together with human Kinesin-1, human Mklp2 and yeast Klp9, which are now presented in Figure 6A and S6A. These clearly show that key motifs required for ATP or tubulin binding in other kinesins are highly conserved in KIN-A (but not KIN-B). This includes the conserved glycine residue in the Switch II helix (G234 in human Kinesin-1, G210 in T. brucei KIN-A), which forms a hydrogen bond with the γ-phosphate of ATP and, upon mutation, has been shown to impair ATPase activity and trap the motor head in a strong microtubule-bound (‘rigor’) state (Rice et al., 1999; Sablin et al., 1996). The prominent rigor phenotype of KIN-AG210A is consistent with KIN-A having ATPase activity. In addition to the data in Fig. 6A and S6A, we made the following changes to the main text:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).

      Ectopically expressed GFP-KIN-A and -KIN-A2-309 partially localized to the mitotic spindle but failed to concentrate at the midzone during anaphase (Figs. 2, F and G), suggesting that N-terminal tagging of the KIN-A motor domain may interfere with its function. To address whether the ATPase activity of KIN-A is required for central spindle localization of the CPC, we replaced one allele of KIN-A with a C-terminally YFP-tagged G210A ATP hydrolysis-defective rigor mutant (Fig. 6A) (Rice et al., 1999) and used an RNAi construct directed against the 3’UTR of KIN-A to deplete the untagged allele. The rigor mutation did not affect recruitment of KIN-A to kinetochores (Figs. S6, C and D). However, KIN-AG210A-YFP marked kinetochores were misaligned in ~50% of cells arrested in metaphase, suggesting that ATPase activity of KIN-A promotes chromosome congression to the metaphase plate (Figs. S6, E-H).’

      Impact:

      Overall, this work uses a wide range of cutting-edge molecular and structural predictive tools to provide a significant amount of new and detailed molecular data that shed light on the composition of the unusual trypanosome CPC and how it is assembled and targeted to different cellular locations during cell division. Given the fundamental nature of this research, it will be of interest to many parasitology researchers as well as cell biologists more generally, especially those working on aspects of mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the reviewer for his/her feedback and thoughtful and thorough assessment of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Why did the authors omit KIN-B from the title?

      We decided to add KIN-B in the title. Please see our response to Reviewer #3 (public review).

      (2) Abstract, line 28, "Furthermore, the kinesin motor activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset." This must be revised - see public review.

      We changed this section of the abstract as follows:

      ‘Furthermore, the ATPase activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset. Thus, KIN-A constitutes a unique ‘two-in-one’ CPC localization module in complex with KIN-B, which directs the CPC to kinetochores (from S phase until metaphase) via its C-terminal tail, and to the central spindle (in anaphase) via its N-terminal kinesin motor domain.’

      (3) Line 87-90. The findings by Li et al., 2008 (KIN-A and KIN-B interacting with Aurora B and epistasis analysis) should be introduced more comprehensively in the Introduction section.

      We added the following sentence in the introduction:

      ‘In addition, two orphan kinesins, KIN-A and KIN-B, have been proposed to transiently associate with Aurora BAUK1 during mitosis (Li et al., 2008; Li, 2012).’

      (4) Figure 1B. The way the Trypanosoma cell cycle is defined should be briefly explained in the main text, rather than just referring to the figure.

      The ‘KN’ annotation of the trypanosome cell cycle is explained in the Figure 1 legend. We now also added a brief description in the main text:

      ‘We next assessed the localization dynamics of fluorescently tagged KIN-A and KIN-B over the course of the cell cycle (Figs. 1, B-E). T. brucei possesses two DNA-containing organelles, the nucleus (‘N’) and the kinetoplast (‘K’). The kinetoplast is an organelle found uniquely in kinetoplastids, which contains the mitochondrial DNA and replicates and segregates prior to nuclear division. The ‘KN’ configuration serves as a good cell cycle marker (Woodward and Gull, 1990; Siegel et al., 2008).’

      (5) Line 118. Throughout the paper, it is not clear why GFP-NLS fusion was used instead of GFP fusion. Please justify the fusion of NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of the KKT2 and KKT3 kinetochore proteins, many fragments did not enter the nucleus, presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. Since then, we have consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (6) Line 121, "Unexpectedly". It is not clear why this was unexpected.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      (7) Line 127-129. Defining homologs and orthologs is tricky - there are many homologs and paralogs of kinesin-like proteins. The method to define the presence or absence of KIN-A/KIN-B homologs should be described in the Materials and Methods section.

      Due to the difficulty in defining true orthologs for kinesin-like proteins, we took a conservative approach: reciprocal best BLAST hits. We first searched for KIN-A homologs using BLAST in the TriTryp database or hmmsearch with manually prepared HMM profiles. When a reciprocal BLAST search of the T. brucei proteome with the top hit from a given organism returned T. brucei KIN-A, we considered that hit a true ortholog. We modified the Materials and Methods section as below.

      ‘Searches for homologous proteins were done using BLAST in the TriTryp database (Aslett et al., 2010) or hmmsearch with manually prepared HMM profiles (HMMER version 3.0; Eddy, 1998). The top hit was considered a true ortholog only if the reciprocal BLAST search returned the query protein in T. brucei.’
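      The reciprocal-best-hit criterion described above can be sketched as follows. This is a minimal illustration only, assuming that the BLAST or hmmsearch output has already been parsed into ranked lists of hit IDs (best hit first); the function name `reciprocal_best_hit` and the identifiers used in the usage note are hypothetical and do not correspond to the pipeline or gene IDs actually used in this study.

```python
from typing import Dict, List, Optional


def reciprocal_best_hit(
    query_id: str,
    forward_hits: Dict[str, List[str]],
    reverse_hits: Dict[str, List[str]],
) -> Optional[str]:
    """Return the putative ortholog of query_id in a target organism, or None.

    forward_hits: T. brucei query -> hit IDs in the target proteome, best first.
    reverse_hits: target protein -> hit IDs in the T. brucei proteome, best first.
    (Both mappings are assumed to be pre-parsed search results.)
    """
    candidates = forward_hits.get(query_id, [])
    if not candidates:
        return None  # no homolog found in this organism
    top_hit = candidates[0]
    back_hits = reverse_hits.get(top_hit, [])
    # Accept the top hit only if searching back against T. brucei
    # returns the original query as the best hit.
    if back_hits and back_hits[0] == query_id:
        return top_hit
    return None
```

      For example, with the hypothetical inputs `{"KIN-A": ["X1", "X2"]}` (forward) and `{"X1": ["KIN-A", "KIN-B"]}` (reverse), `X1` is accepted as the KIN-A ortholog; if the reverse search instead ranked KIN-B first, no ortholog would be called. This strictness is what makes the approach conservative for large, paralog-rich families such as kinesins.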

      (8) Line 156. For non-experts of Trypanosoma cell biology, it is not clear how the nucleolar localization is defined.

      The nucleolus in T. brucei is discernible as a DAPI-dim region in the nucleus.

      (9) Fig.2G and Fig.S2F. These data imply that the coiled-coil and C-terminal tail domains of KIN-A/KIN-B are important for anaphase spindle midzone enrichment. However, it is odd that this was not mentioned. This reviewer recommends that the authors quantify the midzone localization data of these constructs and discuss the role of the coiled-coil domains.

      One possibility is that KIN-A and KIN-B need to form a complex (via their coiled-coil domains) to localize to the spindle midzone. Another likely possibility, which is discussed in the manuscript, is that N-terminal tagging of KIN-A impairs motor activity. This is supported by the fact that the central spindle localization is also disrupted in full-length GFP-KIN-A. We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      (10) Line 288-289, "pLDDT scores improved significantly for KIN-A CD1 in complex with KKT9:KKT11 (>80) compared to KIN-A CD1 alone (~20) (Figs. S3, A and B)." I can see that pLDDT score is about 20 at KIN-A CD1 from Figs S3A, but the basis of pLDDT > 80 upon inclusion of KKT9:KKT11 is missing.

      We added the pLDDT and PAE plots for the AF2 prediction of KIN-A700-800 in complex with KKT9:KKT11 in Fig. S5B.

      (11) Fig. 5A. Since there is no supporting biochemical data for KIN-A-KKT9-KKT11 interaction, it is important to assess the stability of AlphaFold-based structural predictions of the KIN-A-KKT9-KKT11 interaction. Are there significant differences among the top 5 prediction results, and do these interactions remain stable after the "simulated annealing" process used in the AlphaFold predictions? Are predicted CD1-interacting regions/amino residues in KKT9 and KKT11 evolutionarily conserved?

      See above. The interaction was predicted in all 5 predictions, as shown in Fig. S5B. Conservation of the CD1-interacting regions in KKT9 and KKT11 is shown below:

      Author response image 2.

      KKT9 (residues ~53-80 predicted to interact with KIN-A in T. brucei)

      Author response image 3.

      KKT11 (residues 61-85 predicted to interact with KIN-A in T. brucei)

      (12) Line 300, Fig. S5D and E, "failed to localize at kinetochores". From this resolution of the microscopy images, it is not clear if these proteins fail to localize at kinetochores as the KKT and KIN-A310-716 signals overlap. Perhaps, "failed to enrich at kinetochores" is a more appropriate statement.

      We changed this sentence according to the reviewer’s suggestion.

      (13) Line 309 and Fig 5D and F, "predominantly localized to the mitotic spindle". From this image shown in Fig 5D, it is not clear if KIN-A∆CD1-YFP and Aurora B are predominantly localized to the spindle or if they are still localized to centromeres that are misaligned on the spindle. Without microtubule staining, it is also not clear how microtubules are distributed in these cells. Please clarify how the presence or absence of kinetochore/spindle localization was defined.

      As shown in Fig. S5E and S5F, deletion of CD1 clearly impairs kinetochore localization of KIN-A (kinetochores marked by tdTomato-KKT2). Moreover, misalignment of kinetochores, as observed upon expression of the KIN-AG210A rigor mutant, would result in an increase in 2K1N cells and proliferation defects, which is not the case for the KIN-A∆CD1 mutant (Fig. 5H, Fig. S5I). KIN-A∆CD1-YFP appears to localize diffusely along the entire length of the mitotic spindle, whereas we still observe kinetochore-like foci in the rigor mutant. Unfortunately, we do not have suitable antibodies that would allow us to distinguish spindle microtubules from the vast subpellicular microtubule array present in T. brucei and hence need to rely on tagging spindle-associated proteins such as MAP103.

      (14) Fig. 5F, G, S5F. Along the same lines, it would be helpful to show example images for each category - "kinetochores", "kinetochores + spindle", and "spindle".

      As suggested by the reviewer, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      (15) Line 332 and Fig. S6A. The experiment may be repeated in the presence of ATP or nonhydrolyzable ATP analogs.

      We thank the reviewer for the suggestion. We envisage such experiments for an in-depth follow-up study.

      (16) Line 342, "motor activity of KIN-A". Until KIN-A is shown to have motor activity, the result based on the rigor mutant does not show that the motor activity of KIN-A promotes chromosome congression. The result suggests that the ATPase activity of KIN-A is important.

      We changed that sentence as suggested by the reviewer.

      (17) Line 419 -. The authors base their discussion on the speculation that KIN-A is a plus-end directed motor. Please justify this speculation.

      Indeed, the notion that KIN-A is a plus-end directed motor remains a hypothesis, which is based on sequence alignments with other plus-end directed motors and the observation that the KIN-A motor domain is involved in translocation of the CPC to the central spindle in anaphase. We have modified the corresponding section in the discussion as follows:

      ‘It remains to be investigated whether KIN-A truly functions as a plus-end directed motor. The role of KIN-B in this context is equally unclear. Since KIN-B does not possess a functional kinesin motor domain, we deem it unlikely that the KIN-A:KIN-B heterodimer moves hand-over-hand along microtubules as do conventional (kinesin-1 family) kinesins. Rather, the KIN-A motor domain may function as a single-headed unit and drive processive plus-end directed motion using a mechanism similar to the kinesin-3 family kinesin KIF1A (Okada and Hirokawa, 1999).’

      (18) Line 422-423, "plus-end directed motion using a mechanism similar to kinesin-3 family kinesins (such as KIF1A)." Please cite a reference supporting this statement.

      See above. We now cite Okada and Hirokawa (1999).

      Reviewer #2 (Recommendations For The Authors):

      Please provide a quantification of data shown in Figure 2F-H and described in lines 151-166.

      We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      It appears as if the paper more or less follows a chronological order of the experiments that were performed before AF multimer enabled the insightful and compelling structural analysis. That is a matter of style, but in some cases, the writing could be updated, shortened, or re-arranged into a more logical order. Concrete examples:

      (i) Line 144: "we did not include CPC2 for further analysis in this study" Although CPC2 features at a prominent and interesting position in the predicted structures of the kinetoplastid CPC, shown in later main figures.

      We attempted RNAi-mediated depletion of CPC2 using two different shRNA constructs. However, we cannot exclude the possibility that the knockdown of CPC2 was less efficient compared with the other CPC subunits. For this reason, we decided to remove all the data on CPC2 from Fig. S2.

      (ii) The work with the KIN-A motor domain only and KIN-A ∆motor domain (Fig 2) begs the question about a more subtle mutation to interfere with the motor domain. Which is ultimately presented in Fig 6. I think that the final paragraph and Figure 6 follow naturally after Figure 2.

      We appreciate the suggestion. However, we would like to keep Figure 6 there.

      (iii) The high-confidence structural predictions in Fig 3 and Fig 4 are insightful. The XL-MS descriptions that precede them are not so helpful (Fig 3A and 4G and in the text). To emphasize their status as experimental support for the predicted structures, which is very important, it would be good to discuss the XL-MS after presenting the models.

      As suggested, we have re-arranged the text and/or figures such that the AF2 predictions are discussed first and the CLMS data are brought in afterwards.

      Figure 1A prominently features an arbitrary color code and a lot of protein IDs without a legend. That is not a very convincing start. Figure S1 is more informative, containing annotated protein names and results of the KIN-A and KIN-B IPs. Please improve Figure 1A, for example by presenting a modified version of Figure S1. In all these types of figures, please list both protein names and gene IDs.

      We agree with the reviewer that the IP-MS data in Fig. S1 is more informative and hence decided to swap the heatmaps in Fig. 1A and Fig. S1A. We further annotated the heatmap corresponding to the Aurora BAUK1 IP-MS (now presented in Fig. S1) as suggested by the reviewer.

      The visualization of the structural predictions is not consistent among figures:

      (i) The structure in Fig 4I is important and could be displayed larger. The pLDDT scores, and especially those of the non-displayed models, do not add much information and should not be a main panel. If the authors want to display the pLDDT scores, I recommend a panel (main or supplement) of the structure colored for local prediction confidences, as in Fig 5A.

      (ii) In Figure 5A itself, it is hard to follow the chains in general, and KIN-A in particular, since the structure is pLDDT-coloured. Please present an additional panel colored by chain (consistent with Fig 4I, as mentioned above).

      (iii) The summarizing diagram, currently displayed as Fig 4J, should be placed after Fig 5A and take the discovered KIN-A - KKT9-11 connection into account. Ideally, it also covers the suspected importance of the motor domain and serves as a summarising diagram.

      We thank the reviewer for the constructive comments. For each structure prediction, we now present two images side by side: one coloured by chain and one coloured by pLDDT. We recently re-ran AF2 for the full CPC and also for the KKT7N-KKT8 complex, and obtained improved predictions. Hence, some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether or not they fulfil the 30 Å distance constraint in the AF2 prediction. We also increased the size of the structures shown in Fig. 4. Furthermore, we removed the summarizing diagram from Fig. 4 and instead made a new main Fig. 7 with a more detailed schematic, which also takes into account the proposed function of the KIN-A motor domain, as suggested by the reviewer, and other points addressed in the Discussion.
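      To illustrate how cross-links can be classified against a predicted structure, the sketch below measures the Euclidean distance between each pair of cross-linked residues and flags the link as satisfied if it is at most 30 Å. It assumes that one representative atom per residue (for example, Cα) has already been extracted from the AF2 model into a dictionary keyed by (chain, residue number); the choice of atom, the data layout and the function name `classify_crosslinks` are assumptions for illustration, not the authors' actual analysis.

```python
import math
from typing import Dict, List, Tuple

Residue = Tuple[str, int]            # (chain ID, residue number), assumed layout
Coord = Tuple[float, float, float]   # x, y, z in angstroms


def classify_crosslinks(
    coords: Dict[Residue, Coord],
    crosslinks: List[Tuple[Residue, Residue]],
    max_distance: float = 30.0,
) -> List[Tuple[Residue, Residue, float, bool]]:
    """Return (residue_a, residue_b, distance, satisfied) for each cross-link.

    A cross-link counts as satisfied when the distance between the
    representative atoms of the two residues is <= max_distance.
    """
    results = []
    for res_a, res_b in crosslinks:
        distance = math.dist(coords[res_a], coords[res_b])  # Python 3.8+
        results.append((res_a, res_b, distance, distance <= max_distance))
    return results
```

      The boolean in each tuple could then drive the colour of the corresponding link in a CLMS plot; residues missing from the model would raise a `KeyError` here and would need to be handled separately in a real analysis.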

      The methods section for the structural predictions lacks essential information. Predictions can only be reproduced if the version of AF2 multimer v2.x is specified and key parameters are mentioned.

      As suggested, we have added the details in the Materials and Methods section as follows.

      ‘Structural predictions of KIN-A/KIN-B, KIN-A310-862/KIN-B317-624, CPC1/CPC2/KIN-A300-599/KIN-B317-624, and KIN-A700-800/KKT9/KKT11 were performed using ColabFold version 1.3.0 (AlphaFold-Multimer version 2), while those of AUK1/CPC1/CPC2/KIN-A1-599/KIN-B, KKT71-261/KKT9/KKT11/KKT8/KKT12, KKT9/KKT11/KKT8/KKT12, and KKT71-261/KKT9/KKT11 were performed using ColabFold version 1.5.3 (AlphaFold-Multimer version 2.3.1) using default settings, accessed via https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.3/AlphaFold2.ipynb.’

      Line 121, please explain the "Unexpectedly" by including a reference to the work from Li and colleagues. A statement with some details would be useful, as the difference between both studies appears to be crucial for the novelty of this paper. Alternatively, refer to this being covered in the discussion.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      Line 285 refers to "conserved" regions in the C-terminal part of KIN-A, referring to Figure 5. Please expand the MSA in Figure 5B to get an idea about the conservation/variation outside CD1 and CD2.

      We now present the full MSA for KIN-A proteins in kinetoplastids in Fig. S5A.

      Please specify what is meant by Line 367-369 for someone who is not familiar with the work by Komaki et al. 2022. Either clarify in the text or clarify in the text with data to support it.

      We updated the corresponding section in the discussion as follows:

      ‘Komaki et al. recently identified two functionally redundant CPC proteins in Arabidopsis, Borealin Related Interactor 1 and 2 (BORI1 and 2), which engage in a triple helix bundle with INCENP and Borealin using a conserved helical domain but employ an FHA domain instead of a BIR domain to read H3T3ph (Komaki et al., 2022).’

      Data presented in Figure S6A, the microtubule co-sedimentation assay, is not convincing since a substantial amount of KIN-A/B is pelleted in the absence of microtubules. Did the authors spin the proteins in BRB80 before the assay to continue with soluble material and reduce sedimentation in the absence of microtubules? If the authors want to keep the wording in lines 331-332, the MT-binding properties of KIN-A and KIN-B need to be investigated in more detail, for example with a titration and a quantification thereof. Otherwise, they should change the text and replace "confirms" with "is consistent with". In any case, the legend needs to be expanded to include more information.

      To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs sedimented to some extent even in the absence of microtubules. Hence, the apparent lack of microtubule binding by KIN-B may be due to the recombinant protein used in this study being unstable and non-functional.’
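      One way to quantify such an assay, along the lines of the titration the reviewer suggested, is to subtract the fraction that pellets without microtubules from the fraction that pellets with them. The sketch below assumes densitometry values for the supernatant (S) and pellet (P) bands of each lane; the function names and numbers are hypothetical and are not data from Fig. S6.

```python
def fraction_pelleted(supernatant: float, pellet: float) -> float:
    """Fraction of the protein found in the pellet lane, P / (S + P)."""
    total = supernatant + pellet
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return pellet / total


def mt_dependent_pelleting(with_mt: tuple[float, float],
                           without_mt: tuple[float, float]) -> float:
    """Pelleting attributable to microtubules.

    Each argument is a (supernatant, pellet) pair of band intensities.
    Subtracting the no-microtubule control corrects for protein that
    sediments on its own (e.g. aggregates).
    """
    return fraction_pelleted(*with_mt) - fraction_pelleted(*without_mt)
```

      Repeating this across a range of microtubule concentrations would give a background-corrected binding curve; values near zero at all concentrations would be consistent with no detectable microtubule binding.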

      We have also updated the main text in the results section:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, unlike that of KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      Details:

      The readability of the pAE plots could be improved by arranging sequences according to their position in the structure. For example in Fig4I, KKT8 could precede KKT12. If it is easy to update this, the authors might want to do so.

      We re-ran the AF2 predictions for the KKT7N – KKT8 complex in Fig. 4/S4 and changed the order according to the reviewer’s suggestion (KKT9:KKT11:KKT8:KKT12).
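      As an aside, when re-running a prediction is impractical, an existing PAE matrix can be redrawn in a different chain order by permuting its rows and columns together. The sketch below assumes the matrix is indexed by the residues of the chains concatenated in the original input order; chain names and lengths are hypothetical. This only changes the display: it is not equivalent to re-running AlphaFold2 with a new input order, which may yield a different model.

```python
def chain_order_permutation(chain_lengths: dict[str, int],
                            old_order: list[str],
                            new_order: list[str]) -> list[int]:
    """For each position in the new order, give its index in the old matrix."""
    offsets = {}
    position = 0
    for name in old_order:
        offsets[name] = position
        position += chain_lengths[name]
    permutation = []
    for name in new_order:
        start = offsets[name]
        permutation.extend(range(start, start + chain_lengths[name]))
    return permutation


def reorder_pae(pae: list[list[float]], permutation: list[int]) -> list[list[float]]:
    """Permute rows and columns with the same index list.

    PAE is not symmetric, so both axes must be reordered consistently.
    """
    return [[pae[i][j] for j in permutation] for i in permutation]
```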

      The same paper is referred to as Je Van Hooff et al. 2017 and as Van Hooff et al. 2017

      Thank you for pointing this out. We have corrected the citation.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please state at the end of the introduction/start of the results section that this work was performed in procyclic trypanosomes. Given that the cell cycles of procyclic and bloodstream forms differ, this is important.

      We added this information at the end of the introduction:

      ‘Here, by combining biochemical, structural and cell biological approaches in procyclic form T. brucei, we show that the trypanosome CPC is a pentameric complex comprising Aurora BAUK1, INCENPCPC1, CPC2 and the two orphan kinesins KIN-A and KIN-B.’

      (2) Please define NLS at first use (line 118), and for clarity, explain the rationale for using GFP with an NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of the KKT2 and KKT3 kinetochore proteins, many fragments did not enter the nucleus, presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. Since then, we have consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (3) Lines 148-150 - it would strengthen this claim if KIN-A/B protein levels were assessed by Western blot.

      We now present a Western blot in Fig. S2C, showing that bulk KIN-B levels are clearly reduced upon KIN-A RNAi. The same is also true, to some extent, for KIN-A levels upon KIN-B RNAi, although this is less obvious, possibly because KIN-B RNAi is less efficient than KIN-A RNAi, as judged by fluorescence microscopy (quantified in Fig. 2D and 2E).

      (4) Line 253 - the text mentions the removal of both KKT9 and KKT11, which is not consistent with the figure (Fig 4H) - do you mean the removal of either KKT9 or KKT11?

      Yes, we thank the reviewer for pointing out this mistake in the text, which has now been corrected.

      (5) Line 337 - please include a reference for the G209A ATPase-defective rigor mutant - has this been shown to result in KIN-A being inactive previously?

      Please see our answer to the public review above.

      (6) It is not always obvious when fluorescent fusion proteins are being expressed endogenously or ectopically, or when they are being expressed in an RNAi background or not without tracing the cell lines in Table S1 - please ensure this is clearly stated throughout the manuscript.

      We have now made sure that this is clearly stated in the main text as well as in the figure legends.

      (7) Line 410 - 'KIN-A C-terminal tail is stuffed full of conserved CDK1CRK3 sites' - what does 'stuffed full' really mean (this is rather imprecise) and what are the consensus sites - are these CDK1 consensus sites that are assumed to be conserved for CRK3? I'm not aware of consensus sites for CRK3 having been determined, but if they have, this should be referenced.

      We have modified the corresponding section in the discussion as follows:

      ‘In support of this, the KIN-A C-terminal tail harbours many putative CRK3 sites (10 sites matching the minimal S/T-P consensus motif for CDKs) and is also heavily phosphorylated by Aurora BAUK1 in vitro (Ballmer et al., 2024). Finally, we speculate that the interaction of the KIN-A motor domain with microtubules, coupled to force-generating ATP hydrolysis and possibly plus-end-directed motion, eventually outcompetes the weakened interactions of the CPC with the kinetochore and facilitates the extraction of the CPC from chromosomes onto spindle microtubules during anaphase. Indeed, deletion of the KIN-A motor domain or impairment of its motor function through N-terminal GFP tagging causes the CPC to be trapped at kinetochores in anaphase. Central spindle localization is additionally dependent on the ATPase activity of the KIN-A motor domain, as illustrated by the KIN-A rigor mutant.’
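      The count of putative CRK3 sites above rests on the minimal S/T-P motif, which can be located with a simple pattern scan. The sketch below is illustrative only: the function name is hypothetical, the example input is an arbitrary sequence rather than the KIN-A tail, and a match marks a candidate proline-directed site, not evidence that CRK3 phosphorylates it.

```python
import re

# Serine or threonine immediately followed by proline; the lookahead keeps
# the match on the phospho-acceptor residue itself.
MINIMAL_CDK_MOTIF = re.compile(r"[ST](?=P)")


def find_sp_tp_sites(sequence: str) -> list[int]:
    """Return 1-based positions of S/T residues that precede a proline."""
    return [m.start() + 1 for m in MINIMAL_CDK_MOTIF.finditer(sequence.upper())]
```

      For instance, find_sp_tp_sites("ASPKTPLLSPE") returns [2, 5, 9]; taking len() of the result for a given sequence gives its number of S/T-P matches.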

      (8) Lines 412-416: this proposal is written rather definitively - given no motor activity has been demonstrated for KIN-A, please make clear that this is still just a theory.

      See above.

      (9) Fig 1: KKT2 is not highlighted in Fig 1A - given this has been used for colocalization in Fig 1C-E, was it recovered, and if not, why not? Fig 1B-E: the S phase/1K1N terminology is somewhat misleading. Not all S phase cells will have elongated kinetoplasts - usually an asterisk is used to signify replicated DNA, not kinetoplast shape. If it is to be used here for elongation, then for consistency, N should be used for G2/mitotic cells.

      Fig. 1A (now Fig. S1A) only shows the top 30 hits. KKT2 was indeed recovered with Aurora BAUK1 (see Table S2) and is often used by our lab and others as a kinetochore marker in trypanosomes, because the signal of fluorescently tagged KKT2 is relatively bright and KKT2 localizes to centromeres throughout the cell cycle.

      (10) A general comment for all image figures is that these do not have accompanying brightfield images and it is therefore difficult to know where the cell body is, or sometimes which nuclei and kinetoplasts belong to which cell where DNA from more than one cell is within the image. It would be beneficial if brightfield images could be added, or alternatively, the cell outlines were traced onto DAPI or merged images. Also, brightfield images would allow the stage of cytokinesis (pre-furrowing/furrowing/abscission) in anaphase cells to be determined.

      Since this study primarily addresses the recruitment mechanism of the CPC to kinetochores (from S phase to metaphase) and to the central spindle (in anaphase), and CPC proteins are not observed outside of the nucleus during these cell cycle stages, we did not present brightfield images in the figures. However, this point is particularly valid for discerning the localization of KIN-A and KIN-B to the new FAZ tip from late anaphase onwards. Hence, we acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      (11) Fig 2A: legend should state that the micrographs show the localisation of the proteins within the nucleus as whole cells are not shown. 2C: can INCENP not be split into 2 lines - the 'IN' looks like 1N at first glance, which is confusing.

      We have applied the suggested change in Fig. 2.

      (12) Fig 3 (and other AF2 figures): Could the lines for satisfied & not satisfied in the key be thicker so they more closely resemble the lines in the figure and are less likely to be confused with the disordered regions of the CPC components?

      We have now made those lines thicker.

      (13) Why were different E value thresholds used in Fig 3 and Fig 4?

      The CLMS data in Fig. 3 and Fig. 4 now both use the same E value threshold of 10⁻³ (previously, 10⁻⁴ was used in Fig. 4). To determine a sensible significance threshold, we included some yeast protein sequences (‘false positives’) in the database used in pLink2 for identification of crosslinked peptides. We also recently re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and obtained improved predictions; hence, some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the crosslinks according to whether the 30 Å distance constraint was fulfilled in the AF2 prediction.
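      The yeast ‘false positive’ sequences act as an entrapment set: identifications that map to them are known to be wrong, so their number at a given E value gives a rough check on a threshold. The sketch below is a simplified illustration of that idea, assuming each crosslink identification is reduced to an E value and an entrapment flag; it is not pLink2's own FDR calculation, and the function names and data are hypothetical.

```python
from typing import Optional


def entrapment_counts(hits: list[tuple[float, bool]],
                      threshold: float) -> tuple[int, int]:
    """Count (target, entrapment) identifications with E value <= threshold.

    Each hit is (e_value, is_entrapment); is_entrapment is True when the
    crosslinked peptides map to the deliberately added yeast sequences.
    """
    n_target = sum(1 for e, is_trap in hits if e <= threshold and not is_trap)
    n_trap = sum(1 for e, is_trap in hits if e <= threshold and is_trap)
    return n_target, n_trap


def loosest_clean_threshold(hits: list[tuple[float, bool]],
                            candidates: list[float],
                            max_entrapment: int = 0) -> Optional[float]:
    """Largest candidate cutoff that admits at most max_entrapment entrapment hits."""
    passing = [t for t in candidates
               if entrapment_counts(hits, t)[1] <= max_entrapment]
    return max(passing) if passing else None
```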

      (14) Fig 4H legend - please give the expected sizes of these recombinant proteins & check the 3rd elution panel (see public review comments).

      See our response to the public review above.

      (15) Fig 4I - please explain what the colours of the PAE plot and the values in the key signify, as well as how the Scored Residue values are arrived at. Please also define the pLDDT in the legend.

      We have cited DeepMind’s 2021 methods paper, in which the outputs of AlphaFold are explained in detail. We also added a short description of the pLDDT and PAE scores and the corresponding colour coding in the legends of Fig. 3 and Fig. 4, respectively.

      From the Figure 3 legend:

      ‘(B) Cartoon representation showing two orientations of the trypanosome CPC, coloured by protein on the left (Aurora BAUK1: crimson, INCENPCPC1: green, CPC2: cyan, KIN-A: magenta, and KIN-B: yellow) or according to their pLDDT values on the right, assembled from AlphaFold2 predictions shown in Figure S3. The pLDDT score is a per-residue estimate of the confidence in the AlphaFold prediction on a scale from 0 – 100. pLDDT > 70 (blue, cyan) indicates a reasonable accuracy of the model, while pLDDT < 50 (red) indicates a low accuracy and often reflects disordered regions of the protein (Jumper et al., 2021). BS3 crosslinks in (B) were mapped onto the model using PyXlinkViewer (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å) (Schiffrin et al., 2020).’
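      The satisfied/violated colouring described in these legends reduces to a distance test on Cα atoms. The sketch below shows that classification, assuming Cα coordinates (in Å) have already been read from the model and keyed by (chain, residue number); the coordinates in the test are hypothetical, treating a distance exactly at the threshold as satisfied is an assumption, and this is not PyXlinkViewer's own code.

```python
import math

Residue = tuple[str, int]            # (chain ID, residue number)
Coord = tuple[float, float, float]   # Cα position in Å

CA_CA_THRESHOLD = 30.0  # Å, the Cα-Cα cutoff stated in the legends


def classify_crosslinks(ca_coords: dict[Residue, Coord],
                        crosslinks: list[tuple[Residue, Residue]],
                        threshold: float = CA_CA_THRESHOLD) -> list[tuple]:
    """Label each crosslink 'satisfied' or 'violated' by Cα-Cα Euclidean distance.

    Returns (residue1, residue2, distance, status) for every crosslink.
    """
    results = []
    for res1, res2 in crosslinks:
        distance = math.dist(ca_coords[res1], ca_coords[res2])
        status = "satisfied" if distance <= threshold else "violated"
        results.append((res1, res2, distance, status))
    return results
```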

      From the Figure 4 legend:

      ‘(G) AlphaFold2 model of the KKT7 – KKT8 complex, coloured by protein (KKT71-261: green, KKT8: blue, KKT12: pink, KKT9: cyan and KKT11: orange) (left) and by pLDDT (center). BS3 crosslinks in (H) were mapped onto the model using PyXlinkViewer (Schiffrin et al., 2020) (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å). Right: Predicted Aligned Error (PAE) plot of model shown on the left (rank_2). The colour indicates AlphaFold’s expected position error (blue = low, red = high) at the residue on the x axis if the predicted and true structures were aligned on the residue on the y axis (Jumper et al., 2021).’
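      Because the PAE plot is block-structured by chain, a common (though not universal) way to summarize confidence in an interface is to average the off-diagonal blocks for a pair of chains. The sketch below assumes a square PAE matrix (in Å) indexed by the residues of the chains concatenated in input order; chain names and lengths are hypothetical. Both off-diagonal blocks are included because PAE is not symmetric.

```python
def chain_ranges(chain_lengths: list[tuple[str, int]]) -> dict[str, range]:
    """Index range of each chain within the concatenated residue axis."""
    ranges, start = {}, 0
    for name, length in chain_lengths:
        ranges[name] = range(start, start + length)
        start += length
    return ranges


def mean_interchain_pae(pae: list[list[float]],
                        chain_lengths: list[tuple[str, int]],
                        chain_x: str, chain_y: str) -> float:
    """Mean PAE over both off-diagonal blocks for two chains."""
    r = chain_ranges(chain_lengths)
    values = [pae[i][j] for i in r[chain_x] for j in r[chain_y]]
    values += [pae[i][j] for i in r[chain_y] for j in r[chain_x]]
    return sum(values) / len(values)
```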

      (16) Fig 6 legend - Line 730 should say (F) not (C).

      Thank you for pointing out this typo.

      (17) Fig S1A - a key is missing for the colours. Fig S1B/C - cell outlines or a brightfield image are really needed here - see earlier comment. Fig S1D - there doesn't seem to be a method for how this tree was generated.

      See our response to the public review above regarding Fig. S1A and S1B/C. The tree in Fig. S1D is based on Butenko et al. (2020).

      (18) Fig S2: A: how was protein knockdown validated (especially for CPC2 where there was little obvious phenotype)? Fig S2B: the y-axis should read proportion of cells, not percentage. Fig S2E - NLS should be labelled.

      Thank you for pointing out the mistake in the labelling.

      (19) Fig S3: PAE plots should be labelled with protein names, not A-E. Similarly, the pLDDT plots should be labelled as in Fig 4I.

      We have corrected the labelling in Fig. S3.

      (20) Fig S5A-D - cell cycle stage labels are missing from images.

      Thank you for pointing out the missing cell cycle stage labels.

      Addition by editor:

      In line 126 the statement that KIN-A and KIN-B "associate with Aurora-AUK1, INCENP-CPC1 and CPC2 throughout the cell cycle" seems too strong. There is no direct evidence for this. Please re-phrase as "likely associate" or "suggest... that ... may...".

      We have modified that sentence according to the editor’s suggestion.

      References:

      Akiyoshi, B., and K. Gull. 2014. Discovery of Unconventional Kinetochores in Kinetoplastids. Cell. 156. doi:10.1016/j.cell.2014.01.049.

      Butenko, A., F.R. Opperdoes, O. Flegontova, A. Horák, V. Hampl, P. Keeling, R.M.R. Gawryluk, D. Tikhonenkov, P. Flegontov, and J. Lukeš. 2020. Evolution of metabolic capabilities and molecular features of diplonemids, kinetoplastids, and euglenids. BMC Biol. 18:1–28. doi:10.1186/S12915-020-0754-1.

      Cormier, A., D.G. Drubin, and G. Barnes. 2013. Phosphorylation regulates kinase and microtubule binding activities of the budding yeast chromosomal passenger complex in vitro. J Biol Chem. 288:23203–23211. doi:10.1074/JBC.M113.491480.

      Endow, S.A., F.J. Kull, and H. Liu. 2010. Kinesins at a glance. J Cell Sci. 123:3420. doi:10.1242/JCS.064113.

      Fink, S., K. Turnbull, A. Desai, and C.S. Campbell. 2017. An engineered minimal chromosomal passenger complex reveals a role for INCENP/Sli15 spindle association in chromosome biorientation. J Cell Biol. 216:911–923. doi:10.1083/JCB.201609123.

      van der Horst, A., M.J.M. Vromans, K. Bouwman, M.S. van der Waal, M.A. Hadders, and S.M.A. Lens. 2015. Inter-domain Cooperation in INCENP Promotes Aurora B Relocation from Centromeres to Microtubules. Cell Rep. 12:380–387. doi:10.1016/J.CELREP.2015.06.038.

      Ishii, M., and B. Akiyoshi. 2020. Characterization of unconventional kinetochore kinases KKT10/19 in Trypanosoma brucei. J Cell Sci. doi:10.1242/jcs.240978.

      Jeyaprakash, A.A., C. Basquin, U. Jayachandran, and E. Conti. 2011. Structural Basis for the Recognition of Phosphorylated Histone H3 by the Survivin Subunit of the Chromosomal Passenger Complex. Structure. 19:1625–1634. doi:10.1016/J.STR.2011.09.002.

      Jeyaprakash, A.A., U.R. Klein, D. Lindner, J. Ebert, E.A. Nigg, and E. Conti. 2007. Structure of a Survivin–Borealin–INCENP Core Complex Reveals How Chromosomal Passengers Travel Together. Cell. 131. doi:10.1016/j.cell.2007.07.045.

      Jumper, J., R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. doi:10.1038/s41586-021-03819-2.

      Kang, J.S., I.M. Cheeseman, G. Kallstrom, S. Velmurugan, G. Barnes, and C.S.M. Chan. 2001. Functional cooperation of Dam1, Ipl1, and the inner centromere protein (INCENP)-related protein Sli15 during chromosome segregation. J Cell Biol. 155:763–774. doi:10.1083/JCB.200105029.

      Klein, U.R., E.A. Nigg, and U. Gruneberg. 2006. Centromere targeting of the chromosomal passenger complex requires a ternary subcomplex of Borealin, Survivin, and the N-terminal domain of INCENP. Mol Biol Cell. 17:2547–2558. doi:10.1091/MBC.E05-12-1133.

      Komaki, S., E.C. Tromer, G. De Jaeger, N. De Winne, M. Heese, and A. Schnittger. 2022. Molecular convergence by differential domain acquisition is a hallmark of chromosomal passenger complex evolution. Proc Natl Acad Sci U S A. 119. doi:10.1073/PNAS.2200108119.

      Li, Z. 2012. Regulation of the Cell Division Cycle in Trypanosoma brucei. Eukaryot Cell. 11:1180. doi:10.1128/EC.00145-12.

      Li, Z., J.H. Lee, F. Chu, A.L. Burlingame, A. Günzl, and C.C. Wang. 2008. Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei. PLoS One. 3. doi:10.1371/journal.pone.0002354.

      Mackay, A.M., D.M. Eckley, C. Chue, and W.C. Earnshaw. 1993. Molecular analysis of the INCENPs (inner centromere proteins): separate domains are required for association with microtubules during interphase and with the central spindle during anaphase. J Cell Biol. 123:373–385. doi:10.1083/JCB.123.2.373.

      Marchetti, M.A., C. Tschudi, H. Kwon, S.L. Wolin, and E. Ullu. 2000. Import of proteins into the trypanosome nucleus and their distribution at karyokinesis. J Cell Sci. 113(Pt 5):899–906. doi:10.1242/JCS.113.5.899.

      Nakajima, Y., A. Cormier, R.G. Tyers, A. Pigula, Y. Peng, D.G. Drubin, and G. Barnes. 2011. Ipl1/Aurora-dependent phosphorylation of Sli15/INCENP regulates CPC-spindle interaction to ensure proper microtubule dynamics. J Cell Biol. 194:137–153. doi:10.1083/JCB.201009137.

      Noujaim, M., S. Bechstedt, M. Wieczorek, and G.J. Brouhard. 2014. Microtubules accelerate the kinase activity of Aurora-B by a reduction in dimensionality. PLoS One. 9. doi:10.1371/JOURNAL.PONE.0086786.

      Okada, Y., and N. Hirokawa. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science. 283:1152–1157. doi:10.1126/SCIENCE.283.5405.1152.

      Rice, S., A.W. Lin, D. Safer, C.L. Hart, N. Naber, B.O. Carragher, S.M. Cain, E. Pechatnikova, E.M. Wilson-Kubalek, M. Whittaker, E. Pate, R. Cooke, E.W. Taylor, R.A. Milligan, and R.D. Vale. 1999. A structural change in the kinesin motor protein that drives motility. Nature. 402:778–784. doi:10.1038/45483.

      Sablin, E.P., F.J. Kull, R. Cooke, R.D. Vale, and R.J. Fletterick. 1996. Crystal structure of the motor domain of the kinesin-related motor ncd. Nature. 380:555–559. doi:10.1038/380555a0.

      Samejima, K., M. Platani, M. Wolny, H. Ogawa, G. Vargiu, P.J. Knight, M. Peckham, and W.C. Earnshaw. 2015. The Inner Centromere Protein (INCENP) Coil Is a Single α-Helix (SAH) Domain That Binds Directly to Microtubules and Is Important for Chromosome Passenger Complex (CPC) Localization and Function in Mitosis. J Biol Chem. 290:21460–21472. doi:10.1074/JBC.M115.645317.

      Schiffrin, B., S.E. Radford, D.J. Brockwell, and A.N. Calabrese. 2020. PyXlinkViewer: A flexible tool for visualization of protein chemical crosslinking data within the PyMOL molecular graphics system. Protein Sci. 29:1851–1857. doi:10.1002/PRO.3902.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study advances our understanding of the cell-specific treatment of cone photoreceptor degeneration by Txnip. The evidence supporting the conclusions is convincing, with rigorous genetic manipulation of Txnip mutations; however, there are a few areas in which the article may be improved through further analysis and application of the data. The work will be of broad interest to vision researchers, cell biologists and biochemists.

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called thioredoxin-interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study.

      Weaknesses:

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 shows how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      We repeated the RPE GLUT1 staining for all eight AAV-Best1-Txnip alleles, with more than three eyes per condition. We also quantified the GLUT1 intensity on the basal surface of the RPE. These data are presented in a new Figure 2-figure supplement 1, which has been added to this submission. The results and conclusions are similar to those in our initial submission.

      As mentioned in our provisional responses: GLUT1 on the basal surface of the RPE is more easily scored than that on the apical surface. The photoreceptor inner segments and Müller glia microvilli also have GLUT1, and their processes are juxtaposed and/or intertwined with the apical processes of the RPE, making the apical process GLUT1 staining of the RPE much more difficult to score. In some sections where the RPE and the retina separate, we can score the apical process GLUT1 staining of the RPE, but we do not always have this situation in our sections. The current quantification in the new Figure 2-figure supplement 1 thus concerns only the basal staining.

      As a separate issue, Reviewer #1 mentioned the work of another group (Wang et al., 2019, PMID: 31365873), which claimed that GLUT1 is down-regulated on the apical surface of the RPE in an RP mouse strain, RhoP23H. We have not consistently observed such a down-regulation of GLUT1 in other RP mouse strains such as rd1, rd10 or Rho-/- (unpublished data; see review Xue and Cepko, 2023, PMID: 37460158). However, as we pointed out above, it is difficult to score GLUT1 staining on the RPE apical surface. It is even more difficult in the degenerating retina, where RPE and photoreceptor processes degenerate. For reference, images of degenerating RPE apical processes can be seen in Wu et al. 2021 (PMID: 33491671).

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. By overexpressing the α-arrestin Txnip in the RPE, in cones, and in both combined, the authors show a potential gene-agnostic treatment that could be applied to retinitis pigmentosa. Furthermore, since Txnip is involved in multiple intracellular signaling pathways, this study is also of value for research into the mechanism of secondary cone dystrophy.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made to clarify specific points in the article.

      Reviewer #3 (Public Review):

      Summary:

      Xue et al. extended their groundbreaking discovery demonstrating the protective effect of Txnip on cone photoreceptor survival. This was achieved by investigating the protection of cones from degeneration through the overexpression of five distinct mutated variants of Txnip within the retinal pigment epithelium (RPE). Moreover, the study explored the roles of two proteins, HSP90AB1 and Arrdc4, which share similarities or associations with Txnip. They found that Txnip is protective when expressed in RPE cells and that the underlying mechanism differs from that of its protection in cone cells. These discoveries have significant implications for advancing our understanding of the mechanisms underlying Txnip's protection of cone cells.

      Strengths:

      (1) Identifies the roles of different Txnip mutations in the RPE and their effects on the expression of the glucose transporter.

      (2) Dissects the mechanism of Txnip in the RPE vs. cone photoreceptors in retinal degeneration models.

      (3) Explores the functions of Arrdc4, a protein similar to Txnip, and of HSP90AB1 in cone degeneration.

      Weaknesses:

      (1) Arrdc4 has a deleterious effect on cone survival, but its mechanism is not discussed.

      (2) Inhibition of HSP90 is known to cause retinal degeneration. It is unclear why its inhibition enhances Txnip-mediated protection.

      As mentioned in our provisional responses, little was known about the function of Arrdc4 or HSP90AB1 in cones. We summarize some of the recent discoveries regarding these two proteins in the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced via AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role in RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      “Little is known about the function of HSP90AB1. Knocking down Hsp90ab1 improved mitochondrial metabolism of skeletal muscle in a diabetic mouse model (Jing et al. 2018). Knocking out HSP90AA1, a paralog of HSP90AB1 that differs from it in 14% of its amino acids, led to rod death and correlated with PDE6 dysregulation (Munezero et al. 2023). Inhibiting HSP90AA1 with small molecules transiently delayed cone death in human retinal organoids under low glucose conditions (Spirig et al. 2023). However, the exact role of HSP90AA1 in photoreceptors needs to be clarified, and the implications for HSP90AB1 in RP cones are still unclear.”

      In addition, in this revision we used AlphaFold-Multimer, an AI algorithm based on AlphaFold2, to explore possible interactions among TXNIP, PARP1 and HSP90AB1. One of the predicted models is shown as the new Figure 5-figure supplement 2. In this model, the C-terminus of Txnip is predicted to link HSP90AB1 and PARP1 together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 shows how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      See our responses to Reviewer #1’s public review above.

      Also, is the clearance from the RPE plasma membrane homogenous throughout the RPE monolayer?

      In the area of AAV infection, the effects are very homogeneous. In the uninfected area, the clearance does not occur, and we consider the uninfected area of the same eye to be an excellent internal control.

      A statistical analysis (as was provided for other experiments in the manuscript) would help to make the surprising conclusion about C.Txnip.C247S more convincing.

      In this revision, we used the Mann-Whitney U test with the Bonferroni correction for GLUT1 intensity quantification. For the cone survival statistics, we used a t-test or ANOVA with Dunnett's multiple comparison test. This information has been added to each figure legend.
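      To make the GLUT1 statistics concrete, the sketch below computes a Mann-Whitney U statistic with a normal-approximation p-value and then applies a Bonferroni correction across comparisons. It is illustrative only, not the authors' analysis script: it omits tie and continuity corrections, and with per-eye sample sizes this small an exact test (for example, scipy.stats.mannwhitneyu with method="exact") would be preferable. Function names and input values are hypothetical.

```python
import math


def midranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """U for sample x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```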

      Another improvement I suggest for this figure is to include normal full length Txnip as a positive control to show how completely it removes GLUT1 from the surface.

      Added. See the new Figure 2-figure supplement 1.

      Another point that should be discussed is whether, when Txnip prevents GLUT1 from reaching the surface, all of the GLUT1 is fully degraded within the cell. A brief description of how Txnip influences GLUT1 stability and localization would be helpful.

      We are unable to track the fate of GLUT1 after it is removed from the surface, i.e. we do not see definitive intracellular staining. We do not know whether this is due to degradation or to a hidden epitope.

      Minor point

      (1) Confusing citation on lines 99-100: "We previously showed that overexpressing the Txnip wt allele in the RPE using an RPE specific promoter, derived from the Best1 gene (Esumi et al. 2009),.." makes it sound like Esumi et al. is the citation for their previous study, which is not correct.

      We have amended this to: "We previously showed (Xue et al. 2021) that overexpressing the Txnip wt allele in the RPE using an RPE-specific promoter, derived from the Best1 gene (Esumi et al., 2009), did not improve RP cone survival."

      Reviewer #2 (Recommendations For The Authors):

      Regarding the manuscript, here are some suggestions that authors can take into consideration for the completeness of the study:

      (1) The text references the relationship between α-arrestin and glucose metabolism in cone cells, but fails to provide an explanation for its specific involvement in glucose metabolism. Consequently, readers may struggle to discern the targeted metabolic pathway.

      We understand the Reviewer’s point and would also like to know more about the mechanism, which is one reason we undertook the current study. The mechanism(s) by which Txnip affects metabolism remain to be elucidated. To summarize the findings of our previous study: we showed that LDHB, which converts lactate to pyruvate, was required for Txnip-mediated rescue. Adding the LDHB gene, however, did not boost rescue. We also showed that mitochondrial size and membrane potential, as well as Na/K pump function, were improved in Txnip-treated cones. Improved mitochondria were not sufficient, however, as revealed by a PARP-1 KO mouse with improved mitochondria that did not show extended cone survival. In addition, a Txnip mutant that does not remove the glucose transporter still rescued cones, so this function cannot be required for Txnip-mediated rescue. How does Txnip lead to improved mitochondria and to a reliance on lactate? We do not know.

      (2) Although the authors conducted an experiment on Arrdc4 because of its similarity to Txnip, they do not explain why Arrdc4, with 60% amino acid similarity, did not yield the same effects as Txnip. Highlighting structural disparities or differences in intracellular signaling pathways could potentially shed light on this incongruity. An additional experiment may then be warranted to test the hypothesis regarding which component of the α-arrestin is effective for cone rescue.

      Additional experiments are needed to identify the relevant differences between Arrdc4 and Txnip, but they are beyond the scope of our present work. However, we have added a paragraph on newly published data on the function of Arrdc4 to the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced by AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role regarding RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      (3) The utilization of distinct mutant Txnip variants to impact RPE, cones, and their combined influence is noted. A comparative table elucidating the impact of cone rescue on these three targets would greatly enhance clarity.

      We presented these data in Figure 4 in a table format.

      Additionally, the text does not definitively establish whether Txnip.C247S.LL351 and 352AA, as well as Txnip.C247S, indeed manifest discrepancies when exclusively affecting RPE.

      We edited a sentence in Results to: “Similar to Best1-wt Txnip (Xue et al., 2021), Best1-Txnip.C247S did not show significant improvement of cone survival, ruling out the C247S mutation alone as promoting the cone survival by Best1-Txnip.C247S.LL351 and 352AA.”

      (4) While the text mentions that Txnip stimulates lactate utilization within cones, it remains unclear whether this effect extends to RPE. If applicable, this trait could potentially contribute to its role in cone rescue.

      We agree with the Reviewer, and hope to address this question in our next study.

      (5) The discussion introduces the notion that one potential mechanism for cone rescue by Txnip.C247S involves facilitating unhindered movement of Thioredoxin for redox processes. To validate this hypothesis and elucidate the mechanics of Txnip's involvement in cone rescue, it may be prudent to conduct further experiments concentrating on the interaction between Txnip and thioredoxin. Alternatively, an experiment aimed at upregulating Thioredoxin expression would be a valuable addition.

      We hope to address this question in the future. However, the effect may be more complicated than our simple hypothesis regarding release of Thioredoxin. More than a dozen proteins were found to differentially interact with Txnip vs. Txnip.C247S (Forred et al. 2016).

      Reviewer #3 (Recommendations For The Authors):

      (1) Glucose transporter 1 is identified as an important mechanism in the protection of cones from degeneration. It is unclear why GLUT1 is upregulated in retinal cells in Figure 2, even though the Txnip mutants are expressed specifically in the RPE.

      This retinal GLUT1 upregulation was not consistently observed in the treated eyes, so we did not comment on it in the text.

      (2) The Discussion mentions that the N.Txnip mutant causes obvious retinal degeneration. Quantifying retinal thickness from Figure 2 would make this claim more rigorous.

      Unlike the robust effects of Best1-N.Txnip on RPE GLUT1 level, this negative effect of Best1-N.Txnip on ONL thickness was not consistent. This result does not undermine the other major conclusions. Therefore, we deleted the related sentence of the original text: “This hypothesis is supported by the observation that N.Txnip led to an obvious thinning of the outer nuclear layer of the wt retina, reflecting a loss of photoreceptors”. We did leave in the related finding as follows:

      “The N-terminal half of Txnip (1-228aa) might exert harmful effects in the RPE, that negate the beneficial effects from the C-terminal half, suggested by the observation that its removal, in the C-terminal 149-397 allele, led to better cone survival when expressed in the RPE (Figure 2). In cones, the C-terminal half, including the C-terminal IDR tail, may cooperate with the N-terminal half, or negate its negative effects, to benefit RP cone survival. However, the C-terminal half is not sufficient for cone rescue when expressed in cones, as the 149-397 allele did not rescue.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Sang et al. proposed a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels.

      Strengths:

      The experiments were thorough and well designed. The results are compelling and support the main claim. The development and use of the DrosoX two-choice assay allow a more quantitative, automated, and unbiased assessment of ingestion volume and preference.

      Weaknesses:

      There are a few inconsistencies with respect to the exact role by which IR60b neurons limit high salt consumption and the contribution of external (labellar) high-salt sensors in regulating high salt consumption. These weaknesses do not significantly impact the main conclusion, however.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Sang et al. set out to identify gustatory receptors involved in salt taste sensation in Drosophila melanogaster. In a two-choice assay screen of 30 Ir mutants, they identified that Ir60b is required for avoidance of high salt. In addition, they demonstrate that activation of Ir60b neurons is sufficient for gustatory avoidance using either optogenetics or TRPV1 to specifically activate Ir60b neurons. Then, using tip recordings of labellar gustatory sensory neurons and proboscis extension response behavioral assays in Ir60b mutants, the authors demonstrate that Ir60b is dispensable for labellar taste neuron responses to high salt and the suppression of proboscis extension by high salt. Since external gustatory receptor neurons (GRNs) are not implicated, they look at Poxn mutants, which lack external chemosensory sensilla but have intact pharyngeal GRNs. High salt avoidance was reduced in Poxn mutants but was still greater than Ir60b mutants, suggesting that pharyngeal gustatory sensory neurons alone are sufficient for high salt avoidance. The authors use a new behavioral assay to demonstrate that Ir60b mutants ingest a higher volume of sucrose mixed with high salt than control flies do, suggesting that the action of Ir60b is to limit high salt ingestion. Finally, they identify that Ir60b functions within a single pair of gustatory sensory neurons in the pharynx, and that these neurons respond to high salt but not bitter tastants.

      Strengths:

      A great strength of this paper is that it rigorously corroborates previously published studies that have implicated specific Irs in salt taste sensation. It further introduces a new role for Ir60b in limiting high salt ingestion, demonstrating that Ir60b is necessary and sufficient for high salt avoidance and convincingly tracing the action of Ir60b to a particular subset of gustatory receptor neurons. Overall, the authors have achieved their aim by identifying a new gustatory receptor involved in limiting high salt ingestion. They use rigorous genetic, imaging, and behavioral studies to achieve this aim, often confirming a given conclusion with multiple experimental approaches. They have further done a great service to the field by replicating published studies and corroborating the roles of a number of other Irs in salt taste sensation. An aspect of this study that merits further investigation is how the same gustatory receptor neurons and Ir in the pharynx can be responsible for regulating the ingestion of both appetitive (sugar) and aversive tastants (high salt).

      A previous report published in eLife from John Carlson’s lab (Joseph et al, 2017) showed that the Ir60b GRN in the pharynx responds to sucrose, resulting in sucrose repulsion. Thus, stimulation of this pharyngeal GRN results only in gustatory avoidance, not in both attraction and avoidance. (lines 205-207)

      Weaknesses:

      There are several weaknesses that, if addressed, could greatly improve this work.

      (1) The authors combine the results and discussion but provide a very limited interpretation of their results. More discussion of the results would help to highlight what this paper contributes, how the authors interpret their results, and areas for future study.

      We agree and have now separated the Results and Discussion, and in so doing have greatly expanded discussion of the results.

      (2) The authors rename previously studied populations of labellar GRNs to arbitrary letters, which makes it difficult to understand the experiments and results in some places. These GRN populations would be better referred to according to the gustatory receptors they are known to express.

      One of the corresponding authors (Craig Montell) introduced this alternative GRN nomenclature in a 2021 review (Montell, 2021): Montell, C. Drosophila sensory receptors—a set of molecular Swiss Army Knives. Genetics 217, 1-34. We are not fans of referring to different classes of GRNs based on the receptors that they express, since it is not obvious which receptors to use. For example, the GRNs that respond to bitter compounds all express multiple GR co-receptors. The same is true for the GRNs that respond to sugars. The former system of referring to GRNs simply as sugar, bitter, salt and water GRNs is also not ideal, since the repertoire of chemicals that stimulates each class is complex. For example, the Class A GRNs (formerly sugar GRNs) are also activated by low Na+, glycerol, fatty acids, and acetic acid, while the B GRNs (formerly bitter GRNs) are also stimulated by high Na+, acids, polyamines, and tryptophan. In addition, there are five classes of GRNs. At first mention of the Class A–E GRNs, we give the most commonly used former nomenclature of sugar, bitter, salt and water GRNs. For added clarity, we now also mention one of the receptors that marks each class. (lines 51-59)

      (3) The conclusion that GRNs responsible for high salt aversion may be inhibited by those that function in low salt attraction is not well substantiated. This conclusion seems to come from the fact that overexpression of Ir60b in salt attraction and salt aversion sensory neurons still leads to salt aversion, but there need not be any interaction between these two types of sensory neurons if they act oppositely on downstream circuits.

      We did not make this claim.

      (4) The authors rely heavily on a new Droso-X behavioral apparatus that is not sufficiently described here or in the previous paper the authors cite. This greatly limits the reader's ability to interpret the results.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      Reviewer #3 (Public Review):

      Summary:

      Sang et al. successfully demonstrate that a set of single sensory neurons in the pharynx of Drosophila promotes avoidance of food with high salt concentrations, complementing previous findings on Ir7c neurons with an additional internal sensing mechanism. The experiments are well-conducted and presented, convincingly supporting their important findings and extending the understanding of internal sensing mechanisms. However, a few suggestions could enhance the clarity of the work.

      Strengths:

      The authors convincingly demonstrate the avoidance phenotype using different behavioral assays, thus comprehensively analyzing different aspects of the behavior. The experiments are straightforward and well-contextualized within existing literature.

      Weaknesses:

      Discussion

      While the authors effectively relate their findings to existing literature, expanding the discussion on the surprising role of Ir60b neurons in both sucrose and salt rejection would add depth. Additionally, considering Yang et al. 2021's (https://doi.org/10.1016/j.celrep.2021.109983) result that Ir60b neurons activate feeding-promoting IN1 neurons, the authors should discuss how this aligns with their own findings.

      Yang et al. demonstrated that the activation of Ir60b neurons can trigger the activation of IN1 neurons, akin to pharyngeal multimodal (PM) neurons, potentially leading to enhanced feeding (Yang et al, 2021). However, our research reveals a specific pattern of activation for Ir60b neurons. Instead of being generalists, they are specialized for certain sugars, such as sucrose, and for high salt. Consequently, while Ir60b GRNs activate IN1 neurons, we contend that other neurons in the brain are responsible for inhibiting feeding. (lines 412-417)

      Line 187: The discussion primarily focuses on taste sensilla outside the labellum, neglecting peg-type sensilla on the inner surface. Clarifying whether these pegs contribute to the described behaviors, and whether the Poxn mutants described also affect the pegs, would strengthen the discussion.

      We added the following to the Discussion section. “We also found that the requirement for Ir60b appears to be different when performing binary liquid capillary assay (DrosoX), versus solid food binary feeding assays. When we employed the DrosoX assay to test mutants that were missing salt aversive GRNs in labellar bristles but still retained functional Ir60b GRNs, the flies behaved the same as wild-type flies (e.g. Figure 3J and 3L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al, 2015), displayed repulsion to high salt food that was intermediate between control flies and the Ir60b mutant (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), and these hairless taste organs become exposed to food only when the labial palps open. We suggest that there are high-salt sensitive GRNs associated with taste pegs, which are accessed when the labellum contacts a solid substrate, but not when flies drink from the capillaries used in DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B).”. (lines 430-444)

      In line 261 the authors state: "We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, similar to what was observed with Ir56b (ref. 8); however, this did not generate a salt receptor (Figure S6A)"

      An obvious explanation would be that these neurons are missing the identified necessary co-receptors Ir76b and Ir25a. The authors should discuss here whether the Gr33a neurons they target also express these co-receptors; if so, this would strengthen their conclusion that an additional receptor might be missing.

      We clarified this point in the Discussion section as follows: “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al, 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al, 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.” (lines 464-477)

      Methods

      The description of the Droso-X assay seems to be missing some details. Currently, it is not obvious how the two-choice setup is established. Only one capillary is mentioned; I assume two were used? Also, the meanings of the variables used in the equation (DrosoX and DrosoXD) are not explained.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      The description of the ex-vivo calcium imaging prep. is unclear in several points:

      (1) It is lacking information on how the stimulus was applied (was it manually washed in? If so how was it removed?).

      We expanded the description of the apparatus in the ex vivo calcium imaging section of the Materials and Methods. (lines 682-716)

      (2) The authors write: "A mild swallow deep well was prepared for sample fixation." I assume they might have wanted to describe a "shallow well"?

      We deleted the word “deep.” (line 691)

      (3) "...followed by excising a small portion of the labellum in the extended proboscis region to facilitate tastant access to pharyngeal organs." It is not clear to me how one would excise a small portion of the labellum; the labellum is the most distal part of the proboscis, which carries the sensilla and pegs. Did the authors mean to say that they cut a part of the proboscis?

      Yes. We changed the sentence to “…followed by excising a small portion of the extended proboscis to facilitate tastant access to the pharyngeal organs.” (lines 693-695)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In this manuscript, Sang et al. proposed a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels. In general, I find the collective evidence presented by the authors convincing. But I feel the MS could benefit from a Discussion section and a few simple experiments. Below I list some inconsistencies I hope the authors can address or at least discuss.

      We have now added a Discussion section, and expanded the discussion.

      (1) The role of IR60b neurons in suppressing PER appeared inconsistent. On the one hand, optogenetic activation of these neurons suppressed PER (Fig 1D); on the other hand, IR60b mutants were as competent to suppress PER in response to high salt as WT (Fig 2G). Are pharyngeal neurons expected to modulate PER? It might be worth including a retinal-free or genotype control to confirm that the PER suppression exhibited by IR60b>CsChrimson is genuine.

      Please note that Figure 2G is now Figure 2H.

      Our interpretation is that activation of aversive GRNs by high salt, either in labellar bristles or in the pharynx, is sufficient to elicit repulsion to high salt. Consistent with this conclusion, optogenetic activation of Ir60b GRNs, which are specific to the pharynx, is sufficient to reduce the PER to sucrose-containing food (Figure 1D). However, mutation of Ir60b has no impact on the PER to sucrose plus high (300 mM) NaCl, since the high-salt-activated GRNs in labellar bristles are not impaired by the Ir60b mutation. In contrast, Ir25a and Ir76b are required in both labellar bristles and the pharynx to reject high salt. As a consequence, mutation of either Ir25a or Ir76b impairs the repulsion to high salt. Thus, there is no inconsistency between the optogenetics and PER results. We clarified this point in the Discussion section. In terms of controls for IR60b>CsChrimson, we show that UAS-CsChrimson alone or UAS-CsChrimson in combination with the Gr5a driver has no impact on the PER (Figure 1D). In addition, we now include a retinal-free control (Figure 1D). These findings provide the key genetic controls and are described in the Results section. (lines 167-170)

      (2) The role of labellar high-salt sensors in regulating salt intake appeared inconsistent. On the one hand, they appeared to have a role in limiting high salt consumption because poxn mutants were significantly more receptive to high salt than WT (Fig. 2J). On the other hand, selectively restoring IR76b or IR25a in only the IR60b neurons in these mutants - thus leaving the labellar salt sensors still defective - reverted the flies to behave like WT when given a choice between sucrose vs. sucrose+high salt (Fig 3J, L).

      We now offer an explanation for these seemingly conflicting results in the Discussion section. When we employed the DrosoX assay with mutants that had functional Ir60b GRNs but were missing salt-aversive GRNs in labellar bristles, the flies behaved the same as control flies (e.g. Figure 3J and L). However, in solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al., 2015), display an aversion to high-salt food that is intermediate between control and Ir60b mutant flies (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), which are exposed to food substrates only when the labial palps open. We suggest that the taste pegs harbor high-salt-sensitive GRNs, which may be exposed to solid substrates but not to the liquid in the capillary tubes used in the DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B). (lines 433-444)

      (3) The behavior sensitivity of IR60b mutant to high salt again appeared somewhat inconsistent when assessed in the two different choice assays. IR60b mutant flies were indifferent to 300 mM NaCl when assayed with DrosoX (Fig 3A, B) but were clearly still sensitive to 300 mM NaCl when assayed with "regular" assay - they showed much reduced preference for 5 mM sucrose over 1 mM sucrose when the 5 mM sucrose was adulterated with 300 mM NaCl (Fig 1B).

      The explanation provided above may also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but not when selecting between 300 mM NaCl and 5 mM sucrose versus 1 mM sucrose in the solid food binary assay (Figure 1B). Alternatively, the different behavioral responses might be due to the variation in sucrose concentrations in each of these two assays, which employed 5 mM sucrose in the solid food binary assay, as opposed to 100 mM sucrose in the DrosoX assay. This disparity in attractive valence between these two concentrations of sucrose might consequently impact feeding amount and preference. This point is now also included in the Discussion section. (lines 441-449)

      (4) Given the IR60b neurons exhibited clear IR60b/IR25a/IR76b-dependent sucrose sensitivity, too, I am curious how the various mutant animals behave when given a choice between 100 mM sorbitol vs. 100 mM sorbitol + 300 mM NaCl, a food choice assay not complicated by the presence of sucrose. Similarly, I am curious if the Ca2+ response of IR60 neurons differs significantly when presented with 100 mM sucrose vs. when presented with 100 mM sucrose + 300 mM NaCl. In principle, the magnitude for the latter should be significantly larger than the former as animals appeared to be capable of discriminating these two choices solely relying on their IR60b neurons.

      To investigate the aversion induced by high salt in the absence of a highly attractive sugar, such as sucrose, we combined 300 mM salt with 100 mM sorbitol, which is a tasteless but nutritive sugar (Burke & Waddell, 2011; Fujita & Tanimura, 2011). Using two-way choice assays, we found that the Ir25a, Ir60b, and Ir76b mutants exhibited substantial reductions in high salt avoidance (Figure 3—figure supplement 2A). In addition, we performed DrosoX assays using 100 mM sorbitol alone, or sorbitol mixed with 300 mM NaCl. Sorbitol alone provoked less feeding than sucrose since it is a tasteless sugar (Figure 3—figure supplement 2B and C). Nevertheless, addition of high salt to the sorbitol reduced food consumption (Figure 3—figure supplement 2B and C). (lines 300-308)

      We also conducted a comparative analysis of the Ca2+ responses within the Ir60b GRN, examining its reaction to various stimuli, including 100 mM sucrose alone, 300 mM NaCl alone, and a combination of 100 mM sucrose and 300 mM NaCl. We found that the Ca2+ responses were significantly higher when we exposed the Ir60b GRN to 300 mM NaCl alone, compared with the response to 100 mM sucrose alone (Figure 4—figure supplement 1D). However, the GCaMP6f response was not higher when we presented 100 mM sucrose with 300 mM NaCl, compared with the response to 300 mM NaCl alone (Figure 4—figure supplement 1D). (lines 360-367)

      Minor issues

      (1) The labels of sucrose concentration on Figure 2D were flipped.

      This has been corrected.

      (2) The phrasing of the sentence that begins in line 196 (i.e., "This suggests the internal sensor ...") is not optimal.

      We changed the sentence to, “We found that the aversive behavior to high salt was reduced in the Poxn mutants relative to the control (Figure 2J), consistent with previous studies demonstrating roles for GRNs in labellar bristles in high salt avoidance (Jaeger et al, 2018; McDowell et al, 2022; Zhang et al, 2013).”. (lines 217-219)

      (3) In Line 231, I am not sure why the authors think ectopic expressing IR60b in labellar neurons would allow them to become activated by Na+. It seems highly unlikely to me, especially given IR60b also plays a role in sensing sugar.

      We added the following paragraph to the Discussion addressing this point: “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al., 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al., 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.” (lines 464-477)

      Reviewer #2 (Recommendations For The Authors):

      Line 41, acutely excessive salt ingestion can lead to death, not just health issues

      We now state that, “consumption of excessive salt can contribute to various health issues in mammals, including hypertension, osteoporosis, gastrointestinal cancer, autoimmune diseases, and can lead to death.”. (lines 41-43)

      Line 46, delete the comma after flies

      Done. (line 47)

      Lines 51-56: This description is unnecessarily confusing and does not cite proper sources. Renaming these GRNs arbitrarily can only create confusion, plus this description lacks nuance. If E GRNs are Ir94e-positive, this description is out of date. Furthermore, if D GRNs are ppk23- and Gr66a-positive, then they will respond to both bitter and high salt.

      Papers to consult: https://elifesciences.org/articles/37167 10.1016/j.cell.2023.04.038

      We have now added citations. We prefer the A–E nomenclature, which was introduced in a 2021 Genetics review by one of the authors of this manuscript (Montell) (Montell, 2021), since naming different classes of GRNs on the basis of markers or as sweet, bitter, salt and water GRNs is misleading and an oversimplification. We cite the Genetics 2021 review, and for added clarity include both types of former names (markers and sweet, bitter, salt and water). Class D GRNs are not marked by Gr66a. The eLife reference cited above provided the initial rationale for stating that Class E GRNs are marked by Ir94e and activated by low salt. According to the Taisz et al reference (Cell 2023), the Class E GRNs, which are marked by Ir94e, are also activated by pheromones, which we now mention (Taisz et al, 2023). (lines 51-59)

      Line 62, E GRNs are not required for low salt behaviors

      We do not state that E GRNs are required for low salt behaviors, only that they sense low Na+ levels. (line 58)

      Line 70-81 - Great deal of emphasis on labellar GRNs but then no mention of how pharyngeal GRNs fit into categories A-E

      We devote the following paragraph to pharyngeal GRNs. We do not mention how they fit in with the A—E categories because it is not clear.

      “In addition to the labellum and taste bristles on other external structures, such as the tarsi, fruit flies are endowed with hairless sensilla on the surface of the labellum (taste pegs), and three internal taste organs lining the pharynx, the labral sense organ (LSO), the ventral cibarial sense organ (VCSO), and the dorsal cibarial sense organ (DCSO), which also function in the decision to keep feeding or reject a food (Chen & Dahanukar, 2017, 2020; LeDue et al., 2015; Nayak & Singh, 1983; Stocker, 1994). A pair of GRNs in the LSO express a member of the gustatory receptor family, Gr2a, and knockdown of Gr2a in these GRNs impairs the avoidance to slightly aversive levels of Na+ (Kim et al, 2017). Pharyngeal GRNs also promote the aversion to bitter tastants, Cu2+, L-canavanine, and bacterial lipopolysaccharides (Choi et al, 2016; Joseph et al., 2017; Soldano et al, 2016; Xiao et al, 2022). Other pharyngeal GRNs are stimulated by sugars and contribute to sugar consumption (Chen & Dahanukar, 2017; Chen et al, 2021; LeDue et al., 2015). Remarkably, a pharyngeal GRN in each of the two LSOs functions in the rejection rather than the acceptance of sucrose (Joseph et al., 2017).”. (lines 74-89)

      Line 89, aversive --> aversion

      We changed this part.

      Line 90, gain of aversion capsaicin avoidance suggests they are sufficient for avoidance, not essential for avoidance.

      We changed “essential” to “sufficient.”. (line 100)

      Line 104, what are you recording from here? Labellar or pharyngeal GRNs?

      We added “S-type and L-type sensilla” to the sentence. (line 119)

      Line 107, How are A GRNS marked with tdTomato? It is important to mention how you are defining A GRNs.

      We modified the sentence as follows: “Using Ir56b-GAL4 to drive UAS-mCD8::GFP, we also confirmed that the reporter was restricted to a subset of Class A GRNs, which were marked with LexAop-tdTomato expressed under the control of the Gr64f-LexA (Figure 1—figure supplement 1D—F).”. (lines 120-123)

      Line 124, should read "concentrated as sea water."

      We made the change. (line 142)

      Line 125, I am not sure what is meant by "alarm neurons"

      We changed “additional pain or alarm neurons” to “nociceptive neurons.”. (line 144)

      Line 141, Are you defining A GRNs as only labellar GRNs, i.e. the Gr5a-GAL4 pattern with labellar plus few pharyngeal GRNs? Or are you defining it as Gr64f-GAL4 (i.e. labellar plus many pharyngeal GRNs)?

      We refer to the Class A—E GRNs as labellar GRNs. Therefore, in this instance, we removed the reference to A GRNs and B GRNs, and simply mention the drivers that we used (Gr5a-GAL4 and Gr66a-GAL4) to express UAS-CsChrimson. The modified sentence is, “As controls we drove UAS-CsChrimson under control of either the Gr5a-GAL4 or the Gr66a-GAL4.”. (lines 51-59, 160-161)

      Line 180, labellar hairs--> labellar taste bristles

      We made the change. (line 204)

      Line 190, possess only --> only possess

      We made the change. (line 216)

      Line 202, Should this read increased?

      Yes. We changed “reduced” to “increased.”. (line 225)

      Line 206, The information provided here and in reference 47 was not sufficient for me to understand how the Droso-X system works and whether it has been validated. Better diagrams and much more description is required for the reader to understand this system and assess its validity

      We now explain that the DrosoX “system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary is then monitored automatically over the course of 6 hours and recorded on a computer.”. (lines 238-243)

      Line 218-219, It would be helpful to expand on this to explain how the previous paper detected no difference. Is this because the contact time with the food is the same but the rate of ingestion is slower?

      Yes. This is correct. We now clarify this point by stating that, “In a prior study, it was observed that the repulsion to high salt exhibited by the Ir60b mutant was indistinguishable from wild-type (Joseph et al., 2017). Specifically, the flies were presented with a drop of liquid (sucrose plus salt) at the end of a probe, and the Ir60b mutant flies fed on the food for the same period of time as control flies (Joseph et al., 2017). However, this assay did not discern whether or not the volume of the high salt-containing food consumed by the Ir60b mutant flies was reduced relative to control flies. Therefore, to assess the volume of food ingested, we used the DrosoX system, which we recently developed (Figure 3—figure supplement 1A) (Sang et al, 2021). This system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary was then monitored automatically over the course of 6 hours and recorded on a computer. We found that control flies consumed approximately four times more of the 100 mM sucrose than the sucrose mixed with 300 mM NaCl (Figure 3A). In contrast, the Ir25a, Ir60b, and Ir76b mutants consumed approximately two-fold less of the sucrose plus salt (Figure 3A). Consequently, they ingested similar amounts of the two food options (Figure 3B; ingestion index). Thus, while the Ir60b mutant and control flies spend similar amounts of time in contact with high salt-containing food when it is the only option (Joseph et al., 2017), the mutant consumes considerably less of the high salt food when presented with a sucrose option without salt.”. (lines 226-251)

      Lines 231-235, Is there evidence for this, that Ir60b expression in the Ir25a or Ir76b pattern will induce high salt responses in the labellum? You should elaborate on this to clearly state what you mean rather than implying it. I do not think that overexpression of one Ir is enough evidence for this sweeping conclusion.

      We agree. We eliminated this point. (lines 227-232)

      Lines 261-263, Please elaborate here, how did you target the I-type sensilla and where are these neurons? Do they already express Ir76b and Ir25a?

      We now explain in the Results that, “We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. Gr33a is co-expressed with Gr66a (Moon et al., 2009), which has been shown to be co-expressed with Ir25a and Ir76b (Li et al., 2023). When we performed tip recordings from I7 and I10 sensilla, we did not observe a significant increase in action potentials in response to 300 mM NaCl (Figure 4—figure supplement 1A), indicating that ectopic expression of Ir60b in combination with Ir25a and Ir76b is not sufficient to generate a high salt receptor.”. (lines 324-330)

      Lines 300-303, The discussion needs to be greatly expanded. What is the proposed mechanism by which the same neurons/receptors can inhibit sucrose and high salt feeding? What is the authors' interpretation of what this study adds to our understanding of taste aversion?

      We have now added a Discussion section and greatly expanded the discussion.

      Reviewer #3 (Recommendations For The Authors):

      In line 73 there is a typo in "esophagus"

      We changed this part.

      In line 331, the use of a mixture of sucrose and "saponin" seems to be a mistake; "NaCl" is likely intended.

      We made the correction. (lines 546 and 640)

      On several occasions, the authors refer to the pharynx as a taste organ (for example 1st sentence of the abstract). I am not sure this is correct, the actual pharyngeal taste organs are the LSO, DCSO, and VCSO which are located in the pharynx.

      We made the corrections. (lines 24, 90, 92, 93, and 356)

      In line 155 the authors refer to Ir25a and Ir76b as "broadly tuned". I think it is not correct to refer to co-receptors this way, I'd suggest to just call them co-receptors.

      We made the correction. (lines 177-178)

      In line 182, stating "Gr2a is also expressed in the proboscis" is unclear. Clarify whether it refers to sensilla, pharyngeal taste organs, etc.

      We clarified it refers to pharyngeal taste organs. (lines 206-207)

      Line 253: "These finding imply that all three Irs are coexpressed in the pharynx." "The pharynx" is very unspecific, did the authors mean to say "the same neuron"?

      We now clarify by saying “in the Ir60b GRN in the pharynx.”. (line 317)

      Figures & Legends

      I found it confusing that the same color scale is being reused for different panels with different meanings repeatedly and in inconsistent ways. For example in Figure 2, red and blue are being used for Ir25a² mutants, while blue is also being used for Gr64f-Gal4 and S type sensilla. It is also not easily visible nor mentioned in the caption which of the 3 color scales presented belong to which panels.

      We modified the colors in the figures so that they are used in a consistent way. We now also define the colors in the legends.

      In Figure 2 F-I, indicating the stimulus sequence in each panel would enhance clarity. The color scale in Figure 3 could benefit from explicit explanations of different shades in the caption for easier interpretation.

      For example: "The ingestion of (a, dark color) 100 mM sucrose alone and (b, light color) in combination with 300 mM"

      We made the suggested modification.

      In Figure 4a the authors highlight that Ir76b and Ir25a label 2 neurons in the LSO. Did the imaging in 4c also capture the second cell, and if so did it respond to their stimulation?

      No, the focal plane differs, and the signal in Figure 4C is considerably weaker compared to the immunohistochemistry shown in Figure 4A. Notably, the other neuron did not exhibit a response to NaCl.

      In Figure 4f a legend for the color scale is missing, or the color might not be necessary at all. Also, the asterisks seem to be shifted to the right.

      We fixed the shifted asterisks and eliminated the color.

      Figure 4i is mislabeled 4f

      We made the correction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study highlights new insights into the mechanism of pheochromocytoma pathogenesis that remains poorly understood. In the context of hereditary syndromes, such as multiple endocrine neoplasia 2 (MEN-2), where RET mutation is the major driver of thyroid, parathyroid, and adrenal pathologies, including pheochromocytoma, this mechanistic dissection of RET and TMEM127 is fundamentally sound. While the significance was deemed important, the strength of the evidence was found to be solid.

      Recognizing the limitations of models available for study of neuroendocrine cancers, and specifically for pheochromocytomas, we have revised and clarified the text of the current manuscript version and provide specific responses to the additional comments provided below, highlighting changes and new data.

      Reviewer #1 (Recommendations For The Authors):

      A current lack of pheochromocytoma cell lines and the use of generated cell lines for mechanistic studies presents a significant challenge that may undermine the inferred value of these findings in mock in vitro systems and call into question their reproducibility in pheochromocytoma. Consideration of 3-dimensional patient-derived pheochromocytoma organoid in vitro and patient-derived organoid xenograft in vivo models would enable confirmation or refutation of the novel findings described by the authors.

      We agree completely with Reviewer 1 that ideally, we should replicate these findings with PCC-derived cells in vitro and in organoids. Despite many attempts, PCC cell lines have proved a major challenge for the field of neuroendocrine cancers. Cell line models are not available, and PDOs have proven slow-growing and resistant to manipulations, such as CRISPR KOs or siRNA KD. In studies completed since the submission and review of the present manuscript, and subsequently published elsewhere, we have shown that RET protein is highly expressed in TMEM127-mutant PCC by immunohistochemistry. We also showed that the TMEM127-KO SH-SY5Y cell model does grow more robustly than Mock-KO cells in nude mice and that RET inhibition (Selpercatinib) does lead to tumor regression (Guo et al., 2023), suggesting that our findings may be reproducible in vivo. These findings, and potential caveats of the cell models used, have been further discussed in the text.

      Reviewer #2 (Recommendations For The Authors):

      Most notably, all experiments are conducted in an isogenic single-cell line. This exposes the whole story to be potentially confounded by unknown variables.

      In addition, studies would benefit from the adding back of TMEM127, or other methods to modulate endosome and plasma membrane dynamics to mechanistically secure the cause of the findings.

      As suggested by Reviewer 2, we have generated a TMEM127 KO in HEK293, an unrelated cell line which expresses low levels of TMEM127 but does not express RET. Consistent with our findings in SH-SY5Y, we saw increased membrane accumulation of endogenous membrane proteins N-cadherin and transferrin receptor-1 in these cells in the absence of TMEM127. Additionally, re-expression of a wildtype TMEM127 (FLAG-TMEM127) in these cells led to dramatic decreases in membrane localization of these proteins (Supplemental Figure 1D). These data suggest that membrane accumulation is indeed TMEM127 dependent, and that these processes are not directly dependent on RET expression.

      References

      Guo, Q., Z.M. Cheng, H. Gonzalez-Cantu, M. Rotondi, G. Huelgas-Morales, P. Ethiraj, Z. Qiu, J. Lefkowitz, W. Song, B.N. Landry, H. Lopez, C.M. Estrada-Zuniga, S. Goyal, M.A. Khan, T.J. Walker, E. Wang, F. Li, Y. Ding, L.M. Mulligan, R.C.T. Aguiar, and P.L.M. Dahia. 2023. TMEM127 suppresses tumor development by promoting RET ubiquitination, positioning, and degradation. Cell Rep. 42:113070.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript by DeHaro-Arbona et al., the authors wish to understand how a signaling pathway (Notch) is dynamically decoded to elicit a specific transcriptional output. In particular, they investigate the kinetic properties of Notch-responsive nuclear complexes (the DNA binding factor CSL and its co-activator Mastermind (mam) along with several candidate interacting partners). Their experimental model is the polytene chromosome of the Drosophila salivary gland, in which the naturally inactive Notch can be artificially induced through the expression of a constitutively active form of Notch.

      The authors develop a series of CRISPR and transgenic lines enabling the live imaging of these complexes at a specific locus and in various backgrounds (genetic perturbations/drug treatments). This quantitative live imaging data suggests that Notch nuclear complexes form hubs, and the authors characterize their binding dynamics. Interestingly, they elegantly demonstrate that the content of these hubs and their kinetic properties can evolve, even within Notch ON cells. Hence, they propose the existence of distinct hubs, distinguishing an open (CSL), engaged (CSL-Mam), or active (CSL-Mam-Med-PolII) configuration in Notch ON cells and an inactive hub state (in Notch OFF cells that had previously been exposed to Notch), which would explain the surprising transcriptional memory that the authors observe hours after Notch withdrawal.

      We thank the reviewer for this constructive summary of our work.

      Reviewer #2 (Public Review):

      The manuscript from deHaro-Arbona et al, entitled "Dynamic modes of Notch transcription hubs conferring memory and stochastic activation revealed by live imaging the co-activator Mastermind", uses single molecule microscopy imaging in live tissues to understand the dynamics and molecular determinants of transcription factor recruitment to the E(spl)-C locus in Drosophila salivary gland cells under Notch-ON and -OFF conditions. Previous studies have identified the major players that are involved in transcription regulation in the Notch pathway, as well as the importance of general transcriptional coregulators, such as CBP/P300 and the Mediator CDK module, but the detailed steps and dynamics involved in these processes are poorly defined. The authors present a wealth of single molecule data that provides significant insights into Notch pathway activation, including:

      (1) Activation complexes, containing CSL and Mam, have slower dynamics than the repressor complexes, containing CSL and Hairless.

      (2) Contribution of CSL, NICD, and Mam IDRs to recruitment.

      (3) CSL-Mam slow-diffusing complexes are recruited and form a hub of high protein concentrations around the target locus in Notch-ON conditions.

      (4) Mam recruitment is not dependent on transcription initiation or RNA production.

      (5) CBP/P300 or its associated HAT activity is not required for Mam recruitment.

      (6) Mediator CDK module and CDK8 activity are required for Mam recruitment, and vice-versa, but not CSL recruitment.

      (7) Mam is not required for chromatin accessibility but is dependent on CSL and NICD.

      (8) CSL recruitment and increased chromatin accessibility persist after NICD removal and loss of Mam, which confers a memory state that enables rapid re-activation in response to subsequent Notch activation.

      (9) Differences in the proportions of nuclei with both Pol II and with Mam enrichment, which results in transcription being probabilistic/stochastic. These data demonstrate that the presence of Mam complexes is not sufficient to drive all the steps required for transcription in every Notch-ON nucleus.

      (10) The switch from more stochastic to robust transcription initiation was elicited when ecdysone was added.

      Overall, the manuscript is well written, concise, and clear, and makes significant contributions to the Notch field, which are also important for a general understanding of transcription factor regulation and behavior in the nucleus. I recommend that the authors address my relatively minor criticisms detailed below.

      We thank the reviewer for their thorough and constructive summary of our work. We are glad that they overall found it insightful and interesting. Below we have addressed the points they have raised.

      Page 7, bottom. The authors speculate, "It is possible therefore that, once recruited, Mam can be retained at target loci independently of CSL by interactions with other factors so that it resides for longer." Is it possible that another interpretation of that data is that Mam is a limiting factor?

      As indicated, our comment is a speculation and is based on the observations summarized in the paragraph. We are not entirely sure what the reviewer is proposing as an alternate model. However, if it relates to the relative concentrations of the different factors, this would not account for the differences in trajectory durations. And for most aspects of our analysis, K_off has the most profound influence on the results. Furthermore, differences persist even when CSL levels are considerably reduced (as in conditions with Hairless RNAi).

      Page 9. The authors write, "A very low level of enrichment was evident for... for the CSL C-terminus." The recruitment of CSL ct IDR does not appear to be statistically significant or there is no apparent difference (Figure S2C), suggesting the CSL ct IDR does not play a role in enrichment.

      We agree with the comments of the reviewer and have adjusted the text on page 9 accordingly.

      Page 9. The authors write, "Notably, MamnIDR::GFP fusion was present in droplets, suggesting it can self-associate when present in a high local concentration (Figure S2B)." Is this result only valid for Mam nIDR or does full-length Mam also localize into droplets, as has been previously observed for full-length mammalian Maml1 in transfected cells?

      We agree that the observed foci of MamL1 that have been detected in mammalian cells are interesting. We have not tried to replicate those data because the large size of Mam has made it challenging to produce a full-length form in over-expression. We note however that another portion of Mam, MamIDR, does not make droplets when over-expressed despite it containing a large section of the disordered region of the Drosophila Mam. We have now included a comment about the mammalian data in the text (page 9) to put our findings in context.

      Previous studies in mammalian cells suggest that Maml1 is a high-confidence target for phosphorylation by CDK8, see Poss et al 2016 Cell Reports https://doi.org/10.1016/j.celrep.2016.03.030. By sequence comparison, does fly Mam have similar potential phosphorylation sites, and might these be critical for Mam/CDK module recruitment?

      We thank the reviewer for highlighting this point. Indeed, we were very excited when we learnt that MamL1 was found to be a high confidence CDK8 target and we looked hard in the Mam sequence for potential phosphorylation sites. Sadly, there is very little conservation between the fly and the mammalian proteins beyond the helical region that contacts CSL and NICD. Furthermore, there are no identifiable putative CDK8 phosphorylation sites based on conventional motifs. It therefore remains to be established whether or not Mam is a direct target of the CDK8 kinase activity. We have added an explanatory comment in the text (page 11).

      Page 11: The authors write, "The differences in the effects on Mam and CSL imply that the CDK module is specifically involved in retaining Mam in the hub, and that in its absence other CSL complexes "win-out", either because the altered conditions favour them and/or because they are the more abundant." Are the "other" complexes the authors are referring to Hairless-containing complexes? With the reagents the authors have in hand couldn't this be explicitly shown for CSL complexes rather than speculated upon?

      The reviewer is correct that CSL complexes containing Hairless are good candidates to be recruited in these conditions. We have compared the levels of Hairless at E(spl)-C following treatments with Senexin and have not detected a difference. However, it appears that the high proportion of unbound Hairless makes it difficult to detect/quantify the enrichment at E(spl)-C. We have therefore taken a different strategy, which is to measure the recruitment of a mutant form of CSL that is compromised for Hairless binding. Recruitment of the mutant CSL is detected in Notch-ON conditions, but is significantly reduced/absent following Senexin treatment. These data favour the model proposed by the reviewer that in the absence of CDK8 activity, the CSL-Hairless complexes win out. These new data have been added in new Supplementary Figure S3F and S3G (see also text page 11).

      Page 12/13: The authors write, "Based on these results we propose that, after Notch activity decays, the locus remains accessible because when Mam-containing complexes are lost they are replaced by other CSL complexes (e.g. co-repressor complexes)." Again, why not actually test this hypothesis rather than speculate? The dynamics of Hairless complexes following the removal of Notch would be very interesting and build upon previously published results from the Bray lab.

      We thank the reviewer for this comment and we agree it’s possible that the proportion of Hairless complexes increases after Notch withdrawal. However, for the reasons outlined above, it is difficult to quantify changes in Hairless, (and our preliminary experiment did not reveal any large-scale effect) and because of the complexity of the genetics we cannot straightforwardly extend the experiment to analyze the behaviour of the mutant CSL as above. Therefore, at present, we cannot say whether the loss of Mam is compensated by an increase in Hairless. We hope in future to investigate the characteristics of the memory in more depth.

      Page 13: The authors write, "As Notch removal leads to a loss of Mam, but not CSL, from the hub, it should recapitulate the effects of MamDN." While the data in Figure 5B seem to support this hypothesis, it's not clear to me that the loss of Mam and MamDN should phenocopy each other, because in the case of MamDN, NICD would still be present.

      We apologise that this sentence was a bit misleading. We have now rewritten it to improve accuracy (page 13) “As Notch removal leads to a loss of Mam, but not CSL, from the hub, we hypothesised it would recapitulate the effects of MamDN on chromatin accessibility and transcription of targets.”

      The temporal dynamics for Mam recruitment using the temperature- and optogenetic-paradigms are quite different. For example, in the optogenetic time course experiments, the preactivated cells are in the dark for 4 hours, while in the temperature-controlled experiments, there is still considerable enrichment of Mam at 4 hours. For the preactivated optogenetic experiments, how sure are the authors that Mam is completely gone from the locus, and alternatively, can the optogenetic experimental results be replicated in the temperature-controlled assays? My concern is whether the putative "memory" observation is just due to incomplete Mam removal from the previous activation event.

      We appreciate the concerns of the reviewer. However, we are confident that the 4-hour optogenetic inactivation is much more effective than the equivalent time for temperature shifts. The temperature sensitive experiment involves a longer decay, because not only the protein but also the mRNA has to decay to fully remove NICD activity. The optogenetic experiments involve only protein decay and so are more acute. Furthermore, we have tested (and we show in Figure 5H) that Mam is fully depleted after 4 hours “Off” in the optogenetic experiments.

      In order to further strengthen the evidence in favour of the memory hub, we have extended the time-frame further to show that CSL is retained at the locus even after 24 hours “Notch OFF” in both the temperature and the optogenetic paradigm. We have also measured the effects on transcription after a 24hr OFF period using the optogenetic paradigm and seen that robust transcription is initiated in cells that have experienced a previous activation (preactivated) compared to those that have not (naïve). These new data have been added to new Figure 5 C-F and strongly support the memory model.

      Reviewer #3 (Public Review):

      Summary:

      DeHaro-Arbona and colleagues investigate the in vivo dynamics of Notch-dependent transcriptional activation with a focus on the role of the Mastermind (MAM) transcriptional co-activator. They use GFP and HALO-tagged versions of the CSL DNA-binding protein and MAM to visualize the complex, and Int/ParB to visualize the site of Notch-dependent E(spl)-C transcription. They make several conclusions. First, MAM accumulates at E(spl)-C when Notch signaling is active, just like CSL. Second, MAM recruits the CDK module of Mediator but does not initiate chromatin accessibility. Third, after signaling is turned off, MAM leaves the site quickly but CSL and chromatin accessibility are retained. Fourth, RNA pol II recruitment, Mediator recruitment, and active transcription were similar and stochastic. Fifth, ecdysone enhances the probability of transcriptional initiation.

      Strengths:

      The conclusions are well supported by multiple lines of extensive data that are carefully executed and controlled. A major strength is the strategic combination of Drosophila genetics, imaging, and quantitative analyses to conduct compelling and easily interpretable experiments. A second major strength is the focus on MAM to gain insights into the dynamics of transcriptional activation specifically.

      We thank the reviewer for their positive comments about the strengths of our work.

      Weaknesses:

      Weaknesses are minor. There were no p-values reported for data presented in Figure S1D and no indication of how variable measurements were. In addition, the discussion of stochasticity was not integrated optimally with relevant literature.

      We thank the reviewer for noting these points. The statistical tests have now been included for Figure S1D (now Figure S1F). We have amplified the discussion about stochasticity, to include more reference to the literature and to make clear also the distinction with transcription bursting (page 19, 20).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have an elegant series of manipulations that provide strong evidence for their hypotheses and conclusions. Their exploitation of a unique biological system amenable to imaging in the larval salivary gland is well-considered and well-performed. Most of the conclusions are supported by the data. I only have the concerns below.

      (1) One of the main findings is the composition of Notch nuclear complexes and their interactions within a 'hub'. Yet most of the data showing hubs focus on labeling one protein component (+the locus or transcription), but multi-color imaging is rarely used to show how CSL-Mam, Mam-Med... protein signals coalesce to form a hub. Given the powerful tool developed, it would be important to show these multi-state hubs. Related to this, if the authors expect that hubs are formed independently of transcription or Notch pathway activation, do the authors see clustering at other non-specific loci in the nucleus? If not, can the authors comment on why they think that is the case? If so, do they demonstrate consistent residence time profiles with the tracked E(spl) locus?

      We apologise that it was not evident from the data shown that the proteins co-localize. First we stress that all the experiments are multicolor and most rely on very powerful methods to measure co-recruitment at a chromosomal locus- something that is very rarely achieved by others studying hubs. Second, we have in all cases confirmed that the proteins do colocalize. We have modified the diagram of our analysis pipeline to make more clear that this relies on multi-colour imaging, and adjusted all the figure labels to indicate the position of E(spl)-C. We have also added panels to new supplementary Figure S1C with examples of the co-localization between CSL and Mam and a plot confirming their levels of recruitment are correlated across multiple nuclei.

      We would like to clarify that our data show that the hubs do require Notch activation for their establishment. Other regions of enrichment are detected in Notch-ON conditions, but these are less prominent and, with no independent method for identifying them, can’t be compared between nuclei. In SPT experiments, other clusters with consistent residence are detected as reported in our recent paper which expanded on the SPT data (Baloul et al, 2023). We also detect co-localizations and “hubs” in other tissues, but those analyses are ongoing and beyond the scope of this paper.

      (2) The authors convincingly show that Notch hub complexes exhibit a memory. While the data showing rapid hub reformation upon Notch withdrawal are solid and convincing (Figure 5, in particular, F), the claim that this memory fosters rapid transcriptional reactivation is less clear. Yet in order to invoke transcriptional memory, it's necessary to solidify this transcriptional response angle. The authors should consider quantifying the changes in transcription activity (at the TS and not in the cytoplasm as currently shown), as well as the timing of transcriptional reactivation (with the MS2 system or smFISH). Manipulating the duration of the activation and dark recovery periods could help to draw a better correlation between the timing of hub reformation and that of transcriptional response and would also help determine how persistent this phenomenon is.

      We thank the reviewer for these suggestions. We have carried out several new experiments to probe further the persistence of memory and to show the effects on transcription when Notch is inactivated/reactivated. First, we have extended the time period for Notch inactivation by temperature control and show that the CSL hub persists even at 24 hours and that no transcription from the target E(spl)m3 is detected, neither at the transcription start-site nor in the cytoplasm. Second, we have extended the Notch OFF time period to 24 hours using the optogenetic approach and show that transcription is robustly reinitiated in preactivated nuclei when Notch is re-activated with 30 mins light treatment, while little if any E(spl)m3 transcription is detected in naïve nuclei with the same treatment. These new data are included in new Figure 5 C-F (see pages 13-14). Both these new experiments substantiate the model that the nuclei retain transcriptional memory.

      (3) The manuscript ends with the finding that the presence of a Mam hub does not always correlate with transcription. They conclude that transcription is initially stochastic. The authors find this surprising and even state that this could not be observed without their in vivo live imaging approaches. I don't understand why this result is surprising or unexpected, as we now know that transcription is generally a stochastic process and that most (if not all) loci are transcribed in a bursting manner. The fact that E(spl)-C locus is bursty is already obvious from the smFISH data. The fact that active nascent transcription does not correlate with local TF hubs was already observed in early Drosophila embryos (with Zelda hubs and two MS2 reporters, hb-MS2, sna-MS2). If, in spite of the inherent stochasticity of transcription (bursting), the data are surprising for other reasons, the authors should explain it better.

      We apologise that we had not made clear the reasons why the results were unexpected. We have substantially rewritten this section, and the discussion section, to clarify. We have also moderated the language used to better reflect the overall context of our results. We briefly summarise here. As the reviewer correctly states, it is well known that transcription is inherently bursty. Indeed, the MS2 transcription profiles in “ON” nuclei are bursty, which likely reflects the switching of the promoter. However, in other contexts where we have monitored transcription, although it is bursty, it has nevertheless been initiated synchronously in all nuclei in response to Notch, in a manner that was fully penetrant. What we observe in our current conditions is that some nuclei never initiate transcription over the time-course of our experiments (2-3 hours), and those that are ON rarely switch off. This implies that there is another rate-limiting step. Supplying a second signal can modulate this so that it occurs with much higher frequency/penetrance. We consider this to be a second tier of regulation above the fundamental transcriptional bursting.

      The fact that Mam is recruited in all nuclei, whether or not they are actively transcribing, was surprising because recruitment of the activation complex has been considered the limiting step. This is somewhat different from Zelda, which is thought to be permissive and needed at an early step to prime genes for later activation rather than to be the last step needed to fire transcription. We note also that we are not monitoring the position of the hub with respect to the promoter, as in the Zelda experiments (Zelda hubs may still persist, but they do not overlap with the nascent RNA); instead, we are monitoring the presence or absence of a Mam hub in proximity to a genomic region.

      Minor suggestions:

      (1) The genotypes of the samples should be indicated in the figure legends.

      We thank the reviewer for this suggestion. We have provided a table (new Table S3) where all of the genetic combinations are provided in detail for each figure. We considered that this approach would be preferable because it would be quite cumbersome to have the genotypes in each legend as they would become very long and repetitive.

      (2) While the schematic Fig1A explains how the locus is detected, the presence of ParS/ParB is never indicated in subsequent panels and Figure. I assume that all panels depicting enrichment profiles, use a given radius from the ParS/ParB dot to determine the zero of the x-axis (grey zone). This should be clearly stated in all panels/figure legends concerned.

      We apologise if this was not made explicit. Yes, all panels depicting enrichment profiles use immunofluorescence signal from ParA/ParB recruitment to determine the zero of the x-axis. We have now marked this more clearly in all figures (grey bar, grey shading or labelled 0). All images where the locus is indicated by an arrowhead, by a coloured bar above the intensity plots or by grey shading in the graphs have been captured with dual colour, and the signal from ParA/B recruitment was used to define its location. This is now clearly stated in the analysis methods and in the legend. We have also modified the diagram in new supplementary Figure S1B, showing our analysis pipeline, to make that more explicit.

      (3) FRAP/SPT experiments: the authors should provide more details. How many traces? Are traces showing bleaching removed?

      P7: does the statement ' The residences are likely an underestimation because bleaching and other technical limitations also affect track durations' imply that traces showing bleaching have not been removed from the analysis?

      The authors could justify the choice of the model for fitting FRAP/SPT experiments and be cautious about their interpretation. For example, interpreting a kinetic behavior as a DNA-specific binding event can be accurate only if backed up with measurements with a mutant version of the DNA binding domain.

      We apologise if some of this information was not evident. The number of trajectories is provided in new Figure S1F, which indicates the number of trajectories analyzed for each condition in Figure 1.

      We have now also added the numbers of trajectories analyzed for the ring experiments.

      The comments on page 7 about bleaching refer to the technical limitations of the SPT approach. However, as bleached particles cannot be distinguished from those that leave the plane of imaging, they have not been filtered or removed. We have not sought to make claims about absolute residence times for that reason. Rather the point is to make a comparison between the different molecules. As the same fluorescent ligand and imaging conditions are used in all the experiments, all the samples are equivalently affected by bleaching. We subdivide trajectories according to their properties and infer that those which are essentially stationary are bound to chromatin, as is common practice in the field. We note that we have previously shown that a DNA binding mutant of CSL does not produce a hub at E(spl)-C in Notch-ON conditions and has a markedly more rapid recovery in FRAP experiments (Gomez-Lamarca et al, 2018) consistent with the slow recovery being related to DNA binding. This point has been added to the text (page 8).

      (4) The authors should quantify their RNAi efficiency for Hairless-RNAi, Med13-RNAi, white-RNAi, yellow-RNAi, CBP-RNAi, and CDK8-RNAi.

      We thank the reviewer for this comment. We have made sure that we are using well validated RNAis in all our experiments and have included the references in Table S2 where they have been used. We have now evaluated the knock-down in the precise conditions used in our experiments by quantitative RT-PCR and added those data, which show efficient knock-down is occurring, to new Supplementary Figure S1D and Figure S3J. We note also that the RNAi experiments are complemented by experiments inhibiting the complexes with specific drugs and that these yield similar results.

      (5) Figure 3 A: could the author show that transcription is indeed inhibited upon triptolide treatment with smFISH (with for example m3 probes)? Why not use alpha-amanitin?

      We thank the reviewer for this suggestion. We had omitted the smFISH data from this experiment in error. These data have now been added to new Supplementary Figure S3A and clearly show that transcription is inhibited following 1 hour exposure to triptolide. Triptolide is a very fast acting and very efficient inhibitor of transcription that acts at a very early step in transcription initiation. In our experience it is much more efficient than alpha-amanitin and is now the inhibitor of choice in many transcription studies.

      (6) Figure 4 typo: panel B should be D and vice versa. Accessibility panels are referred to as Figure 4D, D' in the text but presented as panel B in the Figure.

      We thank the reviewer for noting this mistake, it is now changed in the main text.

      (7) The authors must add their optogenetic manipulation protocol to their methods section.

      The method is described in detail in a recently published paper that reports its design and use. We have now also added a section explaining the paradigm in the methods (Page 31) as requested.

      (8) Figure 3G needs a Y-axis label.

      Our apologies, this has now been added.

      (9) The authors should note why there was a change of control in Figure 3D compared to 3E and G (yellow RNAi vs white RNAi).

      This is a pragmatic choice that relates to the chromosomal site of the RNAis being tested. Controls were chosen according to the chromosome that carries the UAS-RNAi: for the second chromosome this was yellow RNAi and for the third white RNAi. This is explained in the methods.

      (10) Figure 1 would benefit from a diagram describing the genomic structure of the E(spl) locus and the relative position of the labelled locus within it.

      We thank the reviewer for this suggestion and have added a diagram to Supplementary Figure S1A.

      Reviewer #2 (Recommendations For The Authors):

      Minor criticisms and typos:

      Pet peeve: in some of the figure panels they are labeled Notch ON or OFF, but in others they are not, albeit that info is included in the figure legend. For the ease of the reader/reviewer, would it be possible to label all relevant figure panels either Notch ON or OFF for clarity?

      We thank the reviewer for this suggestion and have modified the figures accordingly.

      Page 7, top. "In comparison to their average distribution across the nucleus, both CSL and Mam trajectories were significantly enriched in a region of approximately 0.5 μm around the target locus in Notch-ON conditions, reflecting robust Notch dependant recruitment to this gene complex." Are the authors referring to Figure 1D here?

      Thank you, this figure call-out has been added in the text.

      Page 9. "...reported to interact with p300 and other factors (Figure S2B)." I believe the authors mean Figure S2C and not S2B.

      Thank you, this has been corrected in the text.

      Page 9. There is no Figure S2D.

      Apologies, this was referring to Figure S1D, and is now corrected in the text.

      Page 11: "...were at very reduced levels in nuclei co-expressing MamDN (Figure 4B).." Should be Figure 4CD.

      Thank you, this has been corrected in the text.

      Page 12: "...which was maintained in the presence of MamDN (Figure 4D, D')." Should be Figure 4B.

      Thank you, this has been corrected in the text.

      Reviewer #3 (Recommendations For The Authors):

      In the Results section on Hub, the paragraph starting with "Third, we reasoned . ." the callout to Figure S2D should be Fig S1D.

      Thank you, this has been corrected in the text.

      Figures: The font size in the Figures is so small that most words and numbers cannot be read on a printout. One has to go to the electronic version and increase the size to read it. This reviewer found that inconvenient and often annoying.

      We apologise for this oversight, the font size has now been adjusted on all the graphs etc.

      Figure legends: the legends are terse and in some cases leave explanations to the imagination (e.g. "px" in Figure 2E). It would be useful to go through them and make sure those who are not a Drosophila Notch person and not a transcription biochemist can make sense of them.

      Our apologies for the lack of clarity in the legends. We have gone over them to make them more accessible and less succinct.

    1. Author Response

      We are very pleased to hear the overall positive views and constructive criticisms of eLife Editors and Reviewers on our work. In particular, we appreciate their comments highlighting the value of our new pipeline for high-throughput quantification of fly embryonic movement and the positive views of reviewers and editors that our data on the roles of miR-2b-1 in embryonic movement are well supported.

      Regarding Reviewer 1, we thank them for their positive comments that our work is experimentally sound and well-written, their kind words on the value of our new embryonic movement pipeline, and their overall appreciation of the quality, scope, and significance of our work. In a revised version of the manuscript we will consider discussing and addressing some of the interesting points raised by Rev1.

      Turning to the comments by Rev2, we are grateful to them for their recognition of the novelty of our miRNA findings and appreciation of the utility of our novel quantitative pipeline for assessing embryonic movement. Nonetheless, we politely – but strongly – disagree with their suggestion that the findings are inflated by our language. For example, they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. It is not inflation. In connection to other comments, in a revised manuscript we will propose a different name for the gene here described as Janus to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work provides new mechanistic insights into the competitive inhibition in the mammalian P2X7 receptors using structural and functional approaches. The authors solved the structure of panda (pd) P2X7 in the presence of the classical competitive antagonists PPNDS and PPADS. They find that both drugs bind to the orthosteric site employed by the physiological agonist ATP. However, owing to the presence of a single phosphate group, they prevent movements in the flipper domain required for channel opening. The authors performed structure-based mutational analysis together with electrophysiological characterization to understand the subtype-specific binding of these drugs. It is known from previous studies that P2X1 and P2X3 are more sensitive to these drugs as compared to P2X7, hence, the residues adjacent to the ATP binding site in pdP2X7 were mutated to those present in P2X1. They observed that mutations of Q143, I214, and Q248 into lysine (hP2X1) increased the P2X7 sensitivity to PPNDS, whereas in P2X1, mutations of these lysines to alanine reduced sensitivity to PPNDS, suggesting that these key residues contribute to the subunit-specific sensitivity to these drugs. Similar experiments were done in hP2X3 to demonstrate its higher sensitivity to PPNDS. This preprint provides a useful framework for developing subtype-specific drugs for the family of P2X receptor channels, an area that is currently relatively unexplored.

      We appreciate the time and effort Reviewer #1 devoted to this review, and we have addressed the specific comments below.

      (1) Why was the crystallization construct of panda P2X7 used for structural studies instead of rat P2X7 with the cytoplasmic ballast which is a more complete receptor that is closely related to the human receptor? Can the authors provide a justification for this choice?

      We appreciate this comment. We did try to express the rat P2X7 receptor in its full-length form based on a previous report (Cell 2019, PMID: 31587896), but the expression of the receptor was not successful for an unknown reason. Instead, we employed a truncated construct of panda P2X7 based on the findings described in another previous report (eLife 2016, PMID: 27935479). This truncated construct also possesses ATP-dependent channel activity (eLife 2016, PMID: 27935479). We understand that the full-length P2X7 construct would be preferable, particularly for addressing the function of the cytoplasmic domain; however, the main focus of this study was on PPNDS/PPADS recognition and the associated structural changes in the ATP binding pocket, which we believe are less likely to be severely affected by truncation of the cytoplasmic domain. In support of this expectation, our mutational analyses are consistent with the structures in this study. Therefore, we believe that the use of the truncation construct in this study is justified.

      (2) Was there a good reason why hP2X1 and hP2X3 currents were recorded in perforated patches, whereas pdP2X7 currents were recorded using the whole-cell configuration? It seems that the extent of rundown is less of a problem with perforated patch recordings. Can the authors comment and perhaps provide a justification? It would also be good to present data for repeated applications of ATP alone using protocols similar to those for testing antagonists so the reader can better appreciate the extent of run down with different recording configurations for the different receptors.

      We thank the reviewer for bringing up this point. The whole-cell configuration is the most commonly used method in patch-clamp experiments; therefore, we used this method to record the current of pdP2X7 (Author response image 1). However, the whole-cell configuration is not suitable for all experiments; for example, the currents of P2X1 and P2X3 recorded by this method show a severe "rundown" effect. The "rundown" effect prevents accurate calculation of the inhibition rate of the antagonist, and to obtain more accurate results, we used perforated patches to record the currents of hP2X1 and hP2X3.

      Author response image 1.

      Representative current traces of pdP2X7, hP2X3, and hP2X1 after repeated applications of ATP. The pdP2X7 currents were recorded using the whole-cell configuration, and the hP2X1 and hP2X3 currents were recorded using perforated patches.

      (3) The data in Fig. S1, panel A shows multiple examples where the currents activated by ATP after removal of the antagonist are considerably smaller than the initial ATP application. Is this due to rundown or incomplete antagonist unbinding? It is interesting that this wasn't observed with hP2X1 and hP2X3 even though they have a higher affinity for the antagonist. Showing examples of rundown without antagonist application would help to distinguish these distinct phenomena and it would be good for the authors to comment on this in the text. It is also curious why a previous study on pdP2X7 did not seem to have problems with rundown (see Karasawa and Kawate. eLife, 2016).

      We thank the reviewer for bringing up this point. We believe that this difference may be the result of incomplete antagonist unbinding. A similar phenomenon has been observed in previous studies of pdP2X7 (eLife 2016, PMID: 27935479). In the previous experiment, the currents activated by ATP after removal of the antagonist A740003 did not return to the initial value upon ATP application, whereas activation by ATP after removal of the antagonist GW791343 immediately restored the initial value upon ATP application (Fig. 1C of eLife 2016, PMID: 27935479). This may be because different inhibitors dissociate differently from pdP2X7. In our experiments, we assumed that PPNDS/PPADS was not completely dissociated from P2X7 even after 20 min of elution. The activation of P2X7 by ATP without antagonists showed no rundown effect (Author response image 1); therefore, we calculated the inhibition rate of the antagonist according to the precontrol.

      (4) The written presentation could be improved as there are many instances where the writing lacks clarity and the reader has to guess what the authors wish to communicate.

      To address this comment, we made changes to the text, particularly by following the reviewer's recommendations below.

      Recommendations for The Authors

      Reviewer #1 (Recommendations For The Authors):

      (1) The way the manuscript is written could be greatly improved. There are many confusing sections where the reader has to guess what the authors wish to convey. For example, on page 9 "In addition, the mutation of Val173 to aspartate, as observed in pdP2X7, significantly decreased the sensitivity to PPNDS (Fig. 6B)." It appears from this sentence that Asp is present in P2X7, which is incorrect, please rephrase. There are many more examples of confusing sentences that need to be carefully edited to improve comprehension.

      To address this comment, we extensively modified the text to avoid this kind of misunderstanding. Please see the manuscript file with the track changes.

      (2) Please use either a 1-letter or 3-letter code for amino acid residues throughout the manuscript to maintain uniformity.

      We made this correction throughout the revised manuscript.

      (3) In Figure 1 on the right side, including the nearby density and side chains for interacting residues of PPNDS and PPADS would give more information and reliability for the density of the drugs.

      We appreciate this comment. The corresponding information is shown in Fig. S7.

      (4) Typo: Figure S1, E, and F panels - please correct the y-axis label to Inhibition.

      We corrected the typo in Fig. S1.

      (5) Please rewrite the legends for Fig. S3 and S5. They are confusing. The figure shows 3D classification using Relion, however, the legend suggests it was done using Cryosparc. Please clarify.

      We apologize for the confusion. Before applying C3 symmetry, all steps including 3D classification were performed in Relion 3.1. With C3 symmetry, we performed further refinement using Cryosparc v4.2.1 by non-uniform refinement. We have corrected the figure legends accordingly.

      (6) For Fig. S3 and S5 increase the resolution and size of representative micrographs, and also please provide scale bars.

      We have corrected Figures S3 and S5 accordingly.

      (7) Please add the 3D classification protocol performed in Relion/Cryosparc in the methods section as well.

      We added the corresponding description to the revised manuscript (Lines 9-14, Page 16).

      (8) In Table S1, under the initial model the authors state 'this study' when they should report the use of 5U1L according to the methods section.

      We corrected Table S1 in accordance with this comment.

      (9) The authors should consider combining the raw data shown in Figure S1 in Figure 6 as it provides stronger support for the conclusions than the bar graphs shown in Figure 6B.

      We appreciate the comment and fully understand the intention of Reviewer #1. Nevertheless, we would like to keep Figure S1, since it was also mentioned earlier together with Figure 1. In addition, if we combine Figure S1 with Figure 6, the result would be too large to present as a single figure.

      (10) In Figure 6A, please provide colored labels for both P2X7 and P2X1 to aid comprehension of the structural models.

      Based on this comment, we corrected the labels in Figure 6.

      (11) In the discussion, the authors write about comparisons with the docking study by Huo et al. JBC, 2018. Can they show the superimposition of their EM model with the previous studies' docking model in a supplementary figure for more clarity?

      We appreciate the constructive comments. However, unfortunately, the docking model in the previous study (JBC 2018, PMID: 29997254) is not available, so it is not possible to show the superimposition.

      Reviewer #2 (Public Review):

      Summary:

      P2X receptors play pivotal roles in physiological processes such as neurotransmission and inflammation, making them promising drug targets. This study, through cryo-EM and functional experiments, reveals the structural basis of the competitive inhibition of the PPNDS and PPADS on mammalian P2X7 receptors. Key findings include the identification of the orthosteric site for these antagonists, the revelation of how PPADS/PPNDS binding impedes channel-activating conformational changes, and the pinpointing of specific residues in P2X1 and P2X3 subtypes that determine their heightened sensitivity to these antagonists. These insights present a comprehensive understanding that could guide the development of improved drugs targeting P2X receptors. This work will be a valuable addition to the field.

      Strengths and weaknesses:

      The combination of structural experiments and mutagenesis analyses offers a deeper understanding of the mechanism. While the inclusion of MD simulation is appreciated, providing more insights from the simulation might further strengthen this already compelling story.

      We appreciate the time and effort Reviewer #2 devoted to this review, and we have addressed the specific comments below.

      Reviewer #2 (Recommendations For The Authors):

      (1) On page 3, the sentence "ATP analogs are the most competitive inhibitors of P2X receptors but are typically unsuitable due to a lack of high specificity in vivo," might need additional context. Could the authors clarify if they are referring to the unsuitability of ATP analogs for medical applications?

      To address this comment, we have rewritten the sentence as follows (Lines 13-16, Page 3):

      ATP analogs are most common among competitive inhibitors for P2X receptors; however, they are generally unsuitable for in vivo applications due to their relatively low specificity, which may result in off-target toxicity. This issue arises because the human body contains numerous ATP-binding proteins.

      (2) Fig. S1. I am curious why, for P2X7, the ATP-only current after removal of PPNDS/PPADS does not recover and become larger than the current in the presence of PPNDS/PPADS? Such behavior was not as pronounced in P2X1. Does that suggest PPNDS/PPADS might remain bound and can not be removed when the P2X7 channel is closed?

      We thank the reviewer for bringing up this point. We believe that this difference may be the result of incomplete antagonist unbinding. A similar phenomenon has been observed in previous studies of pdP2X7 (eLife 2016, PMID: 27935479). In the previous experiment, the currents activated by ATP after removal of the antagonist A740003 did not return to the initial value upon ATP application, whereas activation by ATP after removal of the antagonist GW791343 immediately restored the initial value upon ATP application (Fig. 1C of eLife 2016, PMID: 27935479). We strongly agree with the reviewer that this may be due to the difficulty of dissociating the antagonist from pdP2X7.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Weaknesses are the absence of correlation between the results from the animal studies and human pancreatic cancers.

      Author response: We appreciate the reviewer’s attention to the importance of human pancreatic cancer studies. In a previous study (D’Amico et al. Genes & Development 2018 doi: 10.1101/gad.311852.118), we evaluated the expression of STAT3 in human pancreatic tissue microarrays and data from the Human Protein Atlas. Mutations in Stat3 are infrequent in human pancreatic cancers; however, there is a trend of decreased STAT3 activity in poorly differentiated carcinomas.

      In the current study, STAT3 and SMAD4 gene signature scores (computed from KO KPC cells) were aligned with human pancreatic ductal adenocarcinoma samples from the TCGA cohort, and statistical analyses supported the selective antagonism of STAT3 and SMAD4 (Fig 4D, Fig 4E).

      The complex process of EMT is difficult to characterize rigorously in human cancers. Mouse models offer an opportunity to study the relationships between cancer phenotypes and genetic alterations.

      Reviewer #2 (Public Review):

      [...] While correlations are strong, the study would benefit from additional cause-and-effect type experiments. It would also be beneficial to better tie together the first and second parts of the paper.

      Author response: We understand the Reviewer’s interest in additional experiments that could further elucidate mechanisms that drive EMT and/or KRAS dependency in relation to STAT3 and TGF-beta antagonism. We previously investigated the development of mutant KRAS knockout tumors (Ischenko et al. Nature Communications 2021 doi:10.1038/s41467-021-21736) and found that loss of KRAS promotes EMT, similar to loss of STAT3. Additional experiments are underway but are outside the scope of the current study.

      The first part of the paper is mechanistic and uses KRAS-transformed mouse embryo fibroblasts to perform in vitro studies with foci formation. The cell-based foci formation assay has been shown to be the best way to evaluate malignant transformation and oncogenic potential. In the second part, we transitioned to epithelial cells and pancreatic ductal adenocarcinomas to combine mechanistic relationships with genetic models.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      I would like to express my appreciation for the authors' dedication to revising the manuscript. It is evident that they have thoughtfully addressed numerous concerns I previously raised, significantly contributing to the overall improvement of the manuscript.

      Response: We appreciate the reviewer’s recognition of our efforts in revising the manuscript.

      My primary concern regarding the authors' framing of their findings within the realm of habitual and goal-directed action control persists. I will try to explain my point of view and perhaps clarify my concerns. While acknowledging the historical tendency to equate procedural learning with habits, I believe a consensus has gradually emerged among scientists, recognizing a meaningful distinction between habits and skills or procedural learning. I think this distinction is crucial for a comprehensive understanding of human action control. While these constructs share similarities, they should not be used interchangeably. Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses).

      Response: We would like to clarify that, contrary to the reviewer’s assertion of a scientific consensus on this matter, the discussion surrounding the similarities and differences between habits and skills remains an ongoing and unresolved topic of interest among scientists (Balleine and Dezfouli, 2019; Du and Haith, 2023; Graybiel and Grafton, 2015; Haith and Krakauer, 2018; Hardwick et al., 2019; Kruglanski and Szumowska, 2020; Robbins and Costa, 2017). We absolutely agree with the reviewer that “Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses)”. But so do habits. Some researchers also highlight the intentional/goal-directed nature of habits (e.g., Du and Haith, 2023, “Habits are not automatic” (preprint); Kruglanski and Szumowska, 2020, “Habitual behavior is goal-driven”). As Kruglanski and Szumowska (2020) put it, “definitions of habits that include goal independence as a foundational attribute of habits are begging the question; they effectively define away, and hence dispose of, the issue of whether habits are goal-driven” (p. 1258). Therefore, there is no clear consensus concerning the concept of habit.

      While we acknowledge the meaningful distinctions between habits and skills, we also recognize a substantial body of literature supporting the overlap between these concepts (cited in our manuscript), particularly at the neural level. The literature clearly indicates that both habits and skills are mediated by subcortical circuits, with a progressive disengagement of cognitive control hubs in frontal and cingulate cortices as repetition evolves. We do not use these concepts interchangeably. Instead, we simply present evidence supporting the assertion that our trained app sequences meet several criteria for their habitual nature.

      Our choice of Balleine and Dezfouli's (2019) criteria stemmed from the comprehensive nature of their definitions, which effectively synthesized insights from various researchers (Mazar and Wood, 2018; Verplanken et al., 1998; Wood, 2017, etc.). Importantly, their list highlights the positive features of habits that were previously overlooked. However, these authors still included a controversial criterion ("habits as insensitive to changes in their relationship to their individual consequences and the value of those consequences"), even though they acknowledged the problems of using outcome devaluation methods and of relying on a null effect. According to Kruglanski and Szumowska (2020), this criterion is highly problematic as “If, by definition, habits are goal-independent, then any behavior found to be goal-dependent could not be a habit on sheer logical grounds” (p. 1257). In their definition, “habitual behavior is sensitive to the value of the reward (i.e., the goal) it is expected to mediate and is sensitive to the expectancy of goal attainment (i.e., obtainment of the reward via the behavior)” (p. 1265). In fact, some recent analyses of habitual behavior do not use devaluation or revaluation as a criterion (Du and Haith, 2023). This article, for example, ascertains habits using different criteria and provides supporting evidence for trained action sequences being understood as skills, with both goal-directed and habitual components.

      In the discussion of our manuscript, we explicitly acknowledge that the app sequences can be considered habitual or goal-directed in nature and that this terminology does not alter the fact that our overtrained sequences exhibit clear habitual features.

      Watson et al. (2022) aptly detailed my concerns in the following statements: "Defining habits as fluid and quickly deployed movement sequences overlaps with definitions of skills and procedural learning, which are seen by associative learning theorists as different behaviors and fields of research, distinct from habits."

      "...the risk of calling any fluid behavioral repertoire 'habit' is that clarity on what exactly is under investigation and what associative structure underpins the behavior may be lost." I strongly encourage the authors, at the very least, to consider Watson et al.'s (2022) suggestion: "Clearer terminology as to the type of habit under investigation may be required by researchers to ensure that others can assess at a glance what exactly is under investigation (e.g., devaluation-insensitive habits vs. procedural habits)", and to refine their terminology accordingly (to make this distinction clear). I believe adopting clearer terminology in these respects would enhance the positioning of this work within the relevant knowledge landscape and facilitate future investigations in the field.

      Response: We would like to highlight that we have indeed followed Watson et al.’s (2022) recommendations on focusing on other features/criteria of habits at the expense of the outcome devaluation/contingency degradation paradigm, which has been more controversial in the human literature. Our manuscript clearly aligns with Watson et al.’s (2022) recommendations: “there are many other features of habits that are not captured by the key metrics from outcome devaluation/contingency degradation paradigms such as the speed at which actions are performed and the refined and invariant characteristics of movement sequences (Balleine and Dezfouli, 2019). Attempts are being made to develop novel behavioral tasks that tap into these positive features of habits, and this should be encouraged as should be tasks that are not designed to assess whether that behavior is sensitive to outcome devaluation, but capture the definition of habits through other measures”.

      Regarding the authors' use of Balleine and Dezfouli's (2018) criteria to frame recorded behavior as habitual, as well as to acknowledge the study's limitations, it's important to highlight that while the authors labelled the fourth criterion (which they were not fulfilling) as "resistance to devaluation," Balleine and Dezfouli (2018) define it as "insensitive to changes in their relationship to their individual consequences and the value of those consequences." In my understanding, this definition is potentially aligned with the authors' re-evaluation test, namely, it is conceptually adequate for evaluating the fourth criterion (which is the most accepted in the field and probably the one that differentiates habits from skills). Notably, during this test, participants exhibited goal-directed behavior.

      The authors characterized this test as possibly assessing arbitration between goal-directed and habitual behavior, stating that participants in both groups "demonstrated the ability to arbitrate between prior automatic actions and new goal-directed ones." In my perspective, there is no justification for calling it a test of arbitration. Notably, the authors inferred that participants were habitual before the test based on some criteria, but then transitioned to goal-directed behavior based on a different criterion. While I agree with the authors' comment that: "Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1)." they implicitly assert a shift from habit to goal-directed behavior without providing evidence that relies on the same probed mechanism. Therefore, I think it would be more cautious to refer to this test as solely an outcome revaluation test. Again, the results of this test, if anything, provide evidence that the fourth criterion was tested but not met, suggesting participants have not become habitual (or at least undermines this option).

      Response: In our previously revised manuscript, we duly acknowledged that the conventional (perhaps nowadays considered outdated) goal devaluation criterion was not met, primarily due to constraints in designing the second part of the study. We did cite evidence from another similar study that had used devaluation with app-trained action sequences to demonstrate their habitual qualities (but the reviewer ignored this).

      The reviewer points out that we did use a manipulation of goal revaluation in one of the follow-up tests conducted (although this was not a conventional goal revaluation test, inasmuch as it was conducted in a novel context). In this test, please note that we used 2 manipulations: monetary reward and physical effort. Although we did show that subjects, including OCD patients, were apparently goal-directed in the monetary reward manipulation, this was not so clear when goal revaluation involved the physical effort expended. In this effort manipulation, participants were less goal-oriented, and OCD patients preferred to perform the longer, familiar sequence over the shorter, novel one, thus exhibiting significantly greater habitual tendencies than controls. Hence, we cannot decisively conclude that the action sequence is goal-directed, as the reviewer argues. In fact, the evidence is equivocal and may reflect both habitual and goal-directed qualities in the performance of this sequence, consistent with recent interpretations of skilled/habitual sequences (Du and Haith, 2023). Relying solely on this partially met criterion to conclude that the app-trained sequences are goal-directed, and therefore not habitual, would be an inaccurate assessment for several reasons: 1) the action sequences did satisfy all other criteria for being habitual; 2) this approach would rest on a problematic foundation for defining habits, as emphasized by Kruglanski & Szumowska (2020); and 3) it would succumb to the pitfall of subscribing to a zero-sum game perspective, as cautioned by various researchers, including the review by Watson et al. (2022) cited by the referee, thus oversimplifying the nuanced nature of human behavior.

      While we have previously complied with the reviewer’s suggestion to relabel our follow-up test as a “revaluation test” instead of an “arbitration test”, we have now explicitly removed all mentions of the term “arbitration” (which seems to raise concerns) throughout the manuscript. As the reviewer suggested, we now use more refined terminology, explicitly referring to the measured behavior as "procedural habits". We have also extensively revised the discussion section of our manuscript to incorporate the reviewer’s viewpoint. We hope that these adjustments enhance the clarity and accuracy of our manuscript, addressing the concerns raised during this review process.

      In essence, this is an ontological and semantic matter that does not alter our findings in any way. Whether the sequences are considered habitual or goal-directed does not change our findings that: 1) both groups displayed equivalent procedural learning and automaticity attainment; 2) OCD patients exhibit greater subjective habitual tendencies via self-reported questionnaires; 3) patients who had elevated compulsivity and habitual self-reported tendencies engaged significantly more with the motor habit-training app, practiced more and reported symptom relief at the end of the study; and 4) these particular patients also show an augmented inclination to attribute higher intrinsic value to familiar actions, a possible mechanism underlying compulsions.

      Reviewer #2 (Recommendations For The Authors):

      A few more small comments (with reference to the point numbers indicated in the rebuttal):

      (14) I am not entirely sure why the suggested analysis is deemed impractical (i.e., why it cannot be performed by "pretending" participants received the points they should have received according to their performance). This can further support (or undermine) the idea of effect of reward on performance rather than just performance on performance.

      Response: We have now conducted this analysis, generating scores for each practice trial after day 20, when participants no longer gained points for their performance. This analysis assesses whether participants’ trial-wise behavioral changes exhibit a similar pattern following simulated relative increases or decreases in scores, as if they had been receiving points at this stage. Note that fewer trials are available for this analysis (about 50% fewer on average).

      Before presenting our results, we wish to emphasize the importance of distinguishing between the effects of performance on performance and the effects of reward on performance. In response to a reviewer's suggestion, we assessed the former in the first revision of our manuscript. We normalized the movement time variable and evaluated how normalized behavioral changes responded to score increments and decrements. The results from the original analyses were consistent with those from the normalized data.

      Regarding the phase where participants no longer received scores, we believe this phase primarily helps us understand the impact of 'predicted' or 'learned' rewards on performance. Once participants have learned the simple association between faster performance and larger scores, they can be expected to continue exhibiting the reward sensitivity effects described in our main analysis. We do not consider it feasible to assess the effects of performance on performance during the reward removal phase, which begins after day 20. Therefore, the following results pertain to how the learned associations between faster movement times and scores continue to influence behavior, even when explicit scores are no longer displayed on the screen.
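      To make the logic of this simulated-score analysis concrete, here is a minimal Python sketch under stated assumptions. The scoring rule `hypothetical_score` (1000 / movement time) is invented for illustration only, since the app's actual formula is not given here; all function names are hypothetical, the movement-time normalization used in the manuscript is omitted, and the factorial statistics are not reproduced. The sketch classifies each trial by the sign of its simulated score change (∆R+ or ∆R−) and fits a Gaussian (maximum-likelihood mean and standard deviation) to the movement-time change on the following trial.

```python
import numpy as np


def hypothetical_score(movement_time):
    """Hypothetical scoring rule: faster movements (smaller times) earn more points.

    NOT the app's actual formula; it only encodes "faster -> larger score".
    """
    return 1000.0 / np.asarray(movement_time, dtype=float)


def reward_conditioned_changes(movement_times):
    """Gaussian fits of next-trial movement-time changes, split by simulated score change.

    For trial t, the simulated score change is score[t] - score[t-1] (dR+ if > 0,
    dR- if < 0; ties are dropped) and the behavioral change that follows it is
    MT[t+1] - MT[t]. Returns {"dR+": (mean, std), "dR-": (mean, std)}.
    """
    mt = np.asarray(movement_times, dtype=float)
    scores = hypothetical_score(mt)
    delta_r = np.diff(scores)   # delta_r[k] = score[k+1] - score[k]
    delta_mt = np.diff(mt)      # delta_mt[k] = MT[k+1] - MT[k]
    # Pair the score change ending at trial k+1 with the change from k+1 to k+2.
    prev_reward = delta_r[:-1]
    next_change = delta_mt[1:]
    fits = {}
    for label, mask in (("dR+", prev_reward > 0), ("dR-", prev_reward < 0)):
        changes = next_change[mask]
        # Maximum-likelihood Gaussian fit: sample mean and population std (ddof=0).
        # An empty class yields NaN (with a NumPy RuntimeWarning).
        fits[label] = (float(changes.mean()), float(changes.std(ddof=0)))
    return fits


fits = reward_conditioned_changes([1.0, 0.8, 0.9, 0.7, 0.75, 0.6])
```

      In this toy sequence, the mean change is about 0.075 after ∆R+ trials and about −0.175 after ∆R− trials, with a standard deviation of about 0.025 in both classes.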

      Results: The main results of the effect of reward on behavioral changes persist, supporting the conclusion that relative increases or decreases in scores (real or imagined/inferred) modulate behavioral adaptations trial-by-trial in a consistent manner across both cohorts. The direction of the effects of reward is the same as in the main analyses presented in the manuscript: larger mean behavioral changes (smaller std) following ∆R−. First, concerning changes in “normalized” movement time (MT) trial-by-trial, we conducted a 2 x 2 factorial analysis of the centroid of the Gaussian distributions with the same factors Reward, Group and Bin. This analysis demonstrated a significant main effect of Reward (P = 2e-16), but not of Group (P = 0.974) or Bin (P = 0.281). There were no significant interactions between factors. The main Reward effect can be observed in the top panel of the figure below. The same analysis applied to the spread (std) of the Gaussian distributions revealed a significant main effect of Reward (P = 0.000213), with no additional main effects or interactions.

      Author response image 1.

      Next, conducting the same 2 x 2 factorial analyses on the centroid and spread of the Gaussian distributions fitted to the Consistency data, we also obtained a robust significant main effect of Reward. For the centroid variable, we obtained a significant main effect of Reward (P = 0.0109) and Group (P = 0.0294), while Bin and the factor interactions were non-significant. See the top panel of the figure below.

      On the other hand, Reward also significantly modulated the spread of the Gaussian distributions fitted to the Consistency data (P = 0.00498). There were no additional significant main effects or interactions. See the bottom panel in the figure below.

      Note that here the factorial analysis was performed on the logarithmic transformation of the std.

      Author response image 2.

      (16) I find this result interesting and I think it might be worthwhile to include it in the paper.

      Response: We have now included this result in our revised manuscript (page 28).

      (18) I referred to this sentence: "The app preferred sequence was their preferred putative habitual sequence while the 'any 6' or 'any 3'-move sequences were the goal-seeking sequences." In my understanding, this implies one choice is habitual and another indicates goal-directedness.

      One last small comment:
      In the Discussion it is stated: "Moreover, when faced with a choice between the familiar and a new, less effort-demanding sequence, the OCD group leaned toward the former, likely due to its inherent value. These insights align with the theory of goal-direction/habit imbalance in OCD (Gillan et al., 2016), underscoring the dominance of habits in particular settings where they might hold intrinsic value."

      This could equally be interpreted as goal-directed behavior, so I do not think there is conclusive support for this claim.

      Response: The choice of the familiar/trained sequence, as opposed to the 'any 6' or 'any 3'-move sequences, cannot be explicitly considered goal-directed: firstly, because the familiar app sequences were associated with less monetary reward (in the any-6 condition), and secondly, because participants would clearly need more effort and time to perform them. Even though these sequences were automatic, it would still be much easier and faster to simply tap one finger sequentially 6 times (any-6) or 3 times (any-3). Therefore, the choice of the app sequence would not be optimal/goal-directed. In this sense, that choice aligns with the current theory of goal-direction/habit imbalance in OCD. We found that OCD patients preferred to perform the trained app sequences in the physical effort manipulation (any-3 condition). While, on the one hand, this cannot be explicitly considered a goal-directed choice, we agree that another possible goal is involved here, which relates to the intrinsic value associated with the familiar sequence. In this sense, the action could potentially be considered goal-directed. This highlights the difficulty of this concept of value and agrees with: 1) Hommel and Wiers (2017): “Human behavior is commonly not driven by one but by many overlapping motives . . . and actions are commonly embedded into larger-scale activities with multiple goals defined at different levels. As a consequence, even successful satiation of one goal or motive is unlikely to also eliminate all the others” (p. 942); and 2) Kruglanski & Szumowska’s (2020) account that “habits that may be unwanted from the perspective of an outsider and hence ‘irrational’ or purposeless, may be highly wanted from the perspective of the individual for whom a habit is functional in achieving some goal” (p. 1262), and therefore habits are goal-driven.

      References:

      Balleine BW, Dezfouli A. 2019. Hierarchical Action Control: Adaptive Collaboration Between Actions and Habits. Front Psychol 10:2735. doi:10.3389/fpsyg.2019.02735

      Du Y, Haith A. 2023. Habits are not automatic. doi:10.31234/osf.io/gncsf

      Graybiel AM, Grafton ST. 2015. The Striatum: Where Skills and Habits Meet. Cold Spring Harb Perspect Biol 7:a021691. doi:10.1101/cshperspect.a021691

      Haith AM, Krakauer JW. 2018. The multiple effects of practice: skill, habit and reduced cognitive load. Current Opinion in Behavioral Sciences 20:196–201. doi:10.1016/j.cobeha.2018.01.015

      Hardwick RM, Forrence AD, Krakauer JW, Haith AM. 2019. Time-dependent competition between goal-directed and habitual response preparation. Nat Hum Behav 1–11. doi:10.1038/s41562-019-0725-0

      Hommel B, Wiers RW. 2017. Towards a Unitary Approach to Human Action Control. Trends Cogn Sci 21:940–949. doi:10.1016/j.tics.2017.09.009

      Kruglanski AW, Szumowska E. 2020. Habitual Behavior Is Goal-Driven. Perspect Psychol Sci 15:1256–1271. doi:10.1177/1745691620917676

      Mazar A, Wood W. 2018. Defining Habit in Psychology In: Verplanken B, editor. The Psychology of Habit: Theory, Mechanisms, Change, and Contexts. Cham: Springer International Publishing. pp. 13–29. doi:10.1007/978-3-319-97529-0_2

      Robbins TW, Costa RM. 2017. Habits. Current Biology 27:R1200–R1206. doi:10.1016/j.cub.2017.09.060

      Verplanken B, Aarts H, van Knippenberg A, Moonen A. 1998. Habit versus planned behaviour: a field experiment. Br J Soc Psychol 37 ( Pt 1):111–128. doi:10.1111/j.2044-8309.1998.tb01160.x

      Watson P, O’Callaghan C, Perkes I, Bradfield L, Turner K. 2022. Making habits measurable beyond what they are not: A focus on associative dual-process models. Neurosci Biobehav Rev 142:104869. doi:10.1016/j.neubiorev.2022.104869

      Wood W. 2017. Habit in Personality and Social Psychology. Pers Soc Psychol Rev 21:389–403. doi:10.1177/1088868317720362

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular its application to analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use the eigendecomposition of the successor representation [2], a learnable predictive representation of possible future states that Stachenfeld et al. [3] have shown provides an abstract structured representation of a space analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such a purpose. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by the monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research. We have added a discussion of these issues in Section 6 of the updated manuscript.
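      As a rough illustration of the successor representation mentioned above, the following Python sketch computes it in closed form for a toy state space: an unbiased random walk on a ring of 12 states. The ring, the discount factor gamma = 0.9 and the function names are assumptions chosen for illustration and are not part of our model. Because this transition matrix is circulant and symmetric, the eigenvectors of M = (I − γT)^(-1) are periodic, Fourier-like modes; the leading one is constant, with eigenvalue 1/(1 − γ).

```python
import numpy as np


def ring_random_walk(n_states):
    """Transition matrix T of an unbiased random walk on a ring (toy example)."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s - 1) % n_states] = 0.5  # step left
        T[s, (s + 1) % n_states] = 0.5  # step right
    return T


def successor_representation(T, gamma):
    """Closed-form SR: M = sum_t gamma^t T^t = (I - gamma T)^(-1), for 0 <= gamma < 1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)


T = ring_random_walk(12)
M = successor_representation(T, gamma=0.9)
# T is symmetric here, so M is too and eigh applies (eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(M)
```

      Each row of M sums to 1/(1 − 0.9) = 10, the expected discounted number of future steps; on graphs without this symmetry the eigenvectors are no longer simple Fourier modes.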

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high-frequency grid cells should increase their firing rate more than low-frequency cells, since the high-frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition, which supports flexible and efficient adaptation to novel inputs and which remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
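      The parallelogram rule itself can be sketched in a few lines of Python. This is a simplified illustration only: it operates directly on 2D points rather than on grid cell embeddings and does not reproduce our trained inference module; `solve_analogy` and the example coordinates are hypothetical. Rearranging A - B = C - D gives the ideal answer D* = C - A + B, and the candidate closest to D* in Euclidean distance is chosen.

```python
import numpy as np


def solve_analogy(a, b, c, candidates):
    """Return the index of the candidate D that best satisfies A - B = C - D."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    target = c - a + b  # rearranged parallelogram rule: D* = C - A + B
    candidates = np.asarray(candidates, dtype=float)
    distances = np.linalg.norm(candidates - target, axis=1)  # Euclidean distances
    return int(np.argmin(distances))


choice = solve_analogy(a=(1, 1), b=(3, 2), c=(5, 5), candidates=[(7, 6), (6, 7), (0, 0)])
```

      Here D* = (7, 6), so the first candidate (index 0) is selected.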

      Second, we are not aware of any previous work other than Frankland et al. [6], cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, clearly a useful line of future work would be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and we thank the reviewer for noting the possibility for confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and all other entries equal to zero. In Equation 6, diag(ω) is matrix-multiplied with the covariance matrix V, which results in elementwise multiplication of ω with the column vectors of V, and hence ω acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and its uses (as in Algorithm 1) to depict the application of a sigmoid nonlinearity (σ) to the gates, so that the resulting values are always between 0 and 1.
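      A minimal NumPy check of the equivalence described above, using random toy data (the array sizes and variable names are hypothetical): left-multiplying the covariance matrix V by diag(g) gives the same result as multiplying each column of V elementwise by the gate vector g, so each gate rescales one row of V rather than acting as a synaptic weight on the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))        # toy embeddings: 200 samples x 4 units
V = np.cov(X, rowvar=False)          # 4 x 4 covariance matrix
g = sigmoid(rng.normal(size=4))      # gates squashed into (0, 1)

gated_matrix = np.diag(g) @ V        # diag(g) V
gated_elementwise = g[:, None] * V   # every column of V multiplied elementwise by g
```

      The broadcasting form avoids building the diagonal matrix explicitly and is equivalent up to floating-point rounding.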

      Second, we would like to clarify that we do not compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independently of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings, individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module; these always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).
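      The per-frequency procedure can be sketched as follows. Equation 6 itself is not reproduced in this response, so the objective below is an assumption: it uses the standard continuous relaxation of DPP MAP inference, log det(diag(x)(V − I) + I) with gates x = σ(z) (the "softmax extension" of Gillenwater, Kulesza and Taskar, 2012), optimized by plain gradient ascent. The function names, step size, number of steps and toy data are hypothetical, and this is an illustrative sketch rather than our implementation. For each frequency, V is the covariance across that frequency's phases, the gates are optimized, and the frequency with the largest resulting value is selected.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relaxed_logdet(V, x):
    """Assumed objective: log det(diag(x) (V - I) + I), with gates x in (0, 1)."""
    n = V.shape[0]
    sign, logdet = np.linalg.slogdet(np.diag(x) @ (V - np.eye(n)) + np.eye(n))
    if sign <= 0:  # cannot happen for PSD V and 0 < x < 1, but guard anyway
        raise ValueError("relaxed determinant is not positive")
    return logdet


def optimize_gates(V, steps=500, lr=0.1):
    """Gradient ascent on unconstrained parameters z, with gates x = sigmoid(z)."""
    n = V.shape[0]
    A = V - np.eye(n)
    z = np.zeros(n)  # start with all gates at 0.5
    for _ in range(steps):
        x = sigmoid(z)
        M = np.diag(x) @ A + np.eye(n)
        grad_x = np.diag(A @ np.linalg.inv(M))  # d/dx_i log det M = [A M^-1]_ii
        z += lr * grad_x * x * (1.0 - x)        # chain rule through the sigmoid
    x = sigmoid(z)
    return x, relaxed_logdet(V, x)


def select_frequency(embeddings_by_freq):
    """embeddings_by_freq: list of (n_samples, n_phases) arrays, one per frequency."""
    values = []
    for E in embeddings_by_freq:
        V = np.cov(E, rowvar=False)  # covariance across phases within one frequency
        _, value = optimize_gates(V)
        values.append(value)
    return int(np.argmax(values)), values


# Toy data (hypothetical): one "frequency" whose units are strongly correlated and
# low-variance, and one whose units are independent and high-variance.
rng = np.random.default_rng(1)
shared = rng.normal(scale=0.3, size=(500, 1))
correlated_units = shared + rng.normal(scale=0.05, size=(500, 4))
independent_units = rng.normal(scale=3.0, size=(500, 4))
best, values = select_frequency([correlated_units, independent_units])
```

      On this toy data the sketch selects the second set (index 1): its gates rise toward 1 and its value approaches log det V, whereas the gates for the correlated, low-variance set shrink toward 0 and its value stays near zero.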

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). Those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among them were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the embeddings are high (they are “expressive”) and the correlations among them are low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings covered the representational space of the training data more efficiently, allowing them to capture the same relational structure across training and test distributions, which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. To illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows, over a 2D space of 1000x1000 locations, the sum of the grid cell embeddings each multiplied by its corresponding gate, for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), with the gates obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency cover the representational space more efficiently, which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.
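      The link between “high variance, low correlation” and a large covariance determinant can be checked numerically. The following is a minimal NumPy sketch on synthetic data; the three toy embedding sets are illustrative assumptions, not the grid cell embeddings used in the model. Independent, high-variance units give the largest log determinant, while units that are either strongly correlated or low in variance give a much smaller one.

```python
import numpy as np

def log_det_cov(embeddings: np.ndarray) -> float:
    """Log-determinant of the covariance of `embeddings` (samples x units)."""
    cov = np.cov(embeddings, rowvar=False)
    sign, logdet = np.linalg.slogdet(cov)
    return float(logdet) if sign > 0 else float("-inf")

rng = np.random.default_rng(0)
n_samples = 5000

# High variance, uncorrelated units ("expressive" and "covering").
independent = rng.normal(scale=2.0, size=(n_samples, 3))

# Similar per-unit variance, but the units share one dominant source (correlated).
shared = rng.normal(scale=2.0, size=(n_samples, 1))
correlated = shared + rng.normal(scale=0.2, size=(n_samples, 3))

# Uncorrelated units, but with low variance.
low_variance = rng.normal(scale=0.5, size=(n_samples, 3))

for name, emb in [("independent", independent),
                  ("correlated", correlated),
                  ("low_variance", low_variance)]:
    print(f"{name:>12}: log det = {log_det_cov(emb):.2f}")
```

      For these settings the population values are 3 ln 4 ≈ 4.16, ln(12.04 x 0.04²) ≈ -3.95 and 3 ln 0.25 ≈ -4.16, respectively; the sample estimates should be close to these.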

      Author response image 2.

      Each panel shows, over the 2D space of 1000x1000 locations, the sum of the grid cell embeddings for a particular frequency, each multiplied by its corresponding gate, with the gates obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that location in the 2D space.
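      The quantity plotted in each panel, the gate-weighted sum of grid cell responses at every location, can be sketched as follows. Everything here is an illustrative assumption: grid cells are modeled as an idealized sum of three plane waves 60 degrees apart, and the frequencies, phases, gates, the smaller 200x200 grid, and the function names are hypothetical choices, not the embeddings or gates learned in the model.

```python
import numpy as np

def grid_cell_response(xy, freq, phase, orientation=0.0):
    """Idealized (hypothetical) grid cell: sum of three plane waves 60 degrees apart.

    xy: array (..., 2) of locations; phase: array (2,) spatial offset.
    """
    angles = orientation + np.array([0.0, np.pi / 3, 2 * np.pi / 3])
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (3, 2)
    projections = (xy - phase) @ directions.T                        # (..., 3)
    return np.cos(2 * np.pi * freq * projections).sum(axis=-1)

def gated_response_map(size, freqs, phases, gates):
    """Sum over cells of gates[f, p] * response of cell (f, p), at every location."""
    xs, ys = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).astype(float)               # (size, size, 2)
    total = np.zeros((size, size))
    for f, freq in enumerate(freqs):
        for p, phase in enumerate(phases):
            total += gates[f, p] * grid_cell_response(coords, freq, phase)
    return total

rng = np.random.default_rng(0)
freqs = [0.005, 0.02, 0.08]                   # hypothetical spatial frequencies
phases = rng.uniform(0, 200, size=(10, 2))    # hypothetical phase offsets
gates = np.zeros((len(freqs), len(phases)))
gates[2, :] = 1.0                             # e.g. all phases of the highest frequency
response = gated_response_map(200, freqs, phases, gates)
print(response.shape)  # (200, 200)
```

      The resulting array can be displayed as an image (e.g. with a plotting library's image function) to obtain panels of the kind described in the caption.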

      Finally, we would like to clarify how the DPP-A attentional mechanism differs from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Using the standard self-attention mechanism of transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would weight grid cell embeddings across all frequencies and phases. The objective function for DPP-A instead represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency whose covariance matrix, computed over the training space, has the greatest determinant. The transformer inference module then attends over the inputs represented by the grid cell embeddings selected by the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.
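      The selection step can be illustrated with a simplified sketch. Assumptions: embeddings are arranged as (samples, frequencies, phases); a hard 0/1 selection by the argmax of the per-frequency log determinant stands in for the continuous gates that the model learns by maximizing Equation 6; the toy data and the function name `select_frequency` are hypothetical.

```python
import numpy as np

def select_frequency(embeddings):
    """Hard-selection sketch of the DPP-A objective.

    embeddings: array (n_samples, n_freq, n_phase) of training-set grid cell
    embeddings grouped by frequency. Returns the index of the frequency whose
    within-frequency covariance has the largest log-determinant, and a 0/1
    gate over all n_freq * n_phase units that keeps every phase of it.
    """
    _, n_freq, n_phase = embeddings.shape
    scores = np.full(n_freq, -np.inf)
    for f in range(n_freq):
        cov = np.cov(embeddings[:, f, :], rowvar=False)
        sign, logdet = np.linalg.slogdet(cov)
        if sign > 0:
            scores[f] = logdet
    best = int(np.argmax(scores))
    gate = np.zeros((n_freq, n_phase))
    gate[best, :] = 1.0
    return best, gate.reshape(-1)

# Toy data: three "frequencies" of four "phases" each, with unit variance and
# decreasing within-frequency correlation (0.9, 0.5, 0.0).
rng = np.random.default_rng(1)
n = 2000
data = np.empty((n, 3, 4))
for f, rho in enumerate([0.9, 0.5, 0.0]):
    shared = rng.normal(size=(n, 1))
    private = rng.normal(size=(n, 4))
    data[:, f, :] = np.sqrt(rho) * shared + np.sqrt(1 - rho) * private

best, gate = select_frequency(data)
print(best, gate)  # the uncorrelated frequency (index 2) is selected
```

      The gated vector would then be multiplied elementwise with each input's grid cell embedding before the inference module attends over the inputs; in the model itself the gates are learned rather than set by an argmax.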

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that Marc Howard and his colleagues have made to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his Temporal Context Model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter is a normalization procedure for training neural networks, similar to batch normalization [1], in which normalization is applied over the temporal dimension of a tensor instead of the batch dimension, and which has been shown to help with OOD generalization. We apologize for any confusion caused by the similarity of these terms and for not clarifying the difference between them earlier; we now do so in a footnote on Page 2.

      Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.
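      The difference in normalization axis can be made concrete. This is a minimal NumPy sketch, assuming inputs of shape (batch, time, features) and omitting the learned gain and bias parameters that both methods normally include; the function names are illustrative and not taken from either paper's code. A sequence translated as a whole is normalized identically under temporal normalization, but not under batch normalization.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each (time step, feature) using statistics across the batch axis."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def temporal_context_norm(x, eps=1e-5):
    """Normalize each (sequence, feature) using statistics across the temporal axis."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 3))     # (batch, time, features)
shifted = x.copy()
shifted[0] += 100.0                # move one sequence to a distant region of input space

# Temporal normalization of sequence 0 is unaffected by the shift ...
print(np.allclose(temporal_context_norm(x)[0], temporal_context_norm(shifted)[0]))  # True
# ... whereas batch normalization of sequence 0 changes.
print(np.allclose(batch_norm(x)[0], batch_norm(shifted)[0]))  # False
```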

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references listed below on page 3, after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature Neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constaninescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burges, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature Neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural Computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.
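      To make the “repulsion” concrete: in an L-ensemble DPP, the probability of selecting a subset S of items is det(L_S) / det(L + I), where L is a positive semidefinite similarity kernel and L_S is its submatrix indexed by S. The brute-force sketch below (feasible only for a handful of items; the kernel is a hypothetical example) shows that a pair of near-duplicate items is far less likely to be selected together than a pair of dissimilar items.

```python
import itertools
import numpy as np

def dpp_subset_probs(L):
    """Exact L-ensemble DPP probabilities P(S) = det(L_S) / det(L + I) for every subset S."""
    n = L.shape[0]
    normalizer = np.linalg.det(L + np.eye(n))
    probs = {}
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            idx = list(subset)
            # The determinant of the empty matrix is 1 by convention.
            det_s = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
            probs[subset] = det_s / normalizer
    return probs

# Hypothetical items described by unit feature vectors: items 0 and 1 are nearly identical.
features = np.array([[1.0, 0.0],
                     [np.cos(0.1), np.sin(0.1)],
                     [0.0, 1.0]])
L = features @ features.T  # similarity (Gram) kernel, positive semidefinite
probs = dpp_subset_probs(L)
print(probs[(0, 1)], probs[(0, 2)])  # the similar pair is much less probable
```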

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not specific about how the R module is implemented, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex, in particular regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. The role of the prefrontal cortex in encoding and actively maintaining abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological Science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. NeuroImage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in Cognitive Sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in Neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. In Control of Cognitive Processes: Attention and Performance XVIII.

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of the parameter w occurs only over the training space and is not further modified during testing (i.e., over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of the space over which the test analogies are defined when those parts are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings, one that captures the same relational structure in training and test analogies, must be identified to achieve strong OOD generalization, and that is accomplished by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and is therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
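      The parallelogram relation can be written out explicitly. The sketch below only illustrates the relation A - B = C - D, solved for D = C - A + B and used to pick the closest answer choice; it is a hand-coded rule under that assumption, not the learned inference module R, and the points and answer choices are hypothetical.

```python
import numpy as np

def parallelogram_completion(a, b, c):
    """Return the point d satisfying a - b = c - d, i.e. d = c - a + b."""
    return np.asarray(c) - np.asarray(a) + np.asarray(b)

def pick_answer(a, b, c, choices):
    """Index of the answer choice closest to the parallelogram completion."""
    target = parallelogram_completion(a, b, c)
    distances = [np.linalg.norm(np.asarray(ch) - target) for ch in choices]
    return int(np.argmin(distances))

A, B, C = (2, 3), (5, 7), (10, 1)
D = parallelogram_completion(A, B, C)
print(D)                                                  # [13  5]
print(pick_answer(A, B, C, [(12, 5), (13, 5), (7, -3)]))  # 1
```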

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, obtained by multiplying K and M, which is added to both dimensions of A, B, C and D, which are points in ℤ², to translate them to a different region of the space. K is an integer value ranging from 1 to 9, and M is also an integer value denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
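      A minimal sketch of this translation, assuming A, B, C and D lie in the training region [0, M) x [0, M) with M = 100 as stated above. It checks that adding K*M to both coordinates moves the analogy into the K-th test region while leaving the differences A - B and C - D, and hence the relation, unchanged. The example points and the helper name `translate` are hypothetical.

```python
import numpy as np

M = 100  # size of the training region

def translate(points, k, m=M):
    """Add k * m to both coordinates of each point in Z^2."""
    return np.asarray(points) + k * m

A, B, C, D = np.array([[2, 3], [5, 7], [10, 1], [13, 5]])  # satisfies A - B == C - D
assert np.array_equal(A - B, C - D)

for k in range(1, 10):  # K ranges from 1 to 9
    At, Bt, Ct, Dt = translate([A, B, C, D], k)
    assert np.array_equal(At - Bt, Ct - Dt)  # relation preserved
    # every translated point lies in [k*M, (k+1)*M) along both dimensions
    assert all((k * M <= pt).all() and (pt < (k + 1) * M).all() for pt in (At, Bt, Ct, Dt))
print("relation preserved for K = 1..9")
```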

      (11) Page 5 - "two continuous dimensions (Constantinescu et al.)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added this sentence after Np=100. “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning Theorem 2.1. The sentence just before it reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”
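      The step from Theorem 2.1 to Equation 6 can be checked numerically under one assumption: that Theorem 2.1 is the identity given in Gillenwater et al. (2012), sum over subsets S of prod_{i in S} x_i * prod_{i not in S} (1 - x_i) * det(L_S) = det(diag(x)(L - I) + I), for gates x in [0, 1]^N and a positive semidefinite L. Taking the log of the right-hand side then gives a differentiable function of the gates. The sketch below verifies the identity by brute force on a small random covariance matrix; the matrix, the gate values, and the function names are hypothetical, and the exact form of Equation 6 is not reproduced here.

```python
import itertools
import numpy as np

def subset_expansion(L, x):
    """Left-hand side: sum over S of prod_{i in S} x_i * prod_{i not in S} (1 - x_i) * det(L_S)."""
    n = len(x)
    total = 0.0
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            idx = list(subset)
            inside = np.prod([x[i] for i in idx])  # empty product is 1.0
            outside = np.prod([1 - x[i] for i in range(n) if i not in subset])
            det_s = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
            total += inside * outside * det_s
    return total

def closed_form(L, x):
    """Right-hand side: det(diag(x) (L - I) + I)."""
    n = len(x)
    return np.linalg.det(np.diag(x) @ (L - np.eye(n)) + np.eye(n))

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 4))
L = np.cov(samples, rowvar=False)  # positive semidefinite covariance matrix
x = rng.uniform(size=4)            # gates in [0, 1]

print(subset_expansion(L, x), closed_form(L, x))  # the two values agree
print(np.log(closed_form(L, x)))                  # log form of the right-hand side
```

      With all gates set to 1 the right-hand side reduces to det(L), and with all gates set to 0 it equals 1, which is one way to see how the gates interpolate between including and excluding units.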

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xT w inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ need further interrogation. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors for the positive assessment of our manuscript. We have carefully considered the reviewers’ constructive and helpful comments and revised our manuscript accordingly. To address the question about the dissociable relationships between global and local BM processing, we have provided more evidence and additional analyses in this revised version.

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating differences in biological motion perception in participants with ADHD in comparison with controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated local and global (holistic) biological motion perception, group differences, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Both local and global biological motion perception are reduced in ADHD participants. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not the controls. A path analysis in the ADHD data suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there exists not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and adds potentially also new behavioral markers for this clinical condition. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thank you for your positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper.

      We agree that the relationship between genetic factors and BM processing in ADHD needs more investigation. We have modified our statement in the Discussion section as follows:

      “Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19.” (lines 421 - 425).

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a relatively clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive assessment of our work.

      Weaknesses:

      Except for the main analysis, it is unclear what the authors' specific predictions are regarding the three different tasks they employ. The three BM tasks are used to probe different processes underlying BM perception, but it is difficult to gather from the introduction why these three specific tasks were chosen and what predictions the authors have about the performance of the ADHD group in these tasks. Relatedly, the authors do not report whether (and if so, how) they corrected for multiple comparisons in their analyses. As the number of tests one should control for depends on the theoretical predictions (http://daniellakens.blogspot.com/2016/02/why-you-dont-need-to-adjust-you-alpha.html), both are necessary for the reader to assess the statistical validity of the results and any inferences drawn from them. The same is the case for the secondary analyses exploring relationships between the 3 individual BM tasks and social function measured by the social responsivity scale (SRS).

      We appreciate these constructive suggestions. In response, we have included a detailed description in the Introduction section explaining why we employed three different tasks and our predictions about the performance in ADHD:

      “Despite initial indications, a comprehensive investigation into BM perception in ADHD is warranted. We proposed that it is essential to deconstruct BM processing into its multiple components and motion features, since treating them as a single entity may lead to misleading or inconsistent findings31. To address this issue, we employed a carefully designed behavioral paradigm used in our previous study19, making slight adjustments to adapt for children. This paradigm comprises three tasks. Task 1 (BM-local) aimed to assess the ability to process local BM cues. Scrambled BM sequences were displayed and participants could use local BM cues to judge the facing direction of the scrambled walker. Task 2 (BM-global) tested the ability to process the global configuration cues of the BM walker. Local cues were uninformative, and participants used global BM cues to determine the presence of an intact walker. Task 3 (BM-general) tested the ability to process general BM cues (local + global cues). The stimulus sequences consisted of an intact walker and a mask containing similar target local cues, so participants could use general BM cues (local + global cues) to judge the facing direction of the walker.” (lines 116 - 130)

      “In Experiment 1, we examined three specific BM perception abilities in children with ADHD. As mentioned earlier, children with ADHD also show impaired social interaction, which implies atypical social cognition. Therefore, we speculated that children with ADHD performed worse in the three tasks compared to TD children.” (lines 131 - 134)

      Additionally, we have reported the p values corrected for multiple comparisons (false discovery rate, FDR) in the revised manuscript wherever it was necessary to adjust the alpha (lines 310 - 316; Table 2). The pattern of the results remained unchanged.
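      The response does not state which FDR procedure was used; the most common one is Benjamini-Hochberg. Assuming that procedure, adjusted p-values can be computed as in the minimal NumPy sketch below (the example p-values are hypothetical). SciPy 1.11 and later provide the same adjustment as `scipy.stats.false_discovery_control`.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Step-up: each adjusted value is the minimum over itself and all larger ranks.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p))  # adjusted: 0.04, 0.0533, 0.0533, 0.20
```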

      In relation to my prior point, the authors could provide more clarity on how the conclusions drawn from the results relate to their predictions. For example, it is unclear what specific conclusions the authors draw based on their findings that ADHD show performance differences in all three BM perception tasks, but only local BM is related to social function within this group. Here, the claim is made that their results support a specific hypothesis, but it is unclear to me what hypothesis they are actually referring to (see line 343 & following). This lack of clarity is aggravated by the fact that throughout the rest of the discussion, in particular when discussing other findings to support their own conclusions, the authors often make no distinction between the two processes of interest. Lastly, some of the authors' conclusions related to their findings on local vs global BM processing are not logically following from the evidence: For instance, the authors conclude that their data supports the idea that social atypicalities are likely to reduce with age in ADHD individuals. However, according to their own account, local BM perception - the only measure that was related to social function in their study - is understood to be age invariant (and was indeed not predicted by age in the present study).

      Thank you for pointing out this issue. We have carefully revised the Discussion section about our findings to clarify these points:

      “Our study contributes several promising findings concerning atypical biological motion perception in ADHD. Specifically, we observe the atypical local and global BM perception in children with ADHD. Notably, a potential dissociation between the processing of local and global BM information is identified. The ability to process local BM cues appears to be linked to the traits of social interaction among children with ADHD. In contrast, global BM processing exhibits an age-related development. Additionally, general BM perception may be affected by factors including attention.” (lines 387 - 393)

      We have provided a detailed discussion of the two processes of interest to clarify their potential differences and the possible reasons for the divergent developmental trajectories of local and global BM processing:

      “BM perception is considered a multi-level phenomenon56-58. At least in part, processing local and global BM information appears to involve different genetic and neural mechanisms16,19. Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from dissociable genetic bases. The former seems, to a great degree, to be acquired phylogenetically20,21,59,60, while the latter is primarily acquired through individual development19. Sensitivity to local rather than global BM cues seems to emerge early in life: visually inexperienced chicks exhibit a spontaneous preference for hen BM stimuli, even when the configuration is scrambled20, and the same finding has been reported in newborns. By contrast, the ability to process global rather than local BM cues may be influenced by attention28,29 and shaped by experience24,56.” (lines 419 - 430)

      “We found that the ability to process global and general BM cues improved significantly with age in both the TD and ADHD groups, which implies that the processing module for global BM cues matures with development. In the ADHD group, the improvement in processing general and global BM cues was greater than that in processing local BM cues, whereas no such difference was found in the TD group. This may be due to the relatively higher baseline BM perception abilities of TD children, resulting in a milder improvement. These findings also suggest a dissociation between the development of local and global BM processing. The ability to process global BM cues appears to be acquired, akin to the potential age-related improvements observed in certain aspects of social cognition deficits among individuals with ADHD5, whereas local BM processing may be considered an intrinsic trait19.” (lines 438 - 449)

      In addition, we have rephrased some inaccurate statements in the revised manuscript. Specifically, although some studies found that part of the social dysfunction in individuals with ADHD diminishes with age, another part might be stable and attributable to atypical local BM perception. One reason for the age-related reduction is that some factors related to social dysfunction, such as hyperactivity symptoms, improve with age.

      The reported results are incomplete, making it hard for the reader to interpret the findings comprehensively and to assess whether the conclusions drawn are valid. Whenever the authors report negative results (p-values > 0.05), the relevant statistics are not reported and the data are not plotted. In addition, summary statistics (group means) are missing for the main analysis.

      Thanks for your comments. We have provided the complete statistical results in the revised manuscript (lines 309 - 316) and supplementary material, which encompass relevant statistics and plots of negative results (Figure 4, Figure S2 and S3), in accordance with our research questions. And we have also included summary statistics in the Results section (lines 287 - 293).

      Some of the conclusions/statements in the article are too strong and should be rephrased to indicate hypotheses and speculations rather than facts. For example, in lines 97-99 the authors state that the finding of poor BM performance in TD children in a prior study 'indicated inferior applicability' or 'inapplicable experimental design'. While this is one possibility, a perhaps more plausible interpretation could be that TD children show 'poor' performance due to outstanding maturation of the underlying (global) BM processes (as the authors suggest themselves that BM perception can improve with age). There are several other examples where statements are too strong or misleading, which need attention.

      We thank you for pointing out the issue. We have toned down and rephrased the strong statements and made the necessary revisions.

      “Another study found that children with ADHD performed worse in BM detection with moderate ratios of noise34. This may be because noise dots increase the difficulty of identifying BM stimuli, which accentuates the difference in BM processing between the two groups33,35.” (lines 111 - 115)

      Reviewer #3 (Public Review):

      Summary:

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The important and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, I am unsure that these differences between local and global skills are truly supported by the data and suggest some further analyses.

      Strengths:

      The authors present clear differences between the ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate your positive feedback. In the revised manuscript, we have added more analyses to support the differences between local and global BM processing. Please refer to our response to your point #3 below.

      Weaknesses:

      I am unsure that the data are strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but do not seem so plausible to me. I am also concerned about gender, and possible autism, confounds when examining the effect of ADHD. Specifics:

      Gender confound. There are proportionally more boys in the ADHD than TD group. The authors appear to attempt to overcome this issue by including gender as a covariate. I am unsure if this addresses the problem. The vast majority of participants in the ADHD group are male, and gender is categorically, not continuously, defined. I'm pretty sure this violates the assumptions of ANCOVA.

      We appreciate your comments. We agree that, although we observed a clear difference between local and global BM processing in ADHD, the evidence is to some extent preliminary. The mechanistic possibilities for why these abilities may dissociate are discussed in the revised manuscript; please refer to our response to reviewer 2’s point #2. To further examine whether gender played a role in the observed results, we used a statistical matching technique to obtain a sub-dataset with a more balanced gender ratio. The pattern of results remained the same in this more balanced dataset (see Supplementary Information part 1). Following your suggestion, we have also presented the results without gender as a covariate in the main text and separated the data of boys and girls in the plots (see Figure 1 and Figure S1). There were indeed no signs of a gender effect.
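      The response does not name the matching technique. As one hypothetical illustration of how a gender-balanced sub-dataset can be drawn, the sketch below uses stratified random subsampling so that the ADHD and TD groups end up with equal numbers of boys and girls. The record layout and the function names `balanced_subsample` and `cell_counts` are assumptions for illustration, not the authors' procedure.

```python
import random
from collections import Counter


def balanced_subsample(participants, seed=0):
    """Draw a subsample in which the ADHD and TD groups contain equal
    numbers of boys and girls (hypothetical illustration only).

    participants: list of dicts with keys "group" ("ADHD" or "TD") and
    "gender" ("M" or "F").
    """
    rng = random.Random(seed)
    # Index participants by (group, gender) cell.
    cells = {}
    for p in participants:
        cells.setdefault((p["group"], p["gender"]), []).append(p)
    subsample = []
    for gender in ("M", "F"):
        adhd = cells.get(("ADHD", gender), [])
        td = cells.get(("TD", gender), [])
        # Keep as many participants per group as the smaller cell allows.
        n = min(len(adhd), len(td))
        subsample += rng.sample(adhd, n) + rng.sample(td, n)
    return subsample


def cell_counts(sample):
    """Count participants per (group, gender) cell."""
    return Counter((p["group"], p["gender"]) for p in sample)
```

      For example, with 30 ADHD boys, 5 ADHD girls, 20 TD boys and 15 TD girls, this keeps 20 boys and 5 girls in each group. The price of this simple design is discarded data; matching on further covariates such as age would need a different procedure.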

      Autism. Autism and ADHD are highly comorbid. The authors state that the TD children did not have an autism or ADHD diagnosis, but they do not state that the ADHD children did not have an autism diagnosis. Given the nature of the claims, this seems crucial information for the reader.

      Thank you for your suggestion. We have confirmed that none of the children with ADHD in our study had been diagnosed with autism. We used a semi-structured interview instrument (K-SADS-PL-C) to confirm that every recruited child met the criteria for ADHD but not for ASD. The exclusion criteria for both groups are stated in the Materials and methods section:

      “Exclusion criteria for both groups were: (a) neurological diseases; (b) other neurodevelopmental disorders (e.g., ASD, Mental retardation, and tic disorders), affective disorders and schizophrenia…” (lines 158 - 162)

      Conclusions. The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. I think that if the authors wish to make strong claims here they must show inferential stats supporting (1) a difference between ADHD and TD SRS-Task 1 correlations, and (2) a difference in those differences for Task 2 and 3 relative to Task 1. I think they should also show a scatterplot of this correlation, with separate lines of best fit for the two groups, for Tasks 2 and 3 as well, i.e. Figure 4 should have 3 panels. I would recommend the same type of approach for age. Currently, they have small samples for correlations, and are reading much theoretical significance into some correlations passing the significance threshold and others not. It would be incredibly interesting if the social skills (as measured by SRS) only related to local BM abilities, and age only to global, but I think the data are not so clear with the current information. I would be surprised if all BM abilities did not improve with age. Even if there is some genetic starter kit (and that this differs according to particular BM component), most abilities improve with learning/experience/age.

      Thank you for this recommendation. We have added statistics testing the differences between correlations: (1) the difference between the ADHD and TD groups in the SRS-Task 1 correlation (see the first paragraph of Supplementary Information part 2); (2) the difference in SRS-response accuracy correlations for Tasks 2 and 3 relative to Task 1 (see the second paragraph of Supplementary Information part 2); and (3) the difference in age-response accuracy correlations for Tasks 2 and 3 relative to Task 1 in the ADHD group (see Supplementary Information part 3). Additionally, we have included scatterplots for SRS-Task 1, SRS-Task 2 and SRS-Task 3 (with separate lines of best fit for the two groups in each; see Figure 4), as well as for SRS-ADHD, SRS-TD, age-ADHD and age-TD (with separate lines of best fit for the three tasks in each; see Figures S2 and S3). Detailed results are presented in the revised manuscript and Supplementary Information. We believe these additional analyses strengthen our conclusions.
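      For readers who want to reproduce the between-group comparison (ADHD vs TD) of an SRS-task correlation, one standard choice is Fisher's r-to-z test for correlations from two independent samples. The response does not state which test was used in the Supplementary Information, so the sketch below is an assumption, and its inputs are hypothetical. Comparing correlations for different tasks within the same children involves dependent (overlapping) correlations, which calls for a dependent-correlation test such as Steiger's rather than this one.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transform, which makes the sampling distribution
    of a correlation approximately normal."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test of H0: rho1 == rho2 for correlations r1 and r2
    computed in two independent samples of sizes n1 and n2."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

      The test is symmetric: swapping the two groups flips the sign of z but leaves p unchanged. With the small per-group samples typical of such studies, power to detect a difference between correlations is limited, which is worth keeping in mind when interpreting a non-significant result.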

      Theoretical assumptions. The authors make some sweeping statements about local vs global biological motion processing that need to be toned down. They assume that local processing is specifically genetically determined whereas global processing is a product of experience. The fact that their global, but not local, task performance improves with age would tend to suggest there could be some difference here, but the existing literature does not allow for this certainty. The chick studies showing a neonatal preference are controversial and confounded; I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      Thank you for pointing out this issue. Following your suggestion, we have toned down and rephrased our claims about the difference between local and global BM processing:

      “These findings suggest that local and global mechanisms might play different roles in BM perception, though the exact mechanisms underlying the distinction remain unclear. Exploring the two components of BM perception will enhance our understanding of the difference between local and global BM processing, shedding light on the psychological processes involved in atypical BM perception.” (lines 87 - 92)

      Reviewer #1 (Recommendations For The Authors):

      I have only a number of minor points that should be addressed prior to publication:

      L. 95ff: What is meant by 'inapplicability of experimental designs'? This paragraph is somewhat unclear.

      In the revised manuscript, we have clarified this point (lines 111 - 115).

      L. 146: The groups were not perfectly balanced for sex. Would results change fundamentally in a more balanced design, or can arguments be given that gender does not play a role, as seems to be the case for some functions in biological motion perception (e.g. Pavlova et al. 2015; Tsang et al 2018)? One could provide a justification that this imbalance does not matter, or perhaps test subsampled balanced data sets.

      This point is similar to the point #1 from reviewer 3, and we have addressed this issue in our response above.

      L. 216 f.: In this paragraph it does not become very clear that the mask for the global task consisted of scrambles generated from walkers walking in the same direction. The mask for the local task then should consist of a balanced mask that contains the same amount of local motion cues indicating right and leftwards motion. Was this the case? (Not so clear from this paragraph.)

      For the local task, introducing a mask would have made the task too difficult for children. Therefore, in the local task we displayed only a scrambled walker without a mask, which was more suitable for children. We have clarified this point in the corresponding paragraph (lines 232 - 241).

      L. 224 ff.: Here it would be helpful to see the 5 different 'facing' directions of the walkers. What does this exactly mean? Do they move on oblique paths that are not exactly orthogonal to the viewing directions, and how much did these facing directions differ?

      Out of the five walkers we used, two faced straight left or right, orthogonal to the viewing directions. Two walked with their bodies oriented 45 degrees from the observer, to the left or right. The last one walked towards the observer. We have included a video (Video 4) to demonstrate the 5 facing directions.
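      The five facing directions can be made concrete with a small geometric sketch. Assuming the walkers are defined by 3D joint coordinates (x horizontal, y vertical, z pointing toward the observer) and rendered by orthographic projection, each facing direction corresponds to a rotation about the vertical axis before projection. The angle convention, the sign of left versus right, and the name `rotate_and_project` are illustrative assumptions, not the authors' stimulus code.

```python
import math

# Assumed facing angles in degrees: 0 = walking toward the observer,
# +/-45 = oblique, +/-90 = profile (walking straight right or left).
FACING_ANGLES = (-90, -45, 0, 45, 90)


def rotate_and_project(points, angle_deg):
    """Rotate 3D points (x, y, z) about the vertical y-axis by angle_deg
    and project them orthographically onto the screen (x, y) plane."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    projected = []
    for x, y, z in points:
        x_rot = x * cos_a + z * sin_a  # horizontal screen coordinate
        projected.append((x_rot, y))   # height is unaffected by the rotation
    return projected
```

      Under this convention, a unit step toward the observer, (0, 0, 1), produces no horizontal displacement at 0 degrees, a displacement of about 0.71 at 45 degrees, and a full sideways displacement at 90 or -90 degrees, which is why the profile walkers carry the clearest left/right motion signal.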

      L. 232: How was the number of 5 practicing trials determined/justified?

      As mentioned in the main text, global BM processing is susceptible to learning. Therefore, too many practice trials could increase BM visual experience and influence the results. We set the number of practice trials to 5 based on the results of a pilot experiment, in which nearly all children understood the task requirements well after completing 5 practice trials.

      L 239: Apparently no non-parametric statistics were applied. Maybe it would be good to mention briefly in the Statistics section why this was justified.

      We appreciate your suggestion and have cited two references in the Statistics section (Fagerland et al. 2012, Rochon et al. 2012). Fagerland et al. noted that the t-test becomes more robust as the sample size increases. By the central limit theorem, the sampling distribution of the mean approaches normality as the sample size grows, and for samples larger than about 30 it is commonly treated as approximately normal (http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf). In fact, we also ran non-parametric tests on our data and found the results to be robust.
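      The response does not name the non-parametric test. As an assumption, the Mann-Whitney U test is a common rank-based counterpart of the independent-samples t-test; a minimal standard-library sketch follows, using the normal approximation for the p-value without continuity or tie correction. The function names are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation
    p-value (no continuity or tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p
```

      U ranges from 0 (every value in x below every value in y) to n1 * n2 (every value above), with n1 * n2 / 2 expected under the null. For small samples or many ties, an exact or tie-corrected version is preferable.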

      L 290: 'FIQ' this abbreviation should be defined.

      The abbreviation 'FIQ' stands for full-scale intellectual quotient, which is defined in the Materials and methods section:

      “Scores of the four broad areas constitute the full-scale intellectual quotient (FIQ).”

      L. 290 ff.: This model notation, 'BM-local = age + gender etc', is pretty sloppy. I think what is meant is that a GLM was used with the predictors gender etc. times appropriate beta_i values. This formula should be corrected, or one could simply say that a GLM was run with the predictors gender ....

      The same criticism applies to these other models that follow.

      We thank you for pointing this out. We have modified all formulas accordingly in the revised manuscript (see part 3 of the Results section).
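      As an illustration of the notation the reviewer suggests, a model such as the local BM model can be written as a general linear model. The predictors are written generically here because the full predictor list is not reproduced in this response; age and gender are examples.

```latex
\mathrm{BM}_{\mathrm{local},j} = \beta_0 + \sum_{i=1}^{k} \beta_i \, x_{ij} + \varepsilon_j,
\qquad \varepsilon_j \sim \mathcal{N}(0, \sigma^2)
```

      Here j indexes participants, x_ij is the value of predictor i (for example age, or gender coded 0/1) for participant j, beta_0 is the intercept, beta_i are the fitted coefficients, and epsilon_j is the residual.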

      All these models assume linearity of the combination of the predictors. Was this assumption verified?

      We followed previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), are linearly related to BM processing ability.

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the group of patients. Does the same observation also apply to the normals?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data of the study will be available at https://osf.io/37p5s/.

      (2) Although overall, the language was clear and understandable, there are a few parts where language might confuse a reader and lead to misconceptions. For instance, line 52: Did the authors mean to refer to 'emotions and intentions' instead of 'emotions and purposes'? See also examples where rephrasing may help to reflect a statement is speculation rather than fact.

      Thank you for the comments. We have carefully checked the full text and rephrased confusing statements.

      (3) Line 83/84: Autism is not a 'mental disorder' - please change to something like 'developmental disability'. Authors are encouraged to adapt their language according to terms preferred by the community (e.g., see Fig. 5 in this article:

      https://onlinelibrary.wiley.com/doi/10.1002/aur.2864)

      Suggestion well taken. We have changed the wording accordingly:

      “In recent years, BM perception has received significant attention in studies of mental disorders (e.g., schizophrenia30) and developmental disabilities, particularly in ASD, characterized by deficits in social communication and social interaction31,32.” (lines 93 - 95)

      (4) Please report how the sample size for the study was determined.

      In the Materials and methods section (lines 168 - 173), we explained how the sample size was determined.

      (5) Line 94: It would be helpful to have a brief description of what neurophysiological differences have been observed upon BM perception in children with ADHD.

      Thanks for the comment. We have added a brief description of neurophysiological findings in children with ADHD (lines 108 - 111).

      (6) Line 106/107 and 108/109: please add references.

      We have revised this part; the relevant findings and references now appear in the revised manuscript (lines 77, 132 - 133).

      (7) Line 292: Please add what order the factors were entered into each regression model.

      We used SPSS 26 for the main analysis. By default, SPSS uses Type III sums of squares, under which each term is tested after adjustment for all other terms in the model, so the results do not depend on the order in which factors are entered into the GLM. For more information, please refer to the SPSS 26 documentation (https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=features-glm-univariate-analysis).
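      The order-invariance argument can be checked numerically. The minimal pure-Python sketch below, with hypothetical data and helper names, contrasts sequential (Type I) sums of squares, which depend on entry order when predictors are correlated, with marginal sums of squares obtained by dropping each term from the full model. For a main-effects-only model without interactions, the marginal sums of squares coincide with Type III; this is an illustration of the principle, not SPSS's implementation.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(y, columns):
    """Residual sum of squares from an OLS fit of y on an intercept
    plus the given predictor columns."""
    x = [[1.0] + [col[j] for col in columns] for j in range(len(y))]
    p = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yj for row, yj in zip(x, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yj - sum(bk * xk for bk, xk in zip(beta, row))) ** 2
               for row, yj in zip(x, y))


def type1_ss(y, predictors):
    """Sequential (Type I) SS: each term is added in the given order."""
    ss, prev, used = {}, rss(y, []), []
    for name, col in predictors:
        used.append(col)
        cur = rss(y, used)
        ss[name] = prev - cur
        prev = cur
    return ss


def type3_ss(y, predictors):
    """Marginal SS: increase in RSS when one term is dropped from the
    full model (equal to Type III SS for main-effects-only models)."""
    full = rss(y, [col for _, col in predictors])
    return {name: rss(y, [c for n, c in predictors if n != name]) - full
            for name, _ in predictors}
```

      With two correlated predictors, `type3_ss` returns the same value for each term whichever order the terms are listed in, whereas `type1_ss` credits the shared variance to whichever term is entered first. The last-entered term's Type I value equals its marginal value.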

      Reviewer #3 (Recommendations For The Authors)

      (1) Task specifics. It is key to understanding the findings, as well as the dissociation between tasks, that the precise nature of the stimuli is clear. I think there is room for improvement in description here. Task 1 is described as involving relocating dots within the range of the intact walker. Of course, PLWs are created by presenting dots at the joints, so relocation can involve either moving to another place on the body, or random movement within the 2D spatial array (which likely involves moving it off the body). Which was done? It is said that Ps must indicate the motion direction, but what was the display of the walker? Sagittal? Task 2 requires detecting whether there is an intact walker amongst scrambled walkers. Were all walkers completely overlaid? Task 3 requires detecting the left v right facing of an intact walker at different orientations, presented amongst noise. So Task 3 requires determining facing direction and Task 1 walking direction. Are these tasks the same but described differently? Or can walkers ever walk backwards? Wrt this point, I also think it would help the reader if example videos were uploaded.

      Thank you for bringing this to our attention. Regarding Task 1, your second interpretation is correct: we scrambled the original dots and presented them at random locations within the 2D spatial array (which likely involved moving them off the body). As a result, the global configuration of the 13 dots was completely disrupted, while the motion trajectory of each individual dot was preserved. The resulting display of scrambled dots does not resemble a human figure; nevertheless, these local BM cues contain information about motion direction. In Task 2, the target walker was completely overlaid by a mask approximately 1.44 times the size of the intact walker. Tasks 1 and 3 have the same requirement, namely judging the motion (walking) direction; the difference is that Task 1 displayed a scrambled walker, whereas Task 3 displayed an intact walker within a mask. We have clarified these points and improved our descriptions in the Procedure section, and we have created example videos for each task, which we believe will help readers understand each task.
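      The spatial scrambling described for Task 1 can be sketched as follows: each dot keeps its own frame-by-frame trajectory, but the whole trajectory is shifted to a random location, which destroys the global body configuration while keeping the local motion cues. The frame format, the choice of the intact walker's bounding box as the relocation range, and the name `scramble_walker` are assumptions for illustration, not the authors' stimulus code.

```python
import random


def scramble_walker(frames, seed=None):
    """Spatially scramble a point-light walker (hypothetical sketch).

    frames: list of frames, each a list of (x, y) dot positions with the
    same dot order in every frame. Each dot keeps its own motion
    trajectory, but the trajectory is shifted so that its mean position
    lands at a random location within the walker's bounding box.
    """
    rng = random.Random(seed)
    n_dots = len(frames[0])
    xs = [x for frame in frames for x, _ in frame]
    ys = [y for frame in frames for _, y in frame]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    offsets = []
    for d in range(n_dots):
        # Mean position of dot d over the whole sequence.
        mx = sum(frame[d][0] for frame in frames) / len(frames)
        my = sum(frame[d][1] for frame in frames) / len(frames)
        # New random anchor within the intact walker's spatial extent.
        nx = rng.uniform(x_min, x_max)
        ny = rng.uniform(y_min, y_max)
        offsets.append((nx - mx, ny - my))
    # Apply one constant offset per dot, so every frame-to-frame
    # displacement of that dot is unchanged.
    return [[(x + dx, y + dy) for (x, y), (dx, dy) in zip(frame, offsets)]
            for frame in frames]
```

      Because each dot receives a single constant offset, its velocity profile (and hence its local direction information) is identical to that of the intact walker; only the spatial relations between dots are lost.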

      (2) Gender confound (see above). I think that the authors should present the results without gender as a covariate. Can they separate boys and girls on the plots with different coloured individual datapoints, such that readers can see whether it's actually a gender effect driving the supposed ADHD effect? And show that there are no signs of a gender effect in their TD group?

      This point is similar to the point #1 you mentioned. Please refer to our response to that point above.

      (3) Autism possible confound (see above). I think the authors must report whether any of the ADHD group had an autism diagnosis.

      Please refer to our response to your point #2 above.

      (4) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors should add stats demonstrating differences between the correlations to support such claims, as well as demonstrating appropriate scatterplots for SRS-Task 1, SRS-Task 2, SRS-Task 3 and age-Task 1, age-Task2 and age-Task 3 (with separate lines of best fit for the two groups in each).

      Please refer to our response to your point #3 above.

      (5) Theoretical assumptions (see above). I would suggest rephrasing all claims here to outline that these discussed mechanistic differences between local and global BM processing are only possibilities and not known on the basis of existing data.

      Please refer to our response to your point #4 above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General response:

      We thank the reviewers for their thorough evaluation of our manuscript. Working on the raised concerns has improved the manuscript greatly. Specifically, the recommendations to clarify the adopted assumptions in the study strengthened the motivation for the study; further, following up some of the reviewers’ concerns with additional analyses validated our chosen measures and strengthened the compatibility of the findings with the predictions of the dynamic attending framework. Below, you will find our detailed point-by-point responses, along with information on specific revisions.

      The reviewers pointed out that study assumptions were unclear, some of the measures we chose were not well motivated, and the findings were not well enough explained considering possible alternatives. As suggested, we reformulated the introduction, explained the common assumptions of entrainment models that we adopted in the study, and further clarified how our chosen measures for the properties of the internal oscillators relate to these assumptions.

      We realized that the initial emphasis on the compatibility of the current findings with the predictions of entrainment models might have given the wrong impression that the current study aimed to test whether auditory rhythmic processing is governed by timekeeper or oscillatory mechanisms. However, testing these theoretical models of human behavior requires paradigms specifically designed to compare their contrasting predictions. A number of studies do so by manipulating the regularity of a stimulus sequence or the expectancy of stimulus onsets, or by assessing the perceived timing of targets that follow a stimulus rhythm. Such paradigms allow testing the prediction that an oscillator underlying perceptual timing would entrain to a regular but not an irregular sequence. This would further lead to stronger expectancies at the peak of the oscillation, where 'attentional energy' is highest. These studies report 'rhythmic facilitation', whereby targets that align with the peaks of the oscillation are detected better than those that do not (see Henry and Herrmann (2014) and Haegens and Zion Golumbic (2018) for reviews). Additionally, unexpected endings of standard intervals preceded by a regular entraining sequence lead to biased estimation of subsequent comparison intervals, owing to the contrast between the attentional oscillator's phase and a deviating stimulus onset (Barnes & Jones, 2000; Large & Jones, 1999; McAuley & Jones, 2003). Even a sequence rate that is a multiple of the to-be-judged standard and comparison intervals gives rise to rhythmic facilitation (McAuley & Jones, 2003), and the expectancy of a stimulus onset modulates duration judgments. These findings are not compatible with the predictions of timekeeper models, because these models represent time intervals independently of temporal expectancies and therefore predict no effect of expectancy violations.

      In the current study, we adopted an entrainment approach to timing rather than testing the predictions of competing models. This choice was motivated by several aspects of entrainment models that align better with the aims of the current study. First, our focus was on understanding the perception and production of rhythms; rhythm perception is better explained by entrainment models than by timekeeper models, which excel at explaining the perception of isolated time intervals (McAuley, 2010). Moreover, we wanted to leverage the fact that entrainment models elegantly include parameters that can explain different aspects of timing abilities, and that these parameters can be estimated for each individual. For instance, the flexibility property of oscillators can be linked to the ability to adapt to changes in external context, whereas timekeeper and Bayesian timing approaches lack a specific mechanism to quantify temporal adaptation across perceptual and motor domains. Finally, because entrainment is observed across theoretical, behavioral, and neural levels, entrainment models are useful for explaining and generalizing behavior across domains. Nevertheless, some results showed partial compatibility with the predictions of timekeeper models, such as the modulation of 'best-performance rates' by temporal context observed in the random-order sessions of Experiment 1, where stimulus rates differed maximally across consecutive trials. However, given that the mean, standard deviation, and range of stimulus rates were identical across sessions, and that timekeeper models assume no temporal adaptation in duration perception, these models predict similar results across these sessions. Instead, we found significant accuracy differences, biased duration judgments, and harmonic relationships between the best-performance rates. We elaborate on these results with respect to their compatibility with the contrasting models of human temporal perception in the revised discussion.

      Responses to specific comments:

      (1.1) At times, I found it challenging to evaluate the scientific merit of this study from what was provided in the introduction and methods. It is not clear what the experiment assumes, what it evaluates, and which competing accounts or predictions are at play. While some of these questions are answered, clear ordering and argumentative flow is lacking. With that said, I found the Abstract and General Discussion much clearer, and I would recommend reformulating the early part of the manuscript based on the structure of those segments.

      Second, in my reading, it is not clear to what extent the study assumes versus demonstrates the entrainment of internal oscillators. I find the writing somewhat ambiguous on this count: on the one hand, an entrainment approach is assumed a priori to design the experiment ("an entrainment approach is adopted") yet a primary result of the study is that entrainment is how we perceive and produce rhythms ("Overall, the findings support the hypothesis that an oscillatory system with a stable preferred rate underlies perception and production of rhythm..."). While one could design an experiment assuming X and find evidence for X, this requires testing competing accounts with competing hypotheses -- and this was not done.

      We appreciate the reviewer’s concerns and suggestion to clarify the assumptions of the study and how the current findings relate to the predictions of competing accounts. To address these concerns:

      • We added the assumptions of the entrainment models that we adopted in the Introduction section and reformulated the motivation to choose them accordingly.

      • We clarified in the Introduction that the study’s aim was not to test the entrainment models against alternative theories of rhythm perception.

      • We added a paragraph in the General Discussion to further distinguish predictions from the competing accounts. Here we discussed the compatibility of the findings with predictions of both entrainment and timekeeper models.

      • We rephrased reasoning in the Abstract, Introduction, and General Discussion to further clarify the aims of the study, and how the findings support the hypotheses of the current study versus those of the dynamic attending theory.

      (1.2) In my view, more evidence is required to support interpreting the findings as entrainment-based, regardless of whether that is an assumption or a result. Indeed, while the effect of previous trials on the behaviour of the current trial is compatible with entrainment hypotheses, it may well be compatible with competing accounts as well. That would call into question the interpretation of the results as uncovering the properties of oscillating systems and age-related differences in such systems. Thus, I believe more evidence is needed to bolster the entrainment hypothesis.

      For example, a key prediction of the entrainment model, which assumes internal oscillators as the mechanism of action, is that behaviour in the SMT and PTT tasks follows the principles of Arnold's Tongue. Specifically, tapping and listening performance should worsen systematically as a function of the distance between the presented and preferred rate. On a participant-by-participant basis, does performance scale monotonically with the distance between the presented and preferred rate? Some of the analyses hint at this question, such as the effect of 𝚫IOI on accuracy, but a recontextualization, further analyses, or additional visualizations would be helpful to demonstrate evidence of a tongue-like pattern in the behavioural data. Presumably, non-oscillating models do not follow a tongue-like pattern, but again, it would be very instructive to explicitly discuss that.

      We thank the reviewer for the excellent suggestion of assessing 'Arnold's tongue' principles in timing performance. We agree that testing whether timing performance forms a pattern compatible with an Arnold tongue would further support our assumption that the findings related to preferred rate stem from an entrainment-based mechanism. We instead refer to the 'entrainment region' (McAuley et al., 2006), which corresponds to a slice through the Arnold tongue at a fixed stimulus intensity that entrains the internal oscillator. In both representations of oscillator behavior across a range of stimulus rates, performance should increase systematically as the difference between the stimulus rate and the oscillator's preferred rate (i.e., 'detuning') decreases. In response to the reviewer's comment, we ran further analyses to test this key prediction of entrainment models. We assessed performance at stimulus rates that were faster and slower than an individual's preferred rate estimates from Experiment 1. To do so, we ran logistic regression models on aggregated datasets from all participants and sessions, in which normalized IOI, from trials where the stimulus rate was faster than the preferred rate estimate and from those where it was slower, predicted accuracy. Stimulus IOIs were normalized within each direction (faster- versus slower-than-preferred rate) using a z-score transformation, and direction was coded as a categorical predictor in the model. We reasoned that a positive slope for conditions with stimulus rates faster than the preferred rate, and a negative slope for conditions with slower rates, would indicate a systematic accuracy increase toward the preferred rate estimate. This is exactly what we found. The results revealed a significant main effect of IOI and a significant interaction between IOI and direction, indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates.
We added these results to the respective subsections of Experiment 1 Methods and Results, added a plot showing the slices of the regression surfaces to Figure 2B and elaborated on the results in Experiment 1 Discussion. As the number of trials in Experiment 2 was much lower (N = 81), we only ran these additional analyses in Experiment 1.
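
      The direction-by-IOI logistic regression described above can be sketched as follows. This is a minimal illustration on simulated data, not the analysis code behind the manuscript: it assumes that stimulus and preferred rates are both expressed as IOIs in ms, that "slower than preferred" means an IOI longer than the preferred IOI, and it fits the model by plain gradient ascent rather than with a statistics package. The simulated accuracy curve (peaking at a hypothetical preferred IOI of 600 ms) is invented for the example.

```python
import math
import random
import statistics


def zscore(values):
    """Z-score a list of numbers (population SD)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def build_design(iois, preferred_ioi):
    """Design rows [intercept, z_ioi, slower, z_ioi * slower].

    slower = 1 when the stimulus IOI is longer than the preferred IOI, i.e. the
    stimulus rate is slower than the preferred rate. IOIs are z-scored
    separately within each direction.
    """
    slower = [1 if ioi > preferred_ioi else 0 for ioi in iois]
    z = [0.0] * len(iois)
    for direction in (0, 1):
        idx = [i for i, s in enumerate(slower) if s == direction]
        for i, zi in zip(idx, zscore([iois[i] for i in idx])):
            z[i] = zi
    return [[1.0, z[i], float(s), z[i] * s] for i, s in enumerate(slower)]


def fit_logistic(X, y, lr=1.0, n_iter=500):
    """Maximum-likelihood logistic regression by batch gradient ascent."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j, xj in enumerate(row):
                grad[j] += (yi - p) * xj
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


# Simulated trials: accuracy falls off with detuning |IOI - preferred IOI|.
rng = random.Random(7)
PREFERRED_IOI = 600.0  # hypothetical preferred IOI in ms
iois = [rng.uniform(200.0, 1000.0) for _ in range(400)]
correct = []
for ioi in iois:
    logit = 2.0 - 3.0 * abs(ioi - PREFERRED_IOI) / 400.0
    correct.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0)

X = build_design(iois, PREFERRED_IOI)
b0, b_z, b_slower, b_interaction = fit_logistic(X, correct)
slope_faster = b_z                  # IOI slope when the rate is faster than preferred
slope_slower = b_z + b_interaction  # IOI slope when the rate is slower than preferred
print(f"faster-than-preferred slope: {slope_faster:+.2f}")
print(f"slower-than-preferred slope: {slope_slower:+.2f}")
```

      A positive slope for faster-than-preferred trials together with a negative slope for slower-than-preferred trials is the pattern that indicates accuracy peaking near the preferred rate. In practice a standard GLM routine, which also reports standard errors and p-values, would be used instead of the hand-rolled fitter.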

      (1.3) Fourth, harmonic structure in behaviour across tasks is a creative and useful metric for bolstering the entrainment hypothesis specifically because internal oscillators should display a preference across their own harmonics. However, I have some doubts that the analyses as currently implemented indicate such a relationship. Specifically, the main analysis to this end involves summing the residuals of the data closest to y=x, y=2*x and y=x/2 lines and evaluating whether this sum is significantly lower than for shuffled data. Out of these three dimensions, y=x does not comprise a harmonic, and this is an issue because it could by itself drive the difference of summed residuals with the shuffled data. I am uncertain whether rerunning the same analysis with the x=y dimension excluded constitutes a simple resolution because presumably there are baseline differences in the empirical and shuffled data that do not have to do with harmonics that would leak into the analysis. To address this, a simulation with ground truths could be helpful to justify analyses, or a different analysis that evaluates harmonic structure could be thought of.

      We thank the reviewer for pointing out the weakness of the permutation test we developed to assess the harmonic relationship between Experiment 1’s preferred rate estimates. Datapoints that fall on the y=x line indeed do not represent harmonic relationships. Rather, they indicate one-to-one correspondence between the axes, which is a stronger indicator of compatibility between the estimates. Speaking to the reviewer’s point, standard correlation analyses were not significant, whereas they would have been expected to be significant if the permutation results were driven by the y=x relationship. This is why we developed the permutation test, so that datapoints with integer-ratio relationships could also contribute.

      Based on the reviewer’s comment, we ran additional analyses to assess the harmonic relationships between the estimates. The first analysis involved a circular approach. We first normalized each participant’s estimates by dividing the slower estimate by the faster one, and converted the values to radians, since a pair of values with an integer-ratio relationship should correspond to the same phase on a unit circle. Then, we assessed whether the resulting distribution of normalized values differed from a uniform distribution using Rayleigh’s test, which was significant (p = .004). The circular mean of the distribution was 44 (SD = 53) degrees (M = 0.764, SD = 0.932 radians), indicating that the slower estimates were slightly slower than the faster estimate or its multiples. As this distribution was skewed toward positive values due to the normalization procedure, we did not compare it against a zero angle. Instead, we ran a second test using a modular approach. We first calculated how much the slower estimate deviated proportionally from the faster estimate or its multiples (i.e., subharmonics) by normalizing the estimates from both sessions by the faster estimate. The outcome measure was the remainder of the slower estimate after division by the faster estimate (i.e., the slower estimate modulo the faster estimate), divided by the faster estimate. Then, we ran a permutation test, shuffling the linear-order session estimates over 1000 iterations and taking the median percent deviation for each iteration. The test statistic was significant (p = .004), indicating that the harmonic relationships we observed in the estimates were neither due to chance nor dependent on the assessment method. We added the details of these additional analyses of harmonic relationships between the Experiment 1 preferred rate estimates to the Supplementary Information.
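
      Both follow-up tests can be sketched in a few lines. The sketch is illustrative only: the mapping of a ratio to an angle (angle = 2π × slower/faster, so that integer ratios land at 0 rad), the Zar approximation used for the Rayleigh p-value, and the simulated estimates for 20 hypothetical participants are assumptions made for this example rather than details taken from the manuscript. Here session2 stands in for the linear-order session that is shuffled in the permutation test.

```python
import math
import random
import statistics


def ratio_to_angle(est_a, est_b):
    """Map a pair of preferred-IOI estimates to an angle on the unit circle.

    Assumed mapping: angle = 2*pi * (slower / faster), so that integer ratios
    (1, 2, 3, ...) all land at 0 rad.
    """
    fast, slow = min(est_a, est_b), max(est_a, est_b)
    return (2.0 * math.pi * slow / fast) % (2.0 * math.pi)


def rayleigh_test(angles):
    """Rayleigh test for non-uniformity; returns (mean resultant length, p).

    The p-value uses Zar's approximation, as in common circular-statistics toolboxes.
    """
    n = len(angles)
    R = math.hypot(sum(math.cos(a) for a in angles), sum(math.sin(a) for a in angles))
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - R ** 2)) - (1 + 2 * n))
    return R / n, min(p, 1.0)


def harmonic_deviation(est_a, est_b):
    """(slower mod faster) / faster; 0 when the slower estimate is an exact multiple."""
    fast, slow = min(est_a, est_b), max(est_a, est_b)
    return (slow % fast) / fast


def modular_permutation_test(session1, session2, n_perm=1000, seed=0):
    """Compare the observed median deviation with medians after shuffling session2."""
    observed = statistics.median(harmonic_deviation(a, b) for a, b in zip(session1, session2))
    rng = random.Random(seed)
    shuffled = list(session2)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = statistics.median(harmonic_deviation(a, b) for a, b in zip(session1, shuffled))
        if null <= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical estimates (ms): session 2 is close to 1x or 2x session 1.
rng = random.Random(3)
session1 = [rng.uniform(300.0, 900.0) for _ in range(20)]
session2 = [x * (1 + i % 2) * (1 + rng.uniform(0.0, 0.05)) for i, x in enumerate(session1)]

r_bar, p_rayleigh = rayleigh_test([ratio_to_angle(a, b) for a, b in zip(session1, session2)])
median_dev, p_perm = modular_permutation_test(session1, session2)
print(f"Rayleigh: r = {r_bar:.2f}, p = {p_rayleigh:.2g}")
print(f"Modular: median deviation = {median_dev:.3f}, p = {p_perm:.3f}")
```

      Note that the modular statistic, as described, is 0 only when the slower estimate sits exactly at, or just above, a multiple of the faster one; an estimate just below a multiple yields a value close to 1.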

      (2.1) The current study is presented in the framework of the ongoing debate of oscillator vs. timekeeper mechanisms underlying perceptual and motor timing, and the authors claim that the observed results support the former mechanism. In this line, every obtained result is related by the authors to a specific ambiguous (i.e., not clearly related to a biophysical parameter) feature of an internal oscillator. As pointed out by an essay on the topic (Doelling & Assaneo, 2021), claiming that a pattern of results is compatible with an "oscillator" could be misleading, since some features typically used to validate or refute such mechanisms are not well grounded in real biophysical models. Relatedly, a recent study (Doelling et al., 2022) shows that two quantitatively different computational algorithms (i.e., absolute vs relative timing) can be explained by the same biophysical model. This demonstrates that what could be interpreted as a timekeeper or as an oscillator can represent the same biophysical model working under different conditions. For this reason, if the authors would like to argue for a given mechanism underlying their observations, they should include a specific biophysical model and test its predictions against the observed behavior. For example, it's not clear why the authors interpret the observation that a trial's response is modulated by the rate of the previous one as evidence of an oscillator-like mechanism underlying behavior. As shown by Doelling and Assaneo (2021), a simple oscillator returns to its natural frequency as soon as the stimulus disappears, which would not predict the long-lasting effect of the previous trial. Furthermore, a timekeeper-like mechanism with a long enough integration window is compatible with this observation.

      Still, the authors can choose to disregard this suggestion and not test a specific model, but if so, they should restrict this paper to a descriptive study of the timing phenomena.

      We thank the reviewer for their valuable suggestion to include a biophysical model to further demonstrate the compatibility of the current findings with certain predictions of the model. While we acknowledge the potential benefits of implementing a biophysical model to understand the relationships between model parameters and observed behavior, this goes beyond the scope of the current study.

      We note that we have employed a modeling approach in a subsequent study to further explore how the properties and resulting behavior of an oscillator map onto the patterns of human behavior we observed in the current study (Kaya & Henry, 2024, February 5). In that study, we fitted a canonical oscillator model, and several variants thereof, separately to datasets obtained from the random-order and linear-order sessions of Experiment 1 of the current submission. The base model, adapted from McAuley and Jones (2003), assumed sustained oscillations within the trials of the experiment and complete decay towards the preferred rate between trials. We introduced a gradual decay parameter (Author response image 1A) that weighted between the oscillator's concurrent period value at the time of decay and its initial period (i.e., preferred rate). This parameter was implemented within trials (between the standard stimulus sequence and the comparison interval) in Variant 1, between consecutive trials in Variant 2, and at both temporal locations in Variant 3. Model comparisons (Author response image 1B) showed that Variant 3 was the best-fitting model for both random- and linear-order datasets. Crucially, estimates for the within- and between-trial decay parameters obtained from Variant 3 were positively correlated, suggesting that oscillators gradually decayed towards their preferred rate at similar timescales after the cessation of a stimulus.

      Author response image 1.

      (A) Illustration of the model fitted to Experiment 1 datasets and (B) model comparison results. (A) In each trial, the model is initialized with a phase (ɸ) and period (P) value. At the offset of each stimulus interval i, the model updates its phase (pink arrows) and period (blue arrows) depending on the temporal contrast (C) between the model state and the stimulus onset, and on the phase and period correction weights, Wɸ and Wp. W_decay-within updates the model period as a weighted average between the period calculated for the 5th interval, P5, and the model’s preferred rate, P0. C is calculated at the offset of the comparison interval. The W_decay-between parameter initializes the model period at the beginning of a new trial as a weighted average between the last period from the previous trial and P0. The base model’s assumptions are marked by asterisks, namely sustained oscillation during the silence (i = 5) and complete decay between trials. (B) Left: The normalized probability of each model having the minimum BIC value across all models and across participants. Right: AICc, calculated from each model’s fit to participants’ single-session datasets. In both panels, random-order and linear-order sessions are marked in green and blue, respectively. B denotes the base model, and V1, V2 and V3 denote Variants 1, 2 and 3, respectively.
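
      To make the role of the two decay weights concrete, here is a deliberately simplified sketch. It is not the fitted model from Kaya and Henry (2024): phase correction and the temporal-contrast term C are omitted, the period correction is a hypothetical stand-in (the period moves a fraction w_period toward each standard IOI), and the class and parameter names are invented for illustration. Only the decay step follows the caption: a weighted average between the current period and the preferred period P0, applied within trials, between trials, or both.

```python
from dataclasses import dataclass, field


def decay(period, preferred, weight):
    """Weighted average of the current period and the preferred period P0.

    weight = 0 keeps the current period (sustained oscillation);
    weight = 1 returns P0 (complete decay).
    """
    return (1.0 - weight) * period + weight * preferred


@dataclass
class DecayingOscillatorSketch:
    preferred: float        # P0, preferred period (ms)
    w_period: float         # simplified period-correction weight (stand-in for Wp)
    w_decay_within: float   # decay between the standard sequence and the comparison
    w_decay_between: float  # decay between consecutive trials
    period: float = field(init=False)

    def __post_init__(self):
        self.period = self.preferred

    def run_trial(self, standard_iois):
        """Return the period in effect at the comparison interval of one trial."""
        # Between-trial decay: start from a mix of the last period and P0.
        self.period = decay(self.period, self.preferred, self.w_decay_between)
        # Simplified period correction toward each standard IOI (phase omitted).
        for ioi in standard_iois:
            self.period += self.w_period * (ioi - self.period)
        # Within-trial decay during the silence before the comparison interval.
        self.period = decay(self.period, self.preferred, self.w_decay_within)
        return self.period


# Base-model assumptions: sustained within trials, complete decay between trials.
base = DecayingOscillatorSketch(600.0, 0.5, w_decay_within=0.0, w_decay_between=1.0)
# Partial decay at both locations (arbitrary illustrative weights).
both = DecayingOscillatorSketch(600.0, 0.5, w_decay_within=0.2, w_decay_between=0.2)
for model in (base, both):
    print([round(model.run_trial(iois), 1) for iois in ([400.0] * 4, [800.0] * 4)])
```

      Setting w_decay_within = 0 and w_decay_between = 1 reproduces the base model's assumptions; freeing one or both weights corresponds, in spirit, to Variants 1, 2 and 3. With partial between-trial decay, the period at the start of a trial still carries a trace of the previous trial's rate.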

      Although our behavioral results and modeling thereof must necessarily be interpreted as reflecting the mechanics of an attentional, rather than a neural, oscillator, these findings might shed light on the controversy in neuroscience research regarding the timeline of entrainment decay. While multiple studies show that neural oscillations can continue at the entrained rate for a number of cycles following entrainment (Bouwer et al., 2023; Helfrich et al., 2017; Lakatos et al., 2013; van Bree et al., 2021), different modeling approaches yield mixed results on this phenomenon. Whereas Doelling and Assaneo (2021) show that a Stuart-Landau oscillator returns immediately to its preferred rate after synchronizing to an external stimulus, simulations of other oscillator types suggest gradual decay toward the preferred rate (Large, 1994; McAuley, 1995; Obleser et al., 2017) or self-sustained oscillation at the external stimulus rate (Nachstedt et al., 2017).

      While the study by Doelling and Assaneo (2021) provides insights into entrainment and the behavior of the Stuart-Landau oscillator under certain conditions, the internal oscillators hypothesized by the dynamic attending theory might take different forms and therefore may not adhere to the behavior of a specific implementation of an oscillator model. Moreover, the fact that a phase-coupled oscillator does not show gradual decay does not imply that models with period tracking behave the same way. Adaptive frequency oscillators, for instance, are able to sustain the oscillation after the stimulus ceases (Nachstedt et al., 2017). Alongside models that use Hebbian learning (Roman et al., 2023), the main implementations of the dynamic attending theory have parameters for period tracking and decay towards the preferred rate (Large, 1994; McAuley, 1995). In fact, the U-shaped pattern of duration discrimination sensitivity across a range of stimulus rates (Drake & Botte, 1993) is better explained by a decaying than by a non-decaying oscillator (McAuley, 1995). To conclude, the literature suggests that whether an oscillator decays or sustains its period after the stimulus ends, and how quickly it decays, depends on the particular model used and its parameters; there is therefore no one-size-fits-all answer.
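
      The three behaviors contrasted above (immediate return, gradual decay, self-sustained oscillation) can be caricatured with a single hypothetical decay-rate parameter. This toy sketch is not any of the cited models (it has no phase, amplitude, or coupling dynamics); it only illustrates how one parameter choice moves a period-tracking oscillator between the three regimes once the stimulus stops.

```python
def period_after_offset(entrained, preferred, decay_rate, n_cycles):
    """Period on each cycle after stimulus offset, decaying geometrically toward P0.

    decay_rate = 1: immediate return to the preferred period;
    0 < decay_rate < 1: gradual decay;
    decay_rate = 0: self-sustained oscillation at the entrained (stimulus) period.
    """
    periods = []
    p = entrained
    for _ in range(n_cycles):
        p = preferred + (1.0 - decay_rate) * (p - preferred)
        periods.append(p)
    return periods


for label, rate in (("immediate", 1.0), ("gradual", 0.5), ("sustained", 0.0)):
    print(label, period_after_offset(entrained=400.0, preferred=600.0, decay_rate=rate, n_cycles=4))
```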

      Reviewer #2 (Recommendations For The Authors):

      • Are the range, SD and mean of the random-order and linear-order sessions different? If so, why?

      Information regarding the SD and mean of the random-order and linear-order sessions was added to Experiment 1 Methods section.

      “While the mean (M = 599 ms), standard deviation (SD = 231 ms) and range (200, 998 ms) of the presented stimulus IOIs were identical between the sessions, the way IOI changed from trial to trial was different.” (p. 5)

      • Perhaps the title could mention the age-related flexibility effect you demonstrate, which is an important contribution that without inclusion in the title could be missed in literature searches.

      We have changed the title to include age-related changes in oscillator flexibility. Thanks for the great suggestion.

      • Is the statistical analysis in Figure 4A between subjects? Shouldn't the analyses be within subjects?

      In the Figure 4 caption, we now specify more clearly that the statistical analyses of Experiment 2’s preferred rate estimates were within-participant comparisons across tasks.

      "Vertical lines above the box plots represent within-participants pairwise comparisons." (p. 17)

      • It says participants' hearing thresholds were measured using standard puretone audiometry. What threshold warranted participant exclusion and how many participants were excluded on the basis of hearing skills?

      We have now clarified that hearing threshold was not an exclusion criterion.

      "Participants were not excluded based on hearing threshold." (p. 11)

      • "Tapping rates from 'fastest' and 'slowest' FMT trials showed no difference between pre- and postsession measurements, and were additionally correlated across repeated measurements" - could you point to the statistics for this comparison?

      Table 2 includes the results from both experiments’ analyses on unpaced tapping. (p. 10)

      “The results of the pairwise comparisons between tapping rates from all unpaced tapping tasks across measurements are provided in Table 2.” (p. 15)

      • How was the loudness (dB) of the woodblock stimuli determined on a participant-by-participant basis? Please ignore if I missed this.

      Participants were allowed to set the volume to a comfortable level.

      "Participants then set the sound volume to a level that they found comfortable for completing the task." (p. 4)

      • Please spell out IOI, DEV, and other terms in full the first time they are mentioned in the manuscript.

      We now spell out each abbreviation at its first mention.

      "In each experimental session, 400 unique trials of this task were presented, each consisting of a combination of the three main independent variables: the inter-onset interval, IOI; amount of deviation of the comparison interval from the standard, DEV, and the amount of change in stimulus IOI between consecutive trials, 𝚫IOI. We explain each of these variables in detail in the next paragraphs." (p. 4)

      • Small point: In Fig 1 sub-text, random order and linear order are explained in reverse order from how they are presented in the figure.

      We fixed the mismatch between the Figure 1 content and its caption.

      • Small point: I found the elaborate technical explanation of windowing methods, including alternatives that were not used, unnecessary.

      We moved the details of the smoothing analysis to the Supplementary Information.

      • With regard to the smoothing explanation, what is an "element"? Is this a sample? If so, what was the sampling rate?

      We reworded ‘element’ as ‘sample’. In the smoothing analyses, the quantity corresponding to a sampling rate was the size of the convolution window, which was set to 26 samples for random-order and 48 samples for linear-order sessions.
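
      For readers unfamiliar with the smoothing step, the sketch below shows one way a window of 26 (or 48) samples could be used to locate the best-performing IOI. It assumes a boxcar (moving-average) kernel, that samples are single trials sorted by IOI, and that the preferred rate is read off at the peak of the smoothed accuracy curve; the actual kernel and procedure are described in the Supplementary Information and may differ. The trial data are hypothetical.

```python
import statistics


def moving_average(values, window):
    """Boxcar smoothing: 'valid'-mode convolution with a kernel of `window` samples."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    smoothed = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one sample
        smoothed.append(total / window)
    return smoothed


def peak_ioi(iois, correct, window):
    """Mean IOI of the best-performing window of IOI-sorted trials."""
    pairs = sorted(zip(iois, correct))
    sorted_iois = [ioi for ioi, _ in pairs]
    smoothed = moving_average([float(c) for _, c in pairs], window)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)  # first maximum wins
    return statistics.fmean(sorted_iois[best:best + window])


iois = list(range(200, 1000, 2))  # 400 hypothetical trials, one per IOI (ms)
correct = [1 if 550 <= ioi <= 650 else 0 for ioi in iois]
print(peak_ioi(iois, correct, window=26))
```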

      • Spelling/language error: "The pared-down", "close each other", "always small (+4 ms), than".

      We fixed the spelling errors.

      Reviewer #3 (Recommendations For The Authors):

      • My main concern is the one detailed as a weakness in the public review. In that direction, if authors decide to keep the mechanistic interpretation of the outcomes (which I believe is a valuable one) here I suggest a couple of models that they can try to adapt to explain the pattern of results:

      a. Roman, Iran R., et al. "Hebbian learning with elasticity explains how the spontaneous motor tempo affects music performance synchronization." PLOS Computational Biology 19.6 (2023): e1011154.

      b. Bose, Amitabha, Áine Byrne, and John Rinzel. "A neuromechanistic model for rhythmic beat generation." PLoS Computational Biology 15.5 (2019): e1006450.

      c. Egger, Seth W., Nhat M. Le, and Mehrdad Jazayeri. "A neural circuit model for human sensorimotor timing." Nature Communications 11.1 (2020): 3933.

      d. Doelling, K. B., Arnal, L. H., & Assaneo, M. F. (2022). Adaptive oscillators provide a hard-coded Bayesian mechanism for rhythmic inference. bioRxiv, 2022-06

      Thanks for the suggestion! Please refer to our response (2.1.) above. To summarize, although we considered a full, well-fleshed-out modeling approach to be beyond the scope of the current work, we are excited about and actively working on exactly this. Our modeling work is available as a preprint (Kaya & Henry, 2024, February 5).

      • Since the authors were concerned with the preferred rate, they restricted the analysis to extracting the IOI with the best performance. Would it be plausible to explore the functional form of the relationship between accuracy and IOI? This could shed some light on the underlying mechanism.

      Unfortunately, we were unsure what the reviewer meant by the functional form between accuracy and IOI. We interpret it to mean a function that takes IOI as input and outputs an accuracy value. In that case, while we agree that estimating this function might indeed shed light on the underlying mechanisms, this type of analysis is beyond the scope of the current study. Instead, we refer the reviewer and reader to our modeling study (please see our response (2.1.) above), which includes a model that takes the stimulus conditions, including IOI, and model parameters for preferred rate, phase and period correction, and within- and between-trial decay, and outputs the predicted accuracy for each trial. We believe that such a modeling approach, as compared to a simple function, gives more insight into the relationship between oscillator properties and duration perception.

      • Is the effect caused by the dIOI modulated by the distance to the preferred frequency?

      We thank the reviewer for the recommendation. We measured flexibility by the oscillator's ability to adapt to online changes in the temporal context (i.e., the effect of 𝚫IOI on accuracy), rather than by quantifying the range of rates with improved accuracy. Nevertheless, we acknowledge that distance from the preferred rate should decrease accuracy, as this is a key prediction of entrainment models. In fact, testing this prediction was also recommended by the other reviewer, in response to which we ran additional analyses. These analyses assessed the relationship between accuracy and detuning. Specifically, we assessed accuracy at stimulus rates that were faster and slower than an individual's preferred rate estimates from Experiment 1. We ran logistic regression models on aggregated datasets from all participants and sessions, in which accuracy was predicted by z-scored IOI from trials where the stimulus rate was faster than the preferred rate estimate and from those where it was slower. The model had a significant main effect of IOI and an interaction between IOI and direction (i.e., whether the stimulus rate was faster or slower than the preferred rate estimate), indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates. We added information regarding this analysis to the respective subsections of Experiment 1 Methods and Results, added a plot showing the slices of the regression surfaces to Figure 2B, and elaborated on the results in Experiment 1 Discussion. As the number of trials in Experiment 2 was insufficient, we only ran these additional analyses in Experiment 1. We agree that a range-based measure of oscillator flexibility would also index the oscillators’ adaptive abilities. However, the current paradigms were designed to assess temporal adaptation.
Thus, comparison of the two approaches to measuring oscillator flexibility, which can be addressed in future studies, is beyond the scope of the current study.

      • Did the authors explore if the "motor component" (the difference between the motor and perceptual rates) is modulated by the participants age?

      In response to the reviewer’s comment, we correlated the difference between the motor and perceptual rates with age, which was nonsignificant.

      • Please describe better the slider and the keypress tasks. For example, what are the instructions given to the participant on each task, and how they differ from each other?

      We added the Experiment 2 instructions in Appendix A.

      • Typos: The caption in figure one reads 2 ms, while I believe it should say 200. Page 4 mentions that there are 400 trials and page 5 says 407.

      We fixed the typos.

      References

      Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cogn Psychol, 41(3), 254-311. https://doi.org/10.1006/cogp.2000.0738

      Bouwer, F. L., Fahrenfort, J. J., Millard, S. K., Kloosterman, N. A., & Slagter, H. A. (2023). A Silent Disco: Differential Effects of Beat-based and Pattern-based Temporal Expectations on Persistent Entrainment of Low-frequency Neural Oscillations. J Cogn Neurosci, 35(6), 990-1020. https://doi.org/10.1162/jocn_a_01985

      Doelling, K. B., Arnal, L. H., & Assaneo, M. F. (2022). Adaptive oscillators provide a hard-coded Bayesian mechanism for rhythmic inference. bioRxiv. https://doi.org/10.1101/2022.06.18.496664

      Doelling, K. B., & Assaneo, M. F. (2021). Neural oscillations are a start toward understanding brain activity rather than the end. PLoS Biol, 19(5), e3001234. https://doi.org/10.1371/journal.pbio.3001234

      Drake, C., & Botte, M. C. (1993). Tempo sensitivity in auditory sequences: evidence for a multiple-look model. Percept Psychophys, 54(3), 277-286. https://doi.org/10.3758/bf03205262

      Haegens, S., & Zion Golumbic, E. (2018). Rhythmic facilitation of sensory processing: A critical review. Neurosci Biobehav Rev, 86, 150-165. https://doi.org/10.1016/j.neubiorev.2017.12.002

      Helfrich, R. F., Huang, M., Wilson, G., & Knight, R. T. (2017). Prefrontal cortex modulates posterior alpha oscillations during top-down guided visual perception. Proc Natl Acad Sci U S A, 114(35), 9457-9462. https://doi.org/10.1073/pnas.1705965114

      Henry, M. J., & Herrmann, B. (2014). Low-Frequency Neural Oscillations Support Dynamic Attending in Temporal Context. Timing & Time Perception, 2(1), 62-86. https://doi.org/10.1163/22134468-00002011

      Kaya, E., & Henry, M. J. (2024, February 5). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. https://doi.org/10.31234/osf.io/q9uvr

      Lakatos, P., Musacchia, G., O'Connell, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The spectrotemporal filter mechanism of auditory selective attention. Neuron, 77(4), 750-761. https://doi.org/10.1016/j.neuron.2012.11.034

      Large, E. W. (1994). Dynamic representation of musical structure. The Ohio State University.

      Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119-159. https://doi.org/10.1037/0033-295X.106.1.119

      McAuley, J. D. (1995). Perception of time as phase: Toward an adaptive-oscillator model of rhythmic pattern processing [Doctoral dissertation, Indiana University Bloomington].

      McAuley, J. D. (2010). Tempo and Rhythm. In Music Perception (pp. 165-199). https://doi.org/10.1007/978-1-4419-6114-3_6

      McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: a comparison of interval and entrainment approaches to short-interval timing. J Exp Psychol Hum Percept Perform, 29(6), 1102-1125. https://doi.org/10.1037/0096-1523.29.6.1102

      McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: life span development of timing and event tracking. J Exp Psychol Gen, 135(3), 348-367. https://doi.org/10.1037/0096-3445.135.3.348

      Nachstedt, T., Tetzlaff, C., & Manoonpong, P. (2017). Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control. Front Neurorobot, 11, 14. https://doi.org/10.3389/fnbot.2017.00014

      Obleser, J., Henry, M. J., & Lakatos, P. (2017). What do we talk about when we talk about rhythm? PLoS Biol, 15(9), e2002794. https://doi.org/10.1371/journal.pbio.2002794

      Roman, I. R., Roman, A. S., Kim, J. C., & Large, E. W. (2023). Hebbian learning with elasticity explains how the spontaneous motor tempo affects music performance synchronization. PLoS Comput Biol, 19(6), e1011154. https://doi.org/10.1371/journal.pcbi.1011154

      van Bree, S., Sohoglu, E., Davis, M. H., & Zoefel, B. (2021). Sustained neural rhythms reveal endogenous oscillations supporting speech perception. PLoS Biol, 19(2), e3001142. https://doi.org/10.1371/journal.pbio.3001142

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor suggestions:

      Abstract: I really liked the conclusion (that IM and VWM are two temporal extremes of the same process) as articulated in lines 557--563. (It is always satisfying when the distinction between two things that seem fundamentally different vanishes). If something like this but shorter could be included in the Abstract, it would highlight the novel aspects of the results a little more, I think.

      Thank you for this comment. We have added the following to the abstract:

      “A key conclusion is that differences in capacity classically thought to distinguish IM and VWM are in fact contingent upon a single resource-limited WM store.”

      L 216: There's an orphan parenthesis in "(justifying the use".

      Fixed.

      L 273: "One surprising result was the observed set size effect in the 0 ms delay condition". In this paragraph, it might be a good idea to remind the reader of the difference between the simultaneous and zero-delay conditions. If I got it right, the results differ between these conditions because it takes some amount of processing time to interpret the cue and free the resources associated with the irrelevant stimuli. Recalling that fact would make this paragraph easier to digest.

      That is correct. However, at this point in the text, we have not yet fitted the DyNR model to the data. Therefore, we believe that introducing cue processing and resource reallocation as concepts that differentiate between those two conditions would disrupt the flow of this paragraph. We address these points soon after, in a paragraph starting on line 341.

      Figures 3, 5: The labels at the bottom of each column in A would be more clear if placed at the top of each column instead. That way, the x-axis for the plots in A could be labeled appropriately, as "Error in orientation estimate" or something to that effect.

      We edited both figures, now Figure 4 and Figure 6, as suggested.

      L 379: It should be "(see Eq 6)", I believe.

      That is correct, line 379 (currently line 391) should read ‘Eq 6’. Fixed.

      L 379--385: I was a bit mystified as to why the scaled diffusion rate produced a worse fit than a constant rate. I imagine the scaled version was set to something like

      sigma^2_diff_scaled = sigma^2_base + K*(N-1)

      where N is the set size and sigma^2_base and K are parameters. If this model produced a similar fit as with a constant diffusion rate, the AIC would penalize it because of the extra parameter. But why would the fit be worse (i.e., not match the pattern of variability)? Shouldn't the fitter just find that the K=0 solution is the best? Not a big deal; the Nelder-Mead solutions can wobble when that many parameters are involved, but if there's a simple explanation it might be worth commenting on.

      The scaled diffusion was implemented by extending Eq 6 in the following way:

      $$\sigma^2(t) = (t - t_{\mathrm{offset}})\,\dot{\sigma}^2_{\mathrm{diff}}\,N$$

      where N is set size. Therefore, the scaling was not associated with a free parameter that could become 0 if set size did not affect diffusion rate; rather, variability necessarily increased with set size. We now clarify this in the text:

      “The second variant was identical to the proposed model, except that we replaced the constant diffusion rate with a set size scaled diffusion rate by multiplying the right side of Eq 6 by N.”
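
      The difference between the variant we tested and the form the reviewer had in mind can be seen by writing both out. In this sketch the function names and numeric values are hypothetical. The reviewer's form adds a free slope k that can shrink to 0 and recover Eq 6, whereas the tested variant multiplies the diffusion term by N and therefore forces variability to grow with set size, which is why it can fit worse rather than merely equally well.

```python
def variance_constant(t, t_offset, diff_rate):
    """Eq 6: variance grows linearly with time since stimulus offset."""
    return (t - t_offset) * diff_rate


def variance_scaled(t, t_offset, diff_rate, set_size):
    """Tested variant: Eq 6 multiplied by set size N (no extra free parameter)."""
    return (t - t_offset) * diff_rate * set_size


def variance_free_slope(t, t_offset, diff_rate, k, set_size):
    """Hypothetical alternative: base rate plus a free slope k on (N - 1).

    With k = 0 this reduces exactly to Eq 6.
    """
    return (t - t_offset) * (diff_rate + k * (set_size - 1))


for n in (1, 2, 4):
    print(n,
          variance_constant(1.0, 0.0, 0.5),
          variance_scaled(1.0, 0.0, 0.5, n),
          variance_free_slope(1.0, 0.0, 0.5, 0.0, n))
```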

      Figure 4 is not mentioned in the main text. Maybe the end of L 398 would be a good place to point to it. The paragraph at L 443-455 would also benefit from a couple of references to it.

      Thank you for this suggestion. Figure 4 (now Figure 5) was previously mentioned on line 449 (previously line 437), but now we have included it on line 410 (previously line 398), within the paragraph spanning lines 455-467 (previously 443-455), and also on line 136 where we first discuss masking effects.

      L 500: Figure S7 is mentioned before Figures S5 and S6. Quite trivial, I know....

      Thank you for this comment. There was no specific reason for Figure S7 to appear after S5 & S6, so we simply swapped their order to be consistent with how they are referred to in the manuscript (i.e., S7 became S5, S5 became S6, and S6 became S7).

      Reviewer #2 (Recommendations For The Authors):

      (1) One potential weakness is that the model assumes sensory information is veridical. However, this isn't likely the case. Acknowledging noise in sensory representations could affect the model interpretation in a couple of different ways. First, neurophysiological recordings have shown normalization affects sensory representations, even when a stimulus is still present on the screen. The DyNR model partially addresses this concern because reports are drawn from working memory, which is normalized. However, if sensory representations were also normalized, then it may improve the model variant where subjects draw directly from sensory representations (an alternative model that is currently described but discarded).

      Thank you for this suggestion. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which showed no significant difference between set sizes and are more consistent with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
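      The difference between the two scenarios can be sketched in a few lines, assuming the simplest form of divisive normalization in which a fixed total gain is split equally among the items that compete for it. The functions `per_item_gain` and `cued_sensory_gain` are hypothetical illustrations, not part of the DyNR implementation, which normalizes population activity rather than a single scalar gain.

```python
def per_item_gain(n_shared, total_gain=1.0):
    """Simplest divisive normalization: a fixed total gain is divided
    equally among the n_shared items competing for it."""
    return total_gain / n_shared


def cued_sensory_gain(n_displayed, stage):
    """Sensory gain of the cued item in the simultaneous-cue condition.

    'pre'  : normalization acts on every displayed item before selection,
             so the cued item's gain falls as set size grows.
    'post' : normalization acts only on attended items; with a simultaneous
             cue a single item is attended regardless of set size.
    """
    if stage == "pre":
        return per_item_gain(n_displayed)
    if stage == "post":
        return per_item_gain(1)
    raise ValueError(f"unknown stage: {stage!r}")


for n in (1, 4, 10):
    print(n, cued_sensory_gain(n, "pre"), cued_sensory_gain(n, "post"))
```

      Only the pre-attentive version predicts a set-size effect in the simultaneous condition, which is the prediction the Experiment 1 data fail to support; the post-attentive version is indistinguishable from the current model in that condition.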

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      Second, visual adaptation predicts sensory information should decrease over time. This would predict that for long stimulus presentation times, the error would increase. Indeed, this seems to be reflected in Figure 5B. This effect is not captured by the DyNR model.

      Indeed, neural responses in the visual cortex have been observed to quickly adapt during stimulus presentation, showing reduced responses to prolonged stimuli after an initial transient (Groen et al., 2022; Sawamura et al., 2006; Zhou et al., 2019). This adaptation typically manifests as 1) reduced activity towards the end of stimulus presentation and 2) a faster decay towards baseline activity after stimulus offset.

      In the DyNR model, we use an idealized solution in which we convolve the presented visual signal with a response function (i.e., temporal filter). At the longest presentation durations, the sensory signal in DyNR plateaus and remains stable until stimulus offset. Because our psychophysical data do not allow us to identify the exact neural coding scheme underlying the sensory signal, we favour this simple implementation, which is broadly consistent with some previous attempts to model temporal dynamics in sensory responses (e.g., Carandini and Heeger, 1994). However, we agree with the reviewer that some adaptation of the sensory signal with prolonged presentation would also be consistent with our data.
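      The idealized sensory response described above can be sketched as a boxcar stimulus convolved with a causal temporal filter. The sketch assumes a first-order exponential kernel with time constant `tau_ms`; the actual DyNR response function and its fitted parameters are not reproduced here, and `sensory_signal` is a hypothetical name. It shows the two properties relied on in the text: the signal plateaus during long presentations and decays gradually after offset.

```python
import numpy as np


def sensory_signal(duration_ms, tau_ms=50.0, total_ms=1000.0, dt_ms=1.0):
    """Boxcar stimulus convolved with a causal exponential kernel (a sketch).

    The kernel has unit area, so a stimulus held on long enough plateaus at 1,
    and after offset the signal decays with time constant tau_ms.
    """
    t = np.arange(0.0, total_ms, dt_ms)
    stimulus = (t < duration_ms).astype(float)   # 1 while the stimulus is on
    kernel = np.exp(-t / tau_ms)
    kernel /= kernel.sum()                       # unit area -> plateau at 1
    # 'full' convolution truncated to the first len(t) samples = causal filter
    signal = np.convolve(stimulus, kernel)[: t.size]
    return t, signal


t, long_presentation = sensory_signal(600.0)
_, short_presentation = sensory_signal(50.0)
print(long_presentation.max(), short_presentation.max())
```

      With this filter a brief presentation never reaches the plateau, while a long one saturates; an adaptation mechanism would instead make the response decline during the plateau phase.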

      We have added the following to the manuscript:

      “In Experiment 2, the longest presentation duration shows an upward trend in error at set sizes 4 and 10. While this falls within the range of measurement error, it is also possible that this is a meaningful pattern arising from visual adaptation of the sensory signal, whereby neural populations reduce their activity after prolonged stimulation. This would mean less residual sensory signal would be available after the cue to supplement VWM activity, predicting a decline in fidelity at higher set sizes. Visual adaptation has previously been successfully accounted for by a type of delayed normalization model in which the sensory signal undergoes a series of linear and nonlinear transformations (Zhou et al., 2019). Such a model could in future be incorporated into DyNR and validated against psychophysical and neural data.”

      Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336. https://doi.org/10.1126/science.8191289

      Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., Doyle, W., Dugan, P., Friedman, D., Ramsey, N. F., Petridou, N., & Winawer, J. (2022). Temporal Dynamics of Neural Responses in Human Visual Cortex. The Journal of Neuroscience, 42(40), 7562–7580. https://doi.org/10.1523/JNEUROSCI.1812-21.2022

      Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of Neuronal Adaptation Does Not Match Response Selectivity: A Single-Cell Study of the fMRI Adaptation Paradigm. Neuron, 49(2), 307–318. https://doi.org/10.1016/j.neuron.2005.11.028

      Zhou, J., Benson, N. C., Kay, K., & Winawer, J. (2019). Predicting neuronal dynamics with a delayed gain control model. PLOS Computational Biology, 15(11), e1007484. https://doi.org/10.1371/journal.pcbi.1007484
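      The adaptation mechanism mentioned in the added manuscript text can be illustrated with a delayed-normalization sketch loosely following the form described by Zhou et al. (2019): a linearly filtered response is raised to a power and divided by a slower-filtered copy of itself. Here both filters are assumed to be exponential, and `tau1`, `tau2`, `n` and `sigma` are placeholder values, not fitted parameters; `delayed_normalization` is a hypothetical name.

```python
import numpy as np


def exp_kernel(tau_ms, n_samples, dt_ms=1.0):
    """Unit-area causal exponential kernel."""
    t = np.arange(n_samples) * dt_ms
    k = np.exp(-t / tau_ms)
    return k / k.sum()


def delayed_normalization(stimulus, tau1=20.0, tau2=100.0, n=2.0, sigma=0.1):
    """Response = L^n / (sigma^n + (L * h2)^n), with L = stimulus * h1.

    The denominator is a slower-filtered copy of the linear response, so it
    lags behind the numerator: the response overshoots after onset and then
    adapts towards a lower sustained level.
    """
    size = stimulus.size
    linear = np.convolve(stimulus, exp_kernel(tau1, size))[:size]
    pooled = np.convolve(linear, exp_kernel(tau2, size))[:size]
    return linear**n / (sigma**n + pooled**n)


t = np.arange(0.0, 1500.0)
stim = ((t >= 100) & (t < 1100)).astype(float)   # 1000 ms presentation
resp = delayed_normalization(stim)
print(resp.max(), resp[1050])   # onset transient vs adapted sustained level
```

      Replacing the single exponential filter of the idealized model with a stage like this would reduce the sensory signal late in long presentations, which is the kind of effect the added text suggests could explain the upward trend in error.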

      (2) A second potential weakness is that, in Experiment 1, the authors briefly change the sensory stimulus at the end of the delay (a 'phase shift', Fig. 6A). I believe this is intended to act as a mask. However, I would expect that, in the DyNR model, this should be modeled as a new sensory input (in Experiment 2, 50 ms is plenty of time for the subjects to process the stimuli). One might expect this change to disrupt sensory and memory representations in a very characteristic manner. This seems to make a strong testable hypothesis. Did the authors find evidence for interference from the phase shift?

      The phase shift was implemented with the intention of reducing retinal after-effects, essentially acting as a mask for retinal information only; crucially the orientation of the stimulus is unchanged by the phase shift, so from the perspective of the DyNR model, it transmits the same orientation information to working memory as the original stimulus.

      If our objective were to model sensory input at the level of individual neurons and their receptive fields, we would indeed need to treat this phase shift as a novel input. Nevertheless, for DyNR, conceived as an idealization of a biological system for encoding orientation information, we can safely assume that visual areas in biological organisms have a sufficient number of phase-sensitive simple cells and phase-indifferent complex cells to maintain the continuity of input to VWM.

      When comparing conditions with and without the phase shift of stimuli (Fig S1B), we found performance to be comparable in the perceptual condition (simultaneous presentation) and at the longest delay (1 second), suggesting that the phase shift did not change the visibility or encoding of information into VWM. In contrast, we found strong evidence that observers had access to an additional source of information at intermediate delays when the phase shift was not used. This was evident as enhanced recall performance at delays from 0 to 400 ms. Based on this, we concluded that the additional source of information available in the absence of a phase shift was accessible immediately following stimulus offset and had a brief duration, aligning with the theoretical concept of retinal afterimages.

      (3) It seems odd that the mask does not interrupt sensory processing in Experiment 2. Isn't this the intended purpose of the mask? Should readers interpret this as all masks not being effective in disrupting sensory processing/iconic memory? Or is this specific to the mask used in the experiment?

      Visual masks are often described as instantly and completely halting the visual processing of information that preceded the mask. We also anticipated that the mask would entirely terminate sensory processing, but our data indicate that its effect was not complete (as shown by the comparison of model variants fitted to Experiment 2). Nevertheless, we believe we achieved our intended goal with this experiment: we observed a clear modulation of response errors with changing stimulus duration, indicating that the post-stimulus information that survived masking did not compromise the manipulation of stimulus duration. Moreover, the DyNR model successfully accounted for the portion of signal that survived the mask.

      We can identify two possible reasons why masking was incomplete. First, the continuous report measure used in our experiments may be more sensitive than the discrete measures (e.g., forced-choice methods) commonly employed in experiments that found masks to be 100% effective. Second, although we used a flickering white noise mask at full contrast, it may not have been the most effective mask; for instance, a mask consisting of many randomly oriented Gabor patches matched in spatial frequency to the stimuli could prove more effective. We decided against such a mask because we were concerned that it could act as a new input to orientation-sensitive neurons, rather than just wiping out any residual sensory activity.

      (4) I apologize if I missed it, but the authors did not compare the DyNR model to a model without decaying sensory information for Experiment 1.

      We tested two DyNR variants in which the diffusion process was solely responsible for memory fidelity dynamics. These models assumed that the sensory signal terminates abruptly at stimulus offset, and that the VWM signal encoding the stimuli was equal to the limit imposed by normalization, independent of the delay duration.

      As variants of this model failed to account for the observed response errors both quantitatively (see 'Fixed neural signal' under Model variants) and qualitatively (Figure S3), we decided not to test any more restrictive variants, such as the one without sensory decay and diffusion.

      (5) In the current model, selection is considered to be absolute (all or none). However, this need not be the case (previous work argues for graded selection). Could a model where memories are only partially selected, in a manner that is mediated by load, explain the load effects seen in behavior?

      Thank you for this point. If attentional selection were partial, it would affect the observers’ efficiency in discarding uncued objects to release allocated resources and encode additional information about the cued item. We and others have previously examined whether humans can efficiently update their VWM when previous items become obsolete. For example, Taylor et al. (2023) showed that observers could efficiently remove uncued items from VWM and reallocate the released resources to new visual information. These findings align with results from other studies (e.g., Ecker, Oberauer, & Lewandowsky, 2014; Kessler & Meiran, 2006; Williams & Woodman, 2012).

      Based on these findings, we feel justified in assuming that observers in our current task were capable of fully removing all uncued objects, allowing them to continue the encoding process for the cued orientation that was already partially stored in VWM, such that the attainable limit on representational precision for the cued item equals the maximum precision of VWM.

      Partial removal could in principle be modelled in the DyNR model by introducing an additional plateau parameter specifying a maximum attainable precision after the cue. Our concern would be that such a plateau parameter would trade off with the parameter associated with Hick’s law (i.e., cue interpretation time). The former would control the amount of information that can be encoded into VWM, while the latter regulates the amount of sensory information available for encoding. We are wary of adding additional parameters, and hence flexibility, to the model where we do not have the data to sufficiently constrain them.
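      The trade-off described above can be sketched with a toy calculation. It assumes one common form of Hick’s law (cue interpretation time growing with log2(N + 1)), exponential decay of the sensory signal from stimulus offset, and a post-cue VWM signal equal to the item’s stored share plus the transferred sensory signal, capped at a plateau. All function names and parameter values are hypothetical; this is not the DyNR parameterization.

```python
import numpy as np


def cue_interpretation_time(n_items, a_ms=100.0, b_ms=50.0):
    """Hick's law (one common form): time grows with log2(N + 1)."""
    return a_ms + b_ms * np.log2(n_items + 1)


def residual_sensory(delay_ms, n_items, tau_ms=100.0, **hick):
    """Sensory signal left when encoding of the cued item can begin,
    assuming exponential decay from stimulus offset."""
    t = delay_ms + cue_interpretation_time(n_items, **hick)
    return np.exp(-t / tau_ms)


def cued_item_signal(delay_ms, n_items, stored, plateau=1.0, gain=1.0, **kw):
    """Hypothetical post-cue VWM signal for the cued item: its stored share
    plus the transferred sensory signal, capped at `plateau`
    (plateau=1.0 stands in for full removal of uncued items)."""
    transferred = gain * residual_sensory(delay_ms, n_items, **kw)
    return np.minimum(plateau, stored + transferred)


base = cued_item_signal(0.0, 4, stored=0.25)
partial_removal = cued_item_signal(0.0, 4, stored=0.25, plateau=0.3)
slower_cue = cued_item_signal(0.0, 4, stored=0.25, b_ms=100.0)
print(f"{base:.3f} {partial_removal:.3f} {slower_cue:.3f}")
```

      Lowering the plateau and lengthening cue interpretation both reduce the predicted signal for the cued item, so with only a limited number of trials per condition the two parameters would be difficult to tell apart.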

      Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006

      Kessler, Y., & Meiran, N. (2006). All updateable objects in working memory are updated whenever any of them are modified: Evidence from the memory updating paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 570–585. https://doi.org/10.1037/0278-7393.32.3.570

      Taylor, R., Tomić, I., Aagten-Murphy, D., & Bays, P. M. (2023). Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics, 85(5), 1437–1451. https://doi.org/10.3758/s13414-022-02584-2

      Williams, M., & Woodman, G. F. (2012). Directed forgetting and directed remembering in visual working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(5), 1206–1220. https://doi.org/10.1037/a0027389

      (6) Previous work, both from the authors and others, has shown that memories are biased as if they are acted on by attractive/repulsive forces. For example, the memory of an oriented bar is biased away from horizontal and vertical and biased towards diagonals. This is not accounted for in the current model. In particular, this could be one mechanism to generate a non-uniform drift rate over time. As noted in the paper, a non-uniform drift rate could capture many of the behavioral effects reported.

      The reviewer is correct that the model does not currently include stimulus-specific effects, although our work on that topic provides a clear template for incorporating them in future (e.g. Taylor & Bays, 2018). Specifically on the question of generating a non-uniform drift, we have another project that currently looks at this exact question (cited in our manuscript as Tomic, Girones, Lengyel, and Bays; in prep.). By examining various datasets with varying memory delays, including the Additional Dataset 1 reported in the Supplementary Information, we found that stimulus-specific effects on orientation recall remain constant with retention time. Specifically, although there is a clear increase in overall error over time, estimation biases remain constant in direction and amplitude, indicating that the bias does not manifest in drift rates (see also Rademaker et al., 2018; Figure S1).

      Taylor, R., & Bays, P. M. (2018). Efficient coding in visual working memory accounts for stimulus-specific variations in recall. The Journal of Neuroscience, 1018–18. https://doi.org/10.1523/JNEUROSCI.1018-18.2018

      Rademaker, R. L., Park, Y. E., Sack, A. T., & Tong, F. (2018). Evidence of gradual loss of precision for simple features and complex objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000491

      (7) Finally, the authors use AIC to compare many different model variants to the DyNR model. The delta-AICs are high (>10), indicating a strong preference for the DyNR model over the variants. However, the overall quality of fit to the data is not clear. What proportion of the variance in data was the model able to explain? In particular, I think it would be helpful for the reader if the authors reported the variance explained on withheld data (trials, conditions, or subjects).

      Thank you for this comment.

      Below we report the estimates of r², representing the goodness of fit between observed data (i.e., RMSE) and the DyNR model predictions.

      In Experiment 1, the r² values between observations and predictions were computed across delays for each set size, yielding the following estimates: r² = 0.60 (set size 1), 0.87 (set size 4), and 0.95 (set size 10). Note that the lower explained variance for set size 1 arises because both data and model predictions show near-constant precision across delays.

      In Experiment 2, we calculated r² between observations and predictions across presentation durations, separately for each set size, resulting in the following estimates: r² = 0.88 (set size 1), 0.71 (set size 4), and 0.70 (set size 10). Note that in this case the decreasing proportion of explained variance with set size is a consequence of there being less variability in both data and model predictions at larger set sizes.
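      For readers who want to reproduce this kind of summary, the sketch below computes the RMSE of orientation errors (wrapped onto the 180° orientation circle) and r² across conditions. It assumes r² is the coefficient of determination, 1 − SS_res/SS_tot; the response does not state whether this or a squared Pearson correlation was used, and the two generally differ. Function names are hypothetical.

```python
import numpy as np


def wrap_orientation_error(response_deg, target_deg):
    """Signed error on the 180-degree orientation circle, in [-90, 90)."""
    diff = np.asarray(response_deg) - np.asarray(target_deg)
    return (diff + 90.0) % 180.0 - 90.0


def rmse(errors_deg):
    """Root-mean-square error of a set of (already wrapped) errors."""
    return float(np.sqrt(np.mean(np.square(errors_deg))))


def r_squared(observed, predicted):
    """Coefficient of determination across conditions: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

      Because SS_tot sits in the denominator, a condition in which the observed RMSE barely changes (as for set size 1 in Experiment 1) yields a low r² even when the absolute misfit is small, which is the pattern noted above.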

      While these estimates suggest that the DyNR model effectively fits the psychophysical data, a more rigorous validation approach would involve cross-validation checks across all conditions with a withheld portion of trials. Regrettably, due to the large number of conditions in each experiment, we could only collect 50 trials per condition. We are sceptical that fitting the model to even fewer trials, as necessary for cross-validation, would provide a reliable assessment of model performance.

      Minor: It isn't clear to me why the behavioral tasks are shown in Figure 6. They are important for understanding the results and are discussed earlier in the manuscript (before Figure 3). This just required flipping back and forth to understand the task before I could interpret the results.

      Thank you for this comment. We have now moved the behavioural task figure to appear early in the manuscript (as Figure 3).

      Reviewer #3 (Recommendations For The Authors):

      (1) Dynamics of sensory signals during perception

      I believe that the modeled sensory signal is a reasonable simplification and different ways to model the decay function are discussed. I would like to ask the authors to discuss the implications of slightly more complex initial sensory transients such as the ones shown in Teeuwen (2021). Specifically for short exposure times, this might be particularly relevant for the model fits as some of the alternative models diverge from the data for short exposures. In addition, the role of feedforward (initial transient?) and feedback signaling (subsequent "plateau" activity) could be discussed. The first one might relate more strongly to sensory signals whereas the latter relates more to top-down attention/recurrent processing/VWM.

      Particularly, this latter response might also be sensitive to the number of items present on the screen which leads to a related question pertaining to the limitations of attention during perception. Some work suggests that perception is similarly limited in the amount of information that can be represented concurrently (Tsubomi, 2013). Could the authors discuss the implications of this hypothesis? What happens if maximum sensory amplitude is set as a free parameter in the model?

      Tsubomi, H., Fukuda, K., Watanabe, K., & Vogel, E. K. (2013). Neural limits to representing objects still within view. Journal of Neuroscience, 33(19), 8257-8263.

      Thank you for this question. Below, we unpack it and answer it point by point.

      While we believe our model of the sensory response is justified as an idealization of the biological reality, we also recognise that recent electrophysiological recordings have illuminated intricacies of neuronal responses within the striate cortex, a critical neural region associated with sensory memory (Teeuwen et al., 2021). Notably, these recordings reveal a more nuanced pattern in which neurons exhibit an initial burst of activity followed by a lower plateau in firing rate; stimulus offset elicits a second small burst in the response of some neurons, followed by a gradual decrease in activity after the stimulus disappears (Teeuwen et al., 2021).

      In general, asynchronous bursts of activity in individual neurons will tend to average out in the population, making little difference to predictions of the DyNR model. Synchronized bursts at stimulus onset could affect predictions for the shortest presentations in Experiment 2; however, the model appears to capture the data very well without including them. We would be wary of incorporating these phenomena into the model without more clarity on their universality (e.g., how stimulus-dependent they are), their significance at the population level (as opposed to individual neurons), and most importantly, their prominence in visual areas outside striate cortex. Specifically, while Teeuwen et al. (2021) described activity in V1, our model does not make strong assumptions about which visual areas are the source of the sensory input to WM. Based on these uncertainties we believe the idealized sensory response is justified for use in our model.
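      The averaging argument can be illustrated with a toy simulation, not derived from Teeuwen et al.’s data: each model neuron has a sustained rate of 1 plus a brief Gaussian onset burst, and the bursts are either aligned or spread over a range of latencies. The function `population_response` and all parameter values are hypothetical.

```python
import numpy as np


def population_response(n_neurons=200, sync=True, burst_ms=20.0,
                        latency_spread_ms=240.0, total_ms=400, seed=0):
    """Mean rate of neurons that each show a sustained rate of 1 plus a
    Gaussian burst of height 2 centred at that neuron's own latency.

    sync=True: every burst is centred at 50 ms.
    sync=False: latencies are drawn uniformly from [50, 50 + latency_spread_ms).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(total_ms, dtype=float)
    if sync:
        latencies = np.full(n_neurons, 50.0)
    else:
        latencies = 50.0 + rng.uniform(0.0, latency_spread_ms, n_neurons)
    # one row per neuron, one column per millisecond
    bursts = 2.0 * np.exp(-0.5 * ((t[None, :] - latencies[:, None]) / burst_ms) ** 2)
    rates = 1.0 + bursts
    return t, rates.mean(axis=0)


_, synchronous = population_response(sync=True)
_, asynchronous = population_response(sync=False)
print(synchronous.max(), asynchronous.max())
```

      In the synchronous case the population mean inherits the full burst; with jittered latencies the same single-neuron bursts produce only a shallow, broad bump on top of the sustained rate, closer to the idealized response used in DyNR.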

      Next, thank you for the comment on feedforward and feedback signals. We have added the following to our manuscript:

      “Following onset of a stimulus, the visual signal ascends through visual areas via a cascade of feedforward connections. This feedforward sweep conveys sensory information that persists during stimulus presentation and briefly after it disappears (Lamme et al., 1998). Simultaneously, reciprocal feedback connections carry higher-order information back towards antecedent cortical areas (Lamme and Roelfsema, 2000). In our psychophysical task, feedback connections likely play a critical role in orienting attention towards the cued item, facilitating the extraction of persisting sensory signals, and potentially signalling continuous information on the available resources for VWM encoding. While our computational study does not address the nature of these feedforward and feedback signals, a challenge for future research is to describe the relative contributions of these signals in mediating transmission of information between sensory and working memory (Semedo et al., 2022).”

      Lamme, V. A., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8(4), 529–535. https://doi.org/10.1016/S0959-4388(98)80042-1

      Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X

      Semedo, J. D., Jasper, A. I., Zandvakili, A., Krishna, A., Aschner, A., Machens, C. K., Kohn, A., & Yu, B. M. (2022). Feedforward and feedback interactions between visual cortical areas use different population activity patterns. Nature Communications, 13(1), 1099. https://doi.org/10.1038/s41467-022-28552-w

      Finally, both you and Reviewer 2 raised a similar interesting question regarding capacity limitations of attention during perception. Such a limitation could be modelled by freely estimating sensory amplitude and implementing divisive normalization to that signal, similar to how VWM is constrained. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which showed no significant difference between set sizes and are more consistent with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      (2) Effectivity of retro-cues at long delays

      Can the authors discuss how cues presented at long delays (>1000 ms) can still lead to increased memory fidelity when sensory signals are likely to have decayed? A list of experimental work demonstrating this can be found in Souza & Oberauer (2016).

      Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78, 1839-1860.

      The increased memory fidelity observed with longer delays between memory array offset and cue does not result from integrating available sensory signals into VWM because the sensory signal would have completely decayed by that time. Instead, research so far has indicated several alternative mechanisms that could lead to higher recall precision for cued items, and we can briefly summarize some of them, which are also reviewed in more detail in Souza and Oberauer (2016).

      One possibility is that, after a highly predictive retro-cue indicates the to-be-tested item, uncued items can simply be removed from VWM. This could result in decreased interference for the cued item, and consequently higher recall precision. Secondly, the retro-cue could indicate which item should be selectively attended, thereby differentially strengthening it in memory. Furthermore, the retro-cue could allow evidence to accumulate for the target item ahead of decision-making, which could increase the probability that the correct information is selected for response. Finally, the retro-cued stimulus could be insulated from interference by subsequent visual input, while the uncued stimuli may remain prone to such interference.

      A neural account of this retro-cue effect based on the original neural resource model has been proposed by Bays and Taylor (2018, Cognitive Psychology). However, as we did not use a retro-cue design in the present experiments, we have decided not to elaborate on this in the manuscript.

      (3) Swap errors

      I am somewhat surprised by the empirically observed and predicted pattern of swap errors displayed in Figure S2. For set size 10, swap probability does not consistently increase with the duration of the retention interval, although this was predicted by the author's model. At long intervals, swap probability is significantly higher for large compared to small set sizes, which also seems to contrast with the idea of shared, limited VWM resources. Can the authors provide some insight into why the model fails to reproduce part of the behavioral pattern for swap errors? The sentence in line 602 might also need some reconsideration in this regard.

      Determining the ground truth for swap errors poses a challenge. The prevailing approach has been to employ a simpler model that estimates swap errors, such as a three-component mixture model, and use those estimates as a proxy for ground truth. However, this method is not without its shortcomings. For example, the variability of swap frequency estimates tends to increase with variability in the report feature dimension (here, orientation). This is due to the increasing overlap of response probability distributions for swap and non-swap responses. Consequently, the discrepancy between any two methods of swap estimation is most noticeable when there is substantial variability in orientation reports (e.g., 10 items and long delay or short exposure).
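      The overlap argument can be made concrete with a sketch of the three-component mixture model in the form commonly used for continuous report: a von Mises component centred on the target, von Mises components centred on non-targets (swaps), and a uniform component. Orientations are mapped onto doubled angles so that the 180° orientation space becomes a full circle. The function names and parameter values are hypothetical, and `np.i0` is used directly, which is adequate only for moderate concentrations (roughly κ below several hundred).

```python
import numpy as np


def to_doubled_radians(orientation_deg):
    """Map orientations (period 180 deg) onto a full circle in [-pi, pi)."""
    x = np.deg2rad(2.0 * np.asarray(orientation_deg, dtype=float))
    return (x + np.pi) % (2.0 * np.pi) - np.pi


def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the circle with mean mu and concentration kappa."""
    return np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * np.i0(kappa))


def mixture_pdf(x, target, nontargets, kappa, p_target, p_swap):
    """Three-component mixture: target, swaps (shared equally across
    non-targets), and uniform guesses. Assumes nontargets is non-empty
    whenever p_swap > 0, so that the density integrates to 1."""
    p_uniform = 1.0 - p_target - p_swap
    density = p_target * von_mises_pdf(x, target, kappa)
    if len(nontargets) > 0:
        swaps = np.mean([von_mises_pdf(x, nt, kappa) for nt in nontargets], axis=0)
        density = density + p_swap * swaps
    return density + p_uniform / (2.0 * np.pi)


def component_overlap(kappa, separation, n_grid=3600):
    """Integral of min(target pdf, non-target pdf): the share of probability
    mass that cannot be attributed unambiguously to either component."""
    x = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    dx = 2.0 * np.pi / n_grid
    a = von_mises_pdf(x, 0.0, kappa)
    b = von_mises_pdf(x, separation, kappa)
    return float(np.sum(np.minimum(a, b)) * dx)


# A non-target 45 deg away in orientation is 90 deg away on the doubled circle.
sep = float(to_doubled_radians(45.0))
for kappa in (2.0, 8.0, 20.0):
    print(kappa, round(component_overlap(kappa, sep), 3))
```

      As concentration falls (greater report variability, e.g., large set sizes with long delays or short exposures), the overlap between target and swap components grows, so the mixture model’s swap estimates become less reliable exactly where the comparison with DyNR diverges.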

      When modelling swap frequency in the DyNR model, our aim was to provide a parsimonious account of swap errors while implementing similar dynamics in the spatial (cue) feature as in the orientation (report) feature. This parametric description captured the overall pattern of swap frequency with set size and retention and encoding time, but it is still only an approximation of the predictions that would be obtained by fully modelling memory for the conjunction of cue and report features (as in, e.g., Schneegans & Bays, 2017; McMaster et al., 2022).

      We expanded the existing text in the section ‘Representational dynamics of cue-dimension features’ of our manuscript:

      “… Although we did not explicitly model the neural signals representing location, the modelled dynamics in the probability of swap errors were consistent with those of the primary memory feature. We provided a more detailed neural account of swap errors in our earlier works that is theoretically compatible with the DyNR model (McMaster et al., 2020; Schneegans & Bays, 2017).

      The DyNR model successfully captured the observed pattern of swap frequencies (intrusion errors). The only notable discrepancy between DyNR and the three-component mixture model (Fig. S2) arises with the largest set size and longest delay, although with considerable interindividual variability. As the variability in the report dimension increases, the estimates of swap frequency become more variable due to the growing overlap between the probability distributions of swap and non-swap responses. This may explain apparent deviations from the modelled swap frequencies at the highest set size and longest delay, where orientation response variability was greatest.”

      McMaster, J. M. V., Tomić, I., Schneegans, S., & Bays, P. M. (2022). Swap errors in visual working memory are fully explained by cue-feature variability. Cognitive Psychology, 137, 101493. https://doi.org/10.1016/j.cogpsych.2022.101493

      Schneegans, S., & Bays, P. M. (2017). Neural Architecture for Feature Binding in Visual Working Memory. The Journal of Neuroscience, 37(14), 3913–3925. https://doi.org/10.1523/JNEUROSCI.3493-16.2017

      (4) Direct sensory readout

      The model assumes that readout from sensory memory and from VWM happens with identical efficiency. Currently, we don't know if these two systems are highly overlapping or are fundamentally different in terms of architecture and computation. In the case of the latter, it might be less reasonable to assume that information readout would happen at similar efficiencies, as it is currently assumed in the manuscript. Perhaps the authors could briefly discuss this possibility.

      In the direct sensory read-out model, we did not explicitly model the efficiency of readout from either sensory or VWM store. However, the distinctive prediction of this model is that the precision of recall changes exponentially with delay at every set size, including one item. This prediction does not depend on the relative efficiency of readout from sensory and working memory, but only on the principle that direct readout from sensory memory bypasses the capacity limit on working memory. This prediction is inconsistent with the pattern of results observed in Experiment 1, where early cues did not show a beneficial effect on recall error for set size 1. While the proposal raised by the reviewer is intriguing, even if we were to model the process of readout from both the sensory and VWM stores with different efficiencies, the direct read-out model could not account for the near-constant recall error with delay for set size one.

      (5) Encoding of distractors

      One of the model assumptions is that, for simultaneous presentation of the memory array and cue, only the cued feature will be encoded. Previous work has suggested that participants often accidentally encode distractors even when they are cued before memory array onset (Vogel et al., 2005). Given these findings, how reasonable is this assumption in the authors' model?

      Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500-503.

      Although previous research suggested that observers can misinterpret the pre-cue and encode one of the uncued items, our results argue against this being the case in the current experiment. Such encoding failures would increase overall recall error and produce a gradient of error with set size, owing to the presence of more adjacent distractors in larger set sizes. However, when we compared recall errors between set sizes in the simultaneous cue condition, we did not find a significant difference between set sizes; moreover, our results were more likely under the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). If observers occasionally encoded and reported one of the uncued items in the simultaneous cue condition, those errors were extremely infrequent and did not affect the overall error distributions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons that are more equivalently driven by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit less temporal resolution). The binocular responses were well-fitted by a model that institutes divisive normalization between the eyes that accounts for both the suppression and enhancement phenomena observed in the subpopulation of binocular neurons. In so doing, the authors underscore the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower compared to some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is more so that the authors draw comparisons to electrophysiological studies without explicit appreciation of the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using microelectrode linear arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at the biophysically relevant timescales (1ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern from all three reviewers, we discussed the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although capable of sampling a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has known limitations that make it best suited as a complementary research tool to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons from superficial-layers, while binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons to exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Due to these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), weak and strong signals outside this range are more nonlinear, and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulation revealed by this study might be less pronounced.”

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al 2023 paper is added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al. 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “The critical roles of ocular dominance have been largely overlooked by extant binocular vision models to our knowledge, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to different eyes is affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiology relationships (for example, described by Dougherty et al., 2019) may be dependent on the temporal window over which the data was averaged, typically over 50-100ms around stimulus onset, or 100-250ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also be helpful to readers for there to be some mention of the advantage of having high temporal resolution (i.e., the benefits of electrophysiology) since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove to be relevant for combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion regarding the technical limitations of two-photon calcium imaging has been listed earlier. Specifically regarding the Dougherty et al. (2019) paper, we added the following discussion to address the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion regarding the technical limitations of two-photon calcium imaging listed earlier.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of two-photon calcium imaging. The calcium indicator we use, GCaMP5, has a reasonable range over which its signal is linearly related to spike rate. Outside this range, however, the nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals outside this range are more nonlinear, and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulation revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings imply that, at least for neurons in the superficial layers of V1, significant ocular dominance may result from a release from interocular suppression during monocular stimulation (an unusual viewing condition, as our vision is typically binocular), rather than from a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation, and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the stimulus in one eye's contrast is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1) and the interocular suppression factor wib or wcb = 1, so Eq. 3 reduces to Eq. 2. Only when Sc and Si are equal and close to 1, as in the current study, do interocular suppression and binocular combination take the form of Eq. 3.”

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" The choice of using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons whose tuning is very sharp. The responses outside of the peak fall down to the baseline and the two ends meet. Otherwise, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable, which wraps around 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the von Mises function is more accurate than the Gaussian for fitting orientation tuning functions. Indeed, we use it to fit the orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between von Mises and Gaussian fits are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published papers and papers currently under review, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
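      For readers who want to reproduce this kind of comparison, a minimal numpy-only sketch follows. It is not the authors' pipeline. It substitutes a grid search over preferred orientation and tuning width for the MATLAB nonlinear least-squares routine mentioned in the Methods, with baseline and amplitude solved by linear least squares. The function names, grids and synthetic neuron are hypothetical. The Gaussian here uses the wrapped orientation difference, so its two ends meet smoothly only when tuning is sharp. The von Mises form `a + b*exp(kappa*(cos(2*(theta - pref)) - 1))` is periodic over 180° by construction.

```python
import numpy as np


def wrap_deg(d):
    """Wrap orientation differences into [-90, 90) degrees."""
    return (d + 90.0) % 180.0 - 90.0


def gaussian_shape(theta, pref, sigma):
    # Gaussian of the wrapped difference: has a kink at +/-90 deg unless sigma is small
    d = wrap_deg(theta - pref)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def von_mises_shape(theta, pref, kappa):
    # Orientation (180-deg periodic) von Mises shape with peak value 1 at pref
    return np.exp(kappa * (np.cos(np.deg2rad(2.0 * (theta - pref))) - 1.0))


def fit_tuning(theta, resp, shape_fn, widths, pref_step=1.0):
    """Grid search over (pref, width); baseline a and amplitude b by linear LS.
    Returns (sse, pref, width, a, b) for the best-fitting combination."""
    best = None
    for pref in np.arange(0.0, 180.0, pref_step):
        for w in widths:
            g = shape_fn(theta, pref, w)
            X = np.column_stack([np.ones_like(g), g])
            coef, *_ = np.linalg.lstsq(X, resp, rcond=None)
            sse = float(np.sum((resp - X @ coef) ** 2))
            if best is None or sse < best[0]:
                best = (sse, float(pref), float(w), float(coef[0]), float(coef[1]))
    return best


# Synthetic, broadly tuned neuron (hypothetical parameters), 8 orientations
theta = np.arange(0.0, 180.0, 22.5)
resp = 0.1 + 1.0 * von_mises_shape(theta, 60.0, 2.0)

vm_fit = fit_tuning(theta, resp, von_mises_shape, np.linspace(0.5, 5.0, 46))
gauss_fit = fit_tuning(theta, resp, gaussian_shape, np.linspace(5.0, 90.0, 86))
print("von Mises SSE:", vm_fit[0], "Gaussian SSE:", gauss_fit[0])
```

      For sharply tuned neurons the two SSE values converge, which is consistent with the small differences reported for the V1 data. The gap widens as tuning broadens.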

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of two-photon calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have previously noticed that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity or to a near-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/s in one example neuron, their Fig. 1B), so this does not appear to be a large effect. In our data fitting, the Gaussian model does include an orientation-unspecific component, which represents the neuronal response at non-preferred orientations but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 

      The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory [TE] neurons, but is necessarily compromised when applied to neurons that have other, different tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response [2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and include the following texts when discussing the future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons[2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization would render the overall binocular responses to be around unity, for the purpose of facilitating comparisons among all FOV/depth, but it would not affect the overall characteristic of the data.
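      A minimal sketch of this per-FOV normalization follows. It assumes that responses for one FOV/depth are stored as one numpy array per condition; the array names and values are hypothetical. Every response in the FOV/depth is divided by the same scalar. As a result, the median binocular response becomes 1, while each neuron's ratio of binocular to monocular response is unchanged.

```python
import numpy as np


def normalize_by_binocular_median(contra, ipsi, binoc):
    """Scale one FOV/depth's responses by the median binocular response.

    contra, ipsi, binoc: per-neuron responses to contralateral-eye,
    ipsilateral-eye and binocular stimulation. A single scalar is used for all
    three conditions, so within-neuron ratios (e.g. binoc / contra) are preserved.
    """
    scale = np.median(binoc)
    return contra / scale, ipsi / scale, binoc / scale


# Hypothetical responses for five neurons in one FOV/depth
contra = np.array([0.8, 0.5, 0.2, 0.6, 0.1])
ipsi = np.array([0.1, 0.4, 0.7, 0.3, 0.9])
binoc = np.array([0.4, 0.6, 0.5, 0.7, 0.3])

c_n, i_n, b_n = normalize_by_binocular_median(contra, ipsi, binoc)
print(np.median(b_n))  # 1.0
```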

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thank you for pointing out this problem; the data fitting for FOV C_270 and for the pooled data was especially inaccurate. The issue has been largely resolved by weighting each datum by its standard deviation (please see the updated Fig. 3).
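      The effect of such weighting can be illustrated with a toy example. This sketch assumes the common convention of dividing each residual by its standard deviation before squaring, which is what `numpy.polyfit` does when passed `w=1/sd`. The straight-line model is only a stand-in for the binocular normalization model, and the data are synthetic.

```python
import numpy as np

# Synthetic data on a straight line, with one highly variable (large-SD) point
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
sd = np.ones_like(y)
y[9] += 50.0    # deviant mean value...
sd[9] = 1e6     # ...that comes with a very large standard deviation

unweighted_slope, _ = np.polyfit(x, y, 1)
# w = 1/sd: each residual is divided by its SD before squaring
weighted_slope, _ = np.polyfit(x, y, 1, w=1.0 / sd)

print(f"unweighted slope {unweighted_slope:.2f}, weighted slope {weighted_slope:.4f}")
```

      With weighting, the noisy point barely influences the fit. Without weighting, it pulls the slope far from the value implied by the other points.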

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zeng and Staley provide a valuable analysis of the molecular requirements for the export of a reporter mRNA that contains a lariat structure at its 5' end in the budding yeast S. cerevisiae. The authors provide evidence that this is regulated by the main mRNA export machinery (Yra1, Mex67, Nab2, Npl3, Tom1, and Mlp1). Of note, Mlp1 has been mainly implicated in the nuclear retention of unspliced pre-mRNA (i.e. quality control), and relatively little has been done to investigate its role in mRNA export in budding yeast.

      Strengths:

      There is relatively little information in the current literature about the nuclear export of splicing intermediates. This paper provides one of the first analyses of this process and dissects the molecular components that promote this form of RNA export. Overall, the strength of the data presented in the manuscript is solid. The paper is well written and the message is clear and of general interest to the mRNA community.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      There are three problems with the paper, although these are not major and likely would not affect the final model as most aspects of the molecular details are confirmed by multiple complementary assays.

      (1) The brG reporter produces both unspliced pre-mRNA and a lariat-containing intermediate RNA. Based on the primer extension assay the authors claim that only 33% of the final product is in pre-mRNA form and that this "is insufficient to account for the magnitude of the cytoplasmic signal from the brG reporter (83%)". Nevertheless, it is possible that primer extension is incomplete or that the lariat-containing RNA is inaccessible for smFISH. The authors could easily perform a dual smFISH experiment (similar to Adivarahan et al., Molecular Cell 2018) where exon 1 is labelled with probes of one color, and the region that overlaps the lariat-containing intermediate is labelled with probes of a second color. If the authors are correct, then one-third of the smFISH foci should have both labels and the rest would have only the second label. This would also confirm that the latter (i.e. the lariat-containing RNAs) are exported to the cytoplasm. Using this approach, the authors could then show that MLP1 depletion (or depletion of any of the other factors) affects one pool of RNAs (i.e. those that are lariat-containing) but not the other (i.e. pre-mRNA). Including these experiments would make the evidence for their model more convincing.

      We appreciate the reviewer’s comments and suggestions. Concerning the primer extension analysis, we are considering alternative assays to quantitate the pre-mRNA and lariat intermediate levels. Concerning the accessibility of the lariat intermediate in smRNA-FISH, in a dbr1∆ strain the only major species from the UAc reporter that is detected by primer extension is the lariat intermediate (Fig. S3), and this reporter is readily detected by smRNA-FISH, indicating that the lariat intermediate is accessible to smRNA-FISH. Concerning discriminating between pre-mRNA and lariat intermediate by smRNA-FISH, we agree with the reviewer that a dual smFISH experiment would directly distinguish between the signals of these species. The brG reporter we used in most smRNA-FISH experiments has a 5’ exon that is too short for smRNA-FISH probes, as is typical of most budding yeast 5’ exons. We have tried to replace the 5’ exon with a longer sequence (GFP) to allow for smRNA-FISH; however, this substitution inhibited splicing. Therefore, to distinguish signals from pre-mRNA versus lariat intermediate, we used additional reporters: the G1c and brC reporters, which accumulate pre-mRNA essentially exclusively (Fig. S2A-C), and the UAc reporter, which accumulates lariat intermediate exclusively in a dbr1∆ strain (Fig. S3). Whereas the mlp1 deletion did not change beta-galactosidase activities of the G1c and brC pre-mRNA-accumulating reporters (Fig. S2E), the mlp1 deletion in a dbr1∆ background did reduce the beta-galactosidase activity of the UAc lariat intermediate-accumulating reporter (Fig. 3D) and did increase the smRNA-FISH signal of this reporter in the nucleus (Fig. 3E). These observations corroborate our interpretation, based on the brG reporter, that Mlp1p is required for efficient export of lariat intermediates but not pre-mRNAs.

      (2) In some cases, the number of smFISH foci appears to change drastically depending on the genetic background. This could either be due to the stochastic nature of mRNA expression between cells or reflect real differences between the genetic backgrounds that could alter the interpretation of the other observations.

      We thank the reviewer for raising this point. We will review our data to distinguish between these possibilities.

      (3) The authors state in the discussion that "the general mRNA export pathway transports discarded lariat intermediates into the cytoplasm". Although this appears to be the case for the reporters that are investigated in this paper, I don't think that the authors should make such a broad sweeping claim. It may be that some discarded lariat intermediates are exported to the cytoplasm while others are targeted for nuclear retention and/or decay.

      The reviewer’s point is well-taken. We will revise the wording accordingly.

      Reviewer #2 (Public Review):

      In this report, Zeng and Staley have used an elegant combination of RNA imaging approaches (single molecule FISH), RNA co-immunoprecipitations, and translation reporters to characterize the factors and pathways involved in the nuclear export of splicing intermediates in budding yeast. Their study notably involves the use of specific reporter genes, which lead to the accumulation of pre-mRNA and lariat species, in a battery of mutants impacting mRNA export and quality control.

      The authors convincingly demonstrate that mRNA species expressed from such reporters are exported to the cytoplasm in a manner depending on the canonical mRNA export machinery (Mex67 and its adaptors) and the nuclear pore complex (NPC) basket (Mlp1). Interestingly, they provide evidence that the export of splicing intermediates requires docking and subsequent undocking at the nuclear basket, a step possibly more critical than for regular mRNAs.

      We thank the reviewer for this overall positive assessment.

      However, their assays do not always allow us to define whether the impacted mRNA species correspond to lariats and/or pre-mRNAs. This is all the more critical since their findings apparently contradict previous reports that supported a role for the nuclear basket in pre-mRNA quality control. These earlier studies, which were similarly based on the use of dedicated yet distinct reporters, had found that the nuclear basket subunit Mlp1, together with different cofactors, prevents the export of unspliced mRNA species. It would be important to clarify experimentally and discuss the possible reasons for these discrepancies.

      It is true that we did not assess export of all reporters in all mutant strains by smFISH; however, we did validate the key conclusion that the export of lariat intermediates requires the nuclear basket gene MLP1: the export of both the brG reporter (mostly lariat intermediate) and the UAc reporter (exclusively lariat intermediate) showed a dependence on MLP1 (Fig. 3). Further, using beta-galactosidase activity, we tested five separate reporters in total, three that accumulated lariat intermediate and two that accumulated exclusively pre-mRNA; only the three reporters accumulating lariat intermediate showed a dependence of export on MLP1 (Fig. 4B,D; Fig. S2D), whereas the reporters accumulating pre-mRNA did not (Fig. S2E), further validating our main conclusion. We are considering additional experiments to validate this key conclusion even further. Also, see our response to comment 1 from Reviewer 1.

      We agree that the main conclusion from this manuscript differs from earlier studies. A key difference is that prior studies monitored exclusively pre-mRNA. In our study, we monitored both pre-mRNA and lariat intermediate species and in doing so revealed a role for MLP1 in the export of lariat intermediates. This study, our previous study, and previous studies by others have all provided evidence for efficient export of pre-mRNA; all of these studies conflict with the studies proposing a general role for the nuclear basket in retaining immature mRNA. Still, these apparently conflicting past studies can be reinterpreted in the context of our model that the export of such species requires docking at the nuclear basket, followed by undocking. In a revised manuscript, we will discuss the possibility that pre-mRNAs apparently “retained” by the nuclear basket are stalled in export at the undocking stage.

      Reviewer #3 (Public Review):

      Summary:

      Zeng and Staley show that in yeast, intron-lariat intermediates that accumulate due to defects in pre-mRNA splicing are transported to the cytoplasm using the canonical mRNA export pathway. Moreover, they demonstrate that export requires the nuclear basket, a sub-structure of the nuclear pore complex previously implicated in the retention of immature mRNAs. These observations are important as they call into question a longstanding model that the main role of the nuclear basket is to ensure nuclear retention of immature or faulty mRNAs.

      Strengths:

      The authors elegantly combine genetic, biochemical, and single-molecule resolution microscopy approaches to identify the cellular pathway that mediates the cytoplasmic accumulation of lariat intermediates. Cytoplasmic accumulation of such splicing intermediates had been observed in various previous studies, but how these RNAs reach the cytoplasm had not yet been investigated. By using smFISH, the authors present compelling and, for the first time, direct evidence that these intermediates accumulate in the cytoplasm and that this requires the canonical mRNA export pathway, including the RNA export receptor Mex67 as well as various RNA-binding proteins including Yra1, Npl3 and Nab2. Moreover, they show that the export of lariat intermediates, but not mRNAs, requires the nuclear basket (Mlp1) and basket-associated proteins previously linked to mRNP rearrangements at the nuclear pore. This is a surprising and important observation with respect to a possible function of the nuclear basket in mRNA export and quality control, as it challenges a longstanding model that the role of the basket in mRNA export is primarily to act as a gatekeeper ensuring that immature mRNAs are not exported. As discussed by the authors, their finding suggests a role for the basket in promoting the export of certain types of RNAs rather than retaining them, a model also supported by more recent studies in mammalian cells. Moreover, their findings also corroborate a recent paper showing that in yeast, not all nuclear pores contain a basket (PMID: 36220102), an observation that also questioned the gatekeeper model of the basket, as it is difficult to imagine how the basket can serve as a gatekeeper if not all nuclear pores contain such a structure.

      We thank the reviewer for highlighting the importance and surprising nature of our findings.

      Weaknesses:

      One weakness of this study is that all their experiments rely on synthetic splicing reporters containing a lacZ gene, which produce relatively long transcripts compared to the average yeast mRNA.

      We are considering repeating some of our experiments to monitor export of RNAs with more average lengths.

      The rationale for using a reporter containing the brG (G branch point), which yields more stable lariat intermediates because they are inefficient substrates for the debranching enzyme Dbr1, could be described earlier in the manuscript; otherwise it only becomes clear towards the end, which is confusing.

      We thank the reviewer for this comment. We will revise the text to explain sooner the rationale for using the brG reporter to assess the export of lariat intermediates.

      Discussion of their observation in the context that, in yeast, not all pores contain a basket would be useful.

      Thanks for this suggestion. We will raise this point that a nuclear basket is not present on all nuclear pores and discuss the implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins, the authors first show that ClpL possesses a robust disaggregase activity that does not require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and, in some cases, supported by in vivo data obtained in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL onto ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of this domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative-staining EM experiments combined with conditional covalent linkages and disaggregation assays, the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a schematic model showing how ClpL performs protein disaggregation based on the new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully executed study of the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids. It shows high thermal stability and provides the food-borne pathogen Listeria monocytogenes with a substantial increase in heat resistance. The authors show that ClpL interacts with aggregated proteins through the aromatic residues present in its N-terminal domain and subsequently unfolds proteins from aggregates, translocating polypeptide chains through the central pore of its oligomeric ring structure. The structure of ClpL oligomers was also investigated in the manuscript. The results suggest that the single-ring structure, and not the dimers or trimers of rings observed alongside single rings by EM, is the active disaggregase species.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with autonomous ClpG disaggregase present in selected Gram-negative bacteria and well-studied E. coli system consisting of ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent, DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this activity appears to be greater than that observed with the canonical DnaK/ClpB co-chaperone system. Furthermore, Lm ClpL appears to have greater thermostability than Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress conditions. Interestingly, Lm ClpL can provide thermotolerance to E. coli cells that have been genetically depleted of ClpB or that express the mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying the N-terminal domain of ClpL as essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimers or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent dimer ring formation, thus promoting an active, substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in Gram-negative bacteria that has been reported on previously, demonstrating that these two enzymes share homologous activities and properties. Taken together, this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report amounts to a substantial body of novel and significant work that will be of interest to the protein chaperone community. Furthermore, by providing examples of how ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri, the authors have expanded the significance of this work and provided novel insight into potential mechanisms responsible for thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labeling is a bit misleading or confusing and may warrant revision. While I do feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating ClpL as a novel disaggregase, interpretation of the data is hindered as no statistical tests are provided throughout the manuscript. Because of this only qualitative analysis can be made, and as such many of the concluding statements involving pairwise comparisons need to be revisited or quantitative data with stats needs to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised axis labeling to increase clarity.

      Reviewer #1 (Recommendations For The Authors):

      • It would really help to have a model showing how ClpL performs protein disaggregation based on their findings.

      We show that ClpL exerts a threading activity that is fueled by ATP hydrolysis in both AAA domains and executed by pore-located aromatic residues. The basic disaggregation mechanism of ClpL therefore does not differ from that of the ClpB and ClpG disaggregases. Similarly, the specificity of ClpL towards protein aggregates is based on simultaneous interactions of multiple N-terminal domains with the aggregate surface. We recently described a similar mode of aggregate recognition for ClpG [1]. We therefore prefer not to add a model to the manuscript. We are currently preparing a review that covers the characterization of the novel bacterial disaggregases and will present models there, as we consider a review article more appropriate for such illustrations.

      • AAA2 domain of ClpL in Fig 3E should be the same color as in Fig 1A.

      We used light grey instead of dark grey for the ClpL AAA2 domain in Fig 3E, to distinguish between ClpL and ClpB AAA domains. This kind of illustration allows for clearer separation of both AAA+ proteins and the fusion construct LN-ClpB*. We therefore prefer keeping the color code.

      • Partial suppression of the dnaK mutant could be added in the main manuscript Figure.

      The main figure 3 is already very dense and we therefore prefer showing respective data as part of a supplementary figure.

      • It would have been interesting to know if the robust autonomous disaggregation activity of ClpL would be sufficient to rescue the growth of more severe E. coli chaperone mutants, like dnaK tig for example. Did the authors test this?

      We tested whether expression of clpL can rescue growth of E. coli dnaK103 mutant cells at 40°C on LB plates. This experiment is different from the restoration of heat resistance in dnaK103 cells (Figure 3, figure supplement 2A), as continuous growth at elevated temperatures (40°C) is monitored instead of cell survival upon abrupt severe heat shock (49°C). We did not observe rescue of the temperature-sensitive growth phenotype (40°C) of dnaK103 cells upon clpL expression, though expression of clpG complemented the temperature-sensitive growth phenotype (see Author response image 1 below). This finding points to differences in chaperone activities of ClpL and ClpG. It also suggests that ClpL activity is largely restricted to heat-shock generated protein aggregates, enabling ClpL to complement the missing disaggregation function of DnaK but not other Hsp70 activities including folding and targeting of newly synthesized proteins. We believe that dissecting the molecular reasons for differences in ClpG and ClpL complementation activities should be part of an independent study and prefer showing the growth-complementation data only in the response letter.

      Author response image 1.

      Serial dilutions (10⁻¹ to 10⁻⁶) of E. coli dnaK103 mutant cells expressing E. coli dnaK, L. monocytogenes clpL or P. aeruginosa clpG were spotted on LB plates containing the indicated IPTG concentrations. Plates were incubated at 30°C or 40°C for 24 h. p: empty vector control.

      Reviewer #2 (Recommendations For The Authors):

      Based on results presented in Fig. 2B the authors conclude "that stand-alone disaggregases ClpL and ClpG but not the canonical KJE/ClpB disaggregase exhibit robust threading activities that allow for unfolding of tightly folded domains" (page 5 line 209). In this experiment, the threading power of disaggregases was assessed by monitoring YFP fluorescence during the disaggregation of aggregates formed by fusion luciferase-YFP protein. In my opinion, the results of the experiment depend not only on the threading power of disaggregases but also on the substrate recognition by analyzed disaggregating systems and/or processivity of disaggregases. N-terminal domain in the case of ClpL and KJE chaperones in the case of the KJE/ClpB system are involved in recognition. This is not discussed in the manuscript and the obtained result might be misinterpreted. The authors have created the LN-ClpB* construct (N-terminal domain of ClpL fused to derepressed ClpB) (Fig. 3 E and F). In my opinion, this construct should be used as an additional control in the experiment in Fig. 2 B. It possesses the same substrate recognition domain and therefore the direct comparison of disaggregases threading power might be possible.

      We performed the requested experiment (new Figure 3 - figure supplement 2D). We did not observe unfolding of YFP by LN-ClpB. Since ClpL and LN-ClpB do not differ in their aggregate-targeting mechanisms, this finding underlines the differences in threading power between ClpL and activated (derepressed) ClpB. It also suggests that the AAA threading motors and the aggregate-targeting NTD largely function independently.

      The presented results suggest that tetramers and dimers of rings might be a "storage form" of the disaggregase. It would be interesting to analyze the thermotolerance and/or phenotype of ClpL mutants that do not form tetramers and dimers (E352A). This variant possesses disaggregation activity similar to WT but does not form dimers and tetramers. If differences are observed in vivo (for example, toxicity of the mutant), the "storage form" hypothesis would become more probable.

      When testing expression of clpL-MD mutants (E352A, F354A), which cannot form dimers and tetramers of ClpL rings, in E. coli ∆clpB cells, we observed reduced production levels compared to ClpL wildtype and speculated that reduced expression might be linked to cellular toxicity. We therefore compared spotting efficiencies of E. coli ∆clpB cells expressing clpL, ∆N-clpL or the clpL-MD mutants at different temperatures. Expression of clpL at high levels abrogated colony formation at 42°C (new Figure 6 - figure supplement 3). ClpL toxicity was dependent on its NTD, as no effect was observed upon expression of ∆N-clpL. ClpL-MD mutants (E352A, F354A) were expressed at much lower levels and exhibited strongly increased toxicity compared to ClpL-WT when produced at comparable levels (new Figure 6 - figure supplement 3). This implies a protective role of ClpL ring dimers and tetramers in the cellular environment by downregulating ClpL activity. We envision that the formation of ClpL assemblies restricts accessibility of the ClpL NTDs and reduces substrate interaction. The increased toxicity of ClpL-E352A and ClpL-F354A points to a physiological relevance of the dimers and tetramers of ClpL rings and is in agreement with their proposed function as storage forms. We added this potential role of ClpL ring assemblies to the discussion section. Due to the strongly reduced production levels of the ClpL MD mutants and their enhanced toxicity at elevated temperatures, we did not test their ability to restore thermotolerance in E. coli ∆clpB cells.

      Figure 6G and Figure 6 -figure supplement 2 - it is not clear what is the difference in the preparation of WT and WTox forms of ClpL.

      ClpL WT was purified under reducing conditions (+ 2 mM DTT), whereas WTox was purified in the absence of DTT, thus serving as a control for ClpL-T355C, which forms disulfide bonds upon purification without DTT. We have added this information to the figure legend and the materials and methods section.

      Page 5 line 250 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2A should be Figure 3 - Figure Supplement 2A.

      Page 5 line 251 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2B/C should be Figure 3 - Figure Supplement 2B/C.

      Page 7 line 315 - wrong figure citation. Instead of Figure 4F, it should be Figure 4G.

      Figure 1 - Figure Supplement 2E - At first glance, this figure does not correspond to the text and is confusing. It would be nice to have bars for Lm ClpL activity in the figure. Alternatively, the description of the y-axis might be changed to "relative to Lm ClpL disaggregation activity" instead of "relative disaggregation activity". One has to carefully read the figure legend to find out that 1 corresponds to Lm ClpL activity.

      We have corrected all mistakes and changed the description of the y-axis (Figure 1 - figure supplement 2E) as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) While the authors make many experimental comparisons throughout their study, no statistical tests are described or presented with their results or figures, nor are these statistical tests described in the methods. While the data as presented do appear to support the authors' conclusions, without these statistical tests no meaningful conclusions can be drawn from paired analyses. Critically, please report these statistical tests. As a general suggestion, please include the statistics (p-values) in the results section when presenting the data, as well as in the figure legends, as this will allow the reader to better understand the authors' presentation and interpretation of the data.

      We have added statistical tests to all relevant figures. The analysis confirms our previous statements. We have further clarified our approach to the statistical analysis in the methods section. We report p-values in the results section; however, due to the number of comparisons, we did not add individual p-values to the figure legends but used standard asterisk labeling.

      (2) Some of the axis labels for the presented graphs are a bit misleading or confusing. Many describe a relative (%) disaggregation rate, but it is not clear from the methods or figure legends what this rate is relative to. Is it relative to non-denatured substrates, to no-chaperone conditions, etc.? Is it possible to present the figures with the raw rates/activities (e.g. luciferase activity/time) instead of relative rates? I think that labeling these figure axes with "disaggregation rate" is a bit misleading, as none of these experiments measures the actual rate of disaggregation of these model substrates per se (say by SEC-MALS or other biophysical measurements); instead, they infer the extent of disaggregation by measuring a property of these substrates, i.e. luciferase activity or fluorescence intensity over time. Thus, labeling the figure axes with what is actually being measured, and then clarifying in the methods and results what is being inferred from these measurements, will help solidify the authors' conclusions.

      Relative (%) disaggregation rate usually refers to the disaggregation activity of ClpL wildtype, which serves as the reference. We clarified this point in the revised text and the respective figure legends. We now also refer to the process measured (e.g. relative refolding activity of aggregated luciferase instead of relative disaggregation activity), as suggested by the reviewer, and added clarifications to the text and materials and methods.

      Since we have many measurements for our most frequently used assays and a reasonable estimate of the general variance within these assays, we found it reasonable to show activity data relative to fixed controls. This reduces the impact of nonspecific variance and thereby allows more accurate comparisons between different repetitions. The reference is now indicated in the axis title.

      (3) The figures are well presented, clutter-free, and graphically easy to understand. Figure legends have sufficient information aside from the aforementioned statistical information and should include the exact number of independent replicates for each panel/experiment (e.g. n=4), not just "at least 3". While the figures do show each data point along with the mean and error, in some figures it is difficult to determine the number of replicate data points, for example Figures 2c, 2d, and 3a. Also, please state whether the error is SD or SEM.

      While we agree that this is valuable information, we fear that overloading the figure legends may reduce their readability. We therefore decided to list the number of replicates for each experiment in a separate supplementary table (Table S2). The error bars show the SD, not the SEM, which we also specified in the figure legends.

      (4) There are various examples throughout the results where qualitative descriptors are used to describe comparisons. Examples of this are "hardly enhanced" (Figure 1) and "partially reduced" (Figure 6). While this is not necessarily wrong, qualitative descriptions of comparisons in this manner would require further explanation. What is the definition of "hardly" or "partially"? My recommendation is to just state the data quantitatively, such as "% enhanced" or "reduced by x", this way there is no misinterpretation. Examples of this can be found in Figures 6C-G. This would require a full statistical overview and presentation of these stats in the results.

      We followed the reviewer's advice and no longer use the criticized terms (e.g. "hardly enhanced"). Instead, we provide the requested quantifications in the text.

      Questions for Figures:

      Figures 1B and 1C:

      (1) Is the disaggregase activity of ClpL towards heat-denatured luciferase and GFP ATP-dependent? While the authors later in the manuscript show that mutations within the Walker B domains dramatically impair reactivation (disaggregation) of denatured luciferase, this does not rule out an ATP-independent effect of these mutations. Thus, the authors should test whether disaggregase activity is observed when wild-type ClpL is incubated with denatured substrates without ATP or in the presence of ADP only.

      We tested for ClpL disaggregation activity in the absence of nucleotide and in the presence of ADP only (new Figure 1 – figure supplement 2A). We did not observe any activity, demonstrating that ClpL activity depends on ATP binding and hydrolysis (see also Figure 3 – figure supplement 1D: ATPase-deficient ClpL-E197A/E530A lacks disaggregation activity).

      (2) The authors suggest that a reduction in disaggregase activity observed in samples combining Lm ClpL and KJE (Figure 1C, supp. 1C-E) could be due to competition for protein aggregate binding as observed previously with ClpG. Did the authors test this directly by pulldown assay or another interaction-based assay? While ClpL and ClpG appear to work in a similar manner, it would be good to confirm this. Also, clarification on how this competition operates would be useful. Is it that ClpL prevents aggregates from interacting with KJE, or vice versa?

      We probed for binding of ClpL to aggregated Malate Dehydrogenase in the presence of L. monocytogenes or E. coli Hsp70 (DnaK + the respective J-domain protein DnaJ) by a centrifugation-based assay. Here, we used the ATPase-deficient ClpL-E197A/E530A (ClpL-DWB) mutant, ensuring stable substrate interaction in the presence of ATP. We observe reduced binding of ClpL-DWB to protein aggregates in the presence of DnaK/DnaJ (new Figure 1 – figure supplement 2G). This finding indicates that both chaperones compete for binding to aggregated proteins and explains the inhibition of ClpL disaggregation activity in the presence of Hsp70.

      (3) Related to the above, while incubation of aggregated substrates with ClpL and KJE does appear to reduce disaggregase activity towards GFP (Figure 1c), α-glucosidase (Supp. 1C), and MDH (Supp. 1D), this does not appear to be the case for luciferase (Figure 1b, Supp. 1b). Furthermore, ClpL disaggregase activity towards luciferase is reduced when combined with E. coli KJE (Supp. 1e) but not with Lm KJE (Figure 1b). The authors provide no commentary or explanation for these observations. Moreover, these results complicate the concluding statement that "combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity ... ".

      We suggest that the differing degrees to which the KJE system inhibits ClpL disaggregation activity reflect different binding affinities of KJE and ClpL for the respective aggregates. While we usually observe strong inhibition of ClpL activity in the presence of KJE, this is different for aggregated Luciferase. This points to specific structural features of Luciferase aggregates, or to the presence of distinct binding sites on the aggregate surface, that favour ClpL binding. We have added a respective comment to the revised manuscript.

      The former statement that “combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity” referred to aggregated GFP, MDH and α-Glucosidase for which a strong inhibition of ClpL activity was observed. We have specified this point.

      Figures 1D and 1E:

      (1) The authors conclude that the heat sensitivity of ΔClpL L. gasseri cells is because they do not express the canonical ClpB disaggregase. A good test to validate this would be to express KJE/ClpB in these Lg ΔClpL cells to see if heat-sensitivity could be fully or partially rescued.

      We agree that such an experiment would further strengthen the evidence for the in vivo function of ClpL as an alternative disaggregase. However, such an approach would require co-expression of E. coli ClpB with the authentic E. coli DnaK chaperone system (KJE), as ClpB and DnaK cooperate in a species-specific manner [2-4]. This makes the experiment challenging, not least because the individual components need to be expressed at the correct stoichiometry. Furthermore, the authentic L. gasseri KJE system, which likely competes with the E. coli KJE system for aggregate binding, would hamper E. coli KJE/ClpB disaggregation activity in L. gasseri. In view of these limitations, we would like to refrain from conducting such an experiment.

      (2) The rationale for investigating Lg ClpL and the disaggregase activity assays are compelling and support the hypothesis that ClpL contributes to thermotolerance in multiple Gram-positive species. However, from Figure 1d, why was only Lg ClpL investigated? It appears that S. thermophilus also lacks the canonical ClpB disaggregase and shows ΔClpL heat sensitivity. Other Lactobacillus species that lack ClpB are also presented but were not tested for heat sensitivity. Why only test and move forward with L. gasseri? Lastly, L. mesenteroides is ClpB-negative but does not show ΔClpL heat sensitivity. Why?

      We wanted to document high, partner-independent disaggregation activity for another ClpL homolog. We chose L. gasseri because (i) this bacterial species lacks a ClpB homolog and (ii) a ∆clpL mutant exhibits reduced survival upon severe heat shock (thermotolerance phenotype), which is associated with defects in cellular protein disaggregation. The characterization of L. gasseri ClpL as a potent disaggregase in vitro represents a proof-of-concept and allows us to generalize our conclusion. We therefore did not further test S. thermophilus ClpL. L. mesenteroides encodes ClpL but not ClpB; however, to the best of our knowledge, a ∆clpL mutant has not yet been characterized in this species. As we wanted to link ClpL in vitro activity with an in vivo phenotype, we did not characterize L. mesenteroides ClpL.

      We agree with the reviewer that the characterization of additional ClpL homologs is meaningful and interesting, however, we strongly believe that such analysis should be part of an exhaustive and independent study.

      Figures 2A and 2B:

      (1) Figure 2B demonstrates that both ClpL and ClpG, but not the canonical KJE/ClpB, are able to unfold YFP during the luciferase disaggregation process, suggesting that ClpL and ClpG exhibit stronger threading activity. A technical question, can luciferase activity be measured alongside in the same assay sample? If so, would you expect to observe a concomitant increase in luciferase activity as YFP fluorescence decreases?

      KJE/ClpB can partially disaggregate and refold aggregated Luciferase-YFP without unfolding YFP during the disaggregation reaction [5]. YFP unfolding is therefore not linked to refolding of aggregated Luciferase-YFP. On the other hand, unfolding of YFP during disaggregation can hamper refolding of the fused Luciferase moiety, as observed for the AAA+ protein ClpC in the presence of its partner MecA [5]. These diverse effects make the interpretation of Luciferase-YFP refolding experiments difficult, as the degree of YFP unfolding activity does not necessarily correlate with the extent of Luciferase refolding. We therefore did not perform the suggested experiment.

      Figure 2C and 2D:

      (1) Thermal shift assays for ClpL, ClpG, and DnaK were completed with various nucleotides. Were these experiments also completed with samples in their nucleotide-free apo state? Also, while all these chaperones are ATPases, the nucleotides used differ, but no explanation is provided. Comparison should be made of these ATPases bound to the same molecules.

      We did not monitor thermal stabilities of chaperones without nucleotide, as such a state is likely not relevant in vivo. In the case of ClpL, we used ATPγS to keep the AAA+ protein in the ATP conformation; ATP would be rapidly converted to ADP due to the high intrinsic ATPase activity of ClpL. In the case of DnaK, ATPγS cannot be used, as it does not induce the ATP conformation [6]. The low intrinsic ATPase activity of DnaK allows determining the thermal stability of its ATP conformation in the presence of ATP. This is confirmed by the reduced thermal stability we determined for ADP-bound DnaK.

      (2) The authors suggest that incubation at 55°C will cause unfolding of Lm DnaK, but not ClpL, providing ClpL-positive Lm cells with disaggregase activity at 55°C. While the thermal shift assays in Figures 2C and 2D support this, an experiment to test this would be to heat-treat Lm DnaK and ClpL at 55°C and then test for disaggregase activity using either aggregated luciferase or GFP as in Figure 1.

      We followed the suggestion of the reviewer and incubated Lm ClpL and DnaK at 55-58°C in the presence of ATP for 15 min prior to their use in disaggregation assays. We compared the activities of pre-heated chaperones with controls that were incubated at 30°C for 15 min. Notably, we did not observe a loss of DnaK disaggregation activity, suggesting that thermal unfolding of DnaK at this temperature is reversible. We provide these data as Figure 2 – figure supplement 1 and added a respective statement to the revised manuscript.

      Figure 3B:

      (1) The authors state that ATPase activity of ΔN-ClpL was "hardly affected", but from the data provided it appears to be reduced by approximately 35%. As discussed above, no statistics are provided for this figure, but given the error bars, it is highly likely that this reduction is significant. Please perform this statistical test and, if significant, reflect this in the written results as well as in the figure. Lastly, if this reduction in ATPase activity is significant, why would this be so, and could it contribute to the reduction in disaggregase activity towards luciferase and MDH observed in Figure 3A?

      We applied statistical tests as suggested by the reviewer, showing that the reduction in ATPase activity of ∆N-ClpL is statistically significant. N-terminal domains of Hsp100 proteins can modulate ATPase activity, as shown for the family member ClpB, where the NTD functions as an auxiliary regulatory element for fine-tuning ClpB activity [7]. We speculate that the impact of the ClpL NTD on the assembly state (∆N-ClpL forms stabilized ring dimers) might affect ClpL ATPase activity. We would like to point out that other ClpL mutants (e.g. NTD mutant ClpL-Y51A; MD mutant ClpL-F354A) have a similarly reduced ATPase activity, yet exhibit substantial disaggregation activity (approx. 2-fold reduced compared to ClpL wildtype). In contrast, ∆N-ClpL does not exhibit any disaggregation activity. This suggests that its loss of disaggregation activity is caused by a substrate binding defect and not by the partial reduction in ATPase activity. We added a comment on the reduced ATPase activity and discuss its potential causes in the discussion section.

      (2) I think the authors' conclusion that deletion of the ClpL NTD does not contribute to structural defects of ClpL is premature given the apparent reduction in ATPase activity. Did the authors perform any biophysical analysis of ΔN-ClpL to confirm this conclusion? Thermal shift assays, Native-PAGE, or size-exclusion chromatography for aggregates would all be good assays to demonstrate that the wild-type and ΔN-ClpL have similar structural properties. Surprisingly, Figure 6 describes significant macromolecular changes associated with ΔN-ClpL such that it preferentially forms a dimer of rings. Furthermore, in Supp. Figure 6D the authors report that ΔN-ClpL appears to have an increased Tm as compared to WT- or ΔM-ClpL. The authors should reflect these observations as deletion of the ClpL NTD does appear to contribute to structural changes, though perhaps only at the macromolecular scale, i.e. dimerization of the rings.

      We have characterized the oligomeric state of ∆N-ClpL by size exclusion chromatography (Figure 6 – figure supplement 1A) and negative staining electron microscopy (Figure 6C), both showing that it forms assemblies similar to ClpL wildtype. We did not observe an increased tendency of ∆N-ClpL to form aggregates, and the protein remained fully soluble after several freeze-thaw cycles. EM data reveal that ∆N-ClpL exclusively forms ring dimers, suggesting that the NTDs destabilize MD-MD interactions. The stabilized interaction between two ∆N-ClpL rings can explain the increased thermal stability (Figure 6 – figure supplement 1D). We speculate that the ClpL NTDs affect MD-MD interactions either through steric hindrance or by directly contacting MDs. We have added a respective statement to the discussion section.

      Figure 3C and 3D:

      (1) Given the larger error in samples expressing ClpG (100) or ClpL (100), statistical analysis with p-values is required to draw conclusions from the comparison of these samples with the plasmid-only control. The effect of ΔN-ClpL vs. wild-type ClpL looks compelling and does appear to attenuate the ClpL-induced thermotolerance. This is nicely demonstrated in Figure 3D.

      We quantified the respective spot tests (new Figure 3E) and tested for statistical significance as suggested by the reviewer. Restoration of heat resistance is significant for the first 30 min. Although we consistently observe rescue at later time points, significance is lost there due to larger deviations in the number of viable cells and thus in the degree of complementation.

      Figure 3F:

      (1) What is the role of the ClpB NTD? It appears to be dispensable for disaggregase activity, assuming that ClpB is co-incubated with KJE. A quick explanation of this domain in ClpB could be useful.

      The ClpB NTD is not required for disaggregation activity, as ClpB is recruited to protein aggregates by DnaK, which interacts with the ClpB MDs. Still, two functions have been described for the ClpB NTD. First, it can bind soluble unfolded substrates such as casein [8]. This substrate binding function can increase ClpB disaggregation activity towards some aggregated model substrates (e.g. Glucose-6-phosphate dehydrogenase) [9]. However, NTD deletion usually does not decrease ClpB disaggregation activity and can even lead to an increase [7, 10, 11]. An increased disaggregation activity of ∆N-ClpB correlates with an enhanced ATPase activity, which is explained by NTDs stabilizing a repressing conformation of the ClpB MDs, which function as main regulators of ClpB ATPase activity [7]. We added a short description on the role of the ClpB NTD to the respective results section.

      (2) The result of fusing the ClpL NTD to ClpB supports a role for this NTD in promoting autonomous disaggregase activity. What would you expect to observe if the fused Ln-ClpB protein was co-incubated with KJE? Would this further promote disaggregase activity, or potentially impair through competition? This experiment could potentially support the authors' hypothesis that ClpL and ClpB/KJE can compete with each other for aggregated substrates as suggested in Figure 1.

      We have performed the suggested experiment using aggregated MDH as model substrate. We did not observe an inhibition of LN-ClpB disaggregation activity in the presence of KJE. In contrast, ClpL disaggregation activity towards aggregated MDH is inhibited upon addition of KJE due to competition for aggregate binding (Figure 1 – figure supplement 2D/F). Disaggregation activity of LN-ClpB in the presence of KJE can be explained by functional cooperation between both chaperone systems, which involves interactions between aggregate-bound DnaK and the ClpB MDs of the LN-ClpB fusion construct. We prefer to show these data only in the response letter and not in the manuscript, as these results would distract from the main message of the LN-ClpB fusion construct: the ClpL NTD functions as an autonomous aggregate-targeting unit that can be transferred to other Hsp100 family members.

      Author response image 2.

      LN-ClpB cooperates with DnaK in protein disaggregation. Relative MDH disaggregation activities of the indicated disaggregation systems were determined. KJE: DnaK/DnaJ/GrpE. The disaggregation activity of Lm ClpL was set to 1. Statistical analysis: one-way ANOVA, Welch's test for post-hoc multiple comparisons. Significance levels: **p < 0.001. n.s.: not significant.

      Figures 4E and 4F:

      (1) While the effect of various NTD mutations follows a similar trend in regard to the impairment of ClpL-mediated disaggregation of luciferase and MDH, the degree of these effects does appear different. For example, patch A and C mutations reduce ClpL disaggregase activity towards luciferase (~60% / 50% reduction) vs. MDH (>90%) respectively. While these results do suggest a critical role for residues in patches A and C of ClpL, these substrate-specific differences are not discussed. Why would we expect a difference in the effect of these patch A/C ClpL mutations on different substrates?

      We speculate that the aggregate structure and the presence or distributions of ClpL NTD binding sites differ between aggregated Luciferase and MDH. A difference between both aggregated model substrates was also observed when testing for an inhibitory effect of Lm KJE (and Ec KJE) on ClpL disaggregation activity (see comment above). We speculate that the mutated NTD residues make specific contributions to aggregate recognition. The severity of binding defects (and reduction of disaggregation activities) of these mutants will depend on specific features of the aggregated model substrates. We now point out that ClpL NTD patch mutants can differ in disaggregation activities depending on the aggregated model substrate used and refer to potential differences in aggregate structures.

      (2) The authors suggest that the loss of disaggregation activity of selected NTD mutants could be linked to reduced binding to aggregated luciferase. While this is likely given that these mutations do not appear to affect ATPase activity (Supp. 4), it could be possible that these mutants can still bind to aggregated luciferase and some other mechanism may impair disaggregation. A pull-down assay would help to prove whether reduced binding is observed in these NTD ClpL mutants. This also needs to be confirmed for Supp. Figure 4.2H.

      We have shown a strong correlation between loss of aggregate binding and loss of disaggregation activity for several NTD mutants (Fig. 4G, Figure 4 – figure supplement 2H). We decided to perform the aggregate binding assay only with mutants that show a full rather than a partial disaggregation defect because, in our experience, the centrifugation-based assay provides clear and reproducible results for loss-of-activity mutants but has limitations in revealing differences for partially affected mutants. This might be explained by the use of non-hydrolyzable ATPγS in these experiments, which strongly stabilizes substrate interactions, potentially masking partial binding defects. We agree with the reviewer that some ClpL NTD mutants might have additional effects on disaggregation activity, e.g. by controlling substrate transfer to the processing pore site. We have added a respective comment to the revised manuscript.

      (3) Supp. Figure 4.2H has no description in the figure legend. The Y-axes states % aggregate bound to chaperone. How was this measured? See the above comments for Figures 4E and 4F.

      We apologize and have added the description to the figure legend. The percentage of aggregate-bound chaperone is determined by quantifying the chaperone present in the supernatant and pellet fractions after sample centrifugation. Background levels of chaperone in the pellet fractions in the absence of protein aggregates were subtracted. We added this information to the materials and methods section.
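      The quantification described above can be sketched in Python as follows. This is a minimal illustration, assuming that chaperone band intensities (arbitrary densitometry units) are available for the supernatant and pellet fractions of reactions with and without aggregates, and that the background is subtracted as the difference of pellet percentages. The function names and values are hypothetical, and clamping negative results to zero is an illustrative choice, not necessarily the authors' procedure.

```python
def percent_in_pellet(supernatant, pellet):
    """Share of the chaperone signal recovered in the pellet fraction, in %."""
    total = supernatant + pellet
    if total <= 0:
        raise ValueError("total signal must be positive")
    return 100.0 * pellet / total


def percent_aggregate_bound(sup_agg, pel_agg, sup_no_agg, pel_no_agg):
    """Pellet share with aggregates minus background pellet share without aggregates."""
    bound = percent_in_pellet(sup_agg, pel_agg) - percent_in_pellet(sup_no_agg, pel_no_agg)
    # Clamp at zero: background can slightly exceed the signal for non-binding mutants.
    return max(bound, 0.0)


# Hypothetical densitometry values (arbitrary units).
print(percent_aggregate_bound(40, 60, 95, 5))  # 55.0
```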

      Figure 6G:

      The authors observed reduced disaggregase activity and ATPase activity of mutant T355C under both oxidative and reducing conditions. While this observation under oxidative conditions supports the authors' hypothesis, under reducing conditions (+DTT) we would expect the enzyme to behave similarly to wild-type ClpL unless this mutation has other effects. Can the authors please comment on this and provide an explanation or hypothesis?

      The reviewer is correct: ClpL-T355C exhibits a reduced disaggregation activity (Figure 6 – figure supplement 2B). We observe a similar reduction in disaggregation activity for the ClpL MD mutant F354A, pointing to an auxiliary function of the MD in protein disaggregation. We have made a respective comment in the discussion section of the revised manuscript. How exactly ClpL MDs support protein disaggregation is currently unclear and will be the subject of future analysis in the lab. We strongly believe that such analysis should be part of an independent study.

      Discussion:

      In the fourth feature, it is discussed that one disaggregase feature of ClpL is that it does not cooperate with the ClpP protease. While a reference is provided for the canonical ClpB, no data in this paper, nor a reference, is provided demonstrating that ClpL does not interact with ClpP. As discussed, it is highly unlikely that ClpL interacts with ClpP given that ClpL does not contain the IGL/F loops that mediate the interaction of ClpP with cochaperones, such as ClpX, but data or a reference is needed to make such a factual statement.

      The absence of the IGL/F loop makes an interaction between ClpL and ClpP highly unlikely. However, the reviewer is correct that direct evidence for a ClpP-independent function of ClpL, though very likely, is not provided. We have therefore rephrased the respective statement: "Fourth, novel disaggregases lack the specific IGL/F signature motif, which is essential for cooperation of other Hsp100 proteins with the peptidase ClpP. This feature is shared with the canonical ClpB disaggregase [12], suggesting that protein disaggregation is primarily linked to protein refolding."

      References

      (1) Katikaridis P, Simon B, Jenne T, Moon S, Lee C, Hennig J, et al. Structural basis of aggregate binding by the AAA+ disaggregase ClpG. J Biol Chem. 2023:105336.

      (2) Glover JR, Lindquist S. Hsp104, Hsp70, and Hsp40: A novel chaperone system that rescues previously aggregated proteins. Cell. 1998;94:73-82.

      (3) Krzewska J, Langer T, Liberek K. Mitochondrial Hsp78, a member of the Clp/Hsp100 family in Saccharomyces cerevisiae, cooperates with Hsp70 in protein refolding. FEBS Lett. 2001;489:92-6.

      (4) Seyffer F, Kummer E, Oguchi Y, Winkler J, Kumar M, Zahn R, et al. Hsp70 proteins bind Hsp100 regulatory M domains to activate AAA+ disaggregase at aggregate surfaces. Nat Struct Mol Biol. 2012;19:1347-55.

      (5) Haslberger T, Zdanowicz A, Brand I, Kirstein J, Turgay K, Mogk A, et al. Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments. Nat Struct Mol Biol. 2008;15:641-50.

      (6) Theyssen H, Schuster H-P, Bukau B, Reinstein J. The second step of ATP binding to DnaK induces peptide release. J Mol Biol. 1996;263:657-70.

      (7) Iljina M, Mazal H, Goloubinoff P, Riven I, Haran G. Entropic Inhibition: How the Activity of a AAA+ Machine Is Modulated by Its Substrate-Binding Domain. ACS chemical biology. 2021;16:775-85.

      (8) Rosenzweig R, Farber P, Velyvis A, Rennella E, Latham MP, Kay LE. ClpB N-terminal domain plays a regulatory role in protein disaggregation. Proc Natl Acad Sci U S A. 2015;112:E6872-81.

      (9) Barnett ME, Nagy M, Kedzierska S, Zolkiewski M. The amino-terminal domain of ClpB supports binding to strongly aggregated proteins. J Biol Chem. 2005;280:34940-5.

      (10) Beinker P, Schlee S, Groemping Y, Seidel R, Reinstein J. The N Terminus of ClpB from Thermus thermophilus Is Not Essential for the Chaperone Activity. J Biol Chem. 2002;277:47160-6.

      (11) Mogk A, Schlieker C, Strub C, Rist W, Weibezahn J, Bukau B. Roles of individual domains and conserved motifs of the AAA+ chaperone ClpB in oligomerization, ATP-hydrolysis and chaperone activity. J Biol Chem. 2003;278:15-24.

      (12) Weibezahn J, Tessarz P, Schlieker C, Zahn R, Maglica Z, Lee S, et al. Thermotolerance Requires Refolding of Aggregated Proteins by Substrate Translocation through the Central Pore of ClpB. Cell. 2004;119:653-65.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors found that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study raises an interesting point by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirm previously published data, which also show a decline of CERS1 in ageing, as well as the generation and characterization of a muscle-specific knockout mouse line. The mechanism by which the increased amount of long ceramides (cer c24) and the decrease in shorter ones (cer c18) might influence muscle mass, force production, fibrosis and inflammation in aged mice has not been addressed.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle implicate CERS1 in myogenic differentiation for the first time, which we validate in cell culture systems. To improve mechanistic insight, as suggested by Reviewer #1, we performed additional experiments to understand how Cers1-derived c18 and Cers2-derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain c24 ceramides might indeed drive the effect we see upon Cers1 inhibition, because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. The results show that the reduced muscle cell maturation caused by Cers1KD is rescued by simultaneous knockdown of Cers2, as shown by gene expression analyses and by immunohistochemical validation and quantification. We therefore propose that reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Elevated sphingosine triggers apoptosis due to its toxicity (PMID: 12531554). Channeling the accumulating sphingosine towards C24 ceramides may therefore avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential of muscle. However, if C24 production is also blocked by Cers2 inhibition, sphingosine is forced towards the production of other ceramides that are potentially less toxic or less detrimental to myogenesis.
We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the implications of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in muscle myogenesis and aging sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and center-nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or impaired regeneration. This would help determine whether the exacerbation of age-dependent features depends solely on Cers1 inhibition or is associated with other factors related to the age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study showed that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observations, these authors report reduced muscle fiber size and muscle force. Therefore, we do not believe that the effects of Cers1 inhibition we observe in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we provide the body weight and fat mass of the mice (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass upon Cers1 inhibition in young mice fed a chow or high-fat diet (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and Fat mass as % of body mass (B) were measured in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 using EchoMRI (n=7-12 per group). (C-D) Grip strengh measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (n=8 per group). (E-F) Pax7 gene expression in P053 or AAV9 treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1 targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlayed FACS traces of Annexin-V (BB515, left) and Propidium Iodide (Cy5, right) of mouse C2C12 muscle myotubes treated with 25nM scramble or Cers1 targeting shRNA (n=3 per group). Quantification right: early apoptosis (Annexin+-PI-), late apoptosis (Annexin+-PI+), necrosis (Annexin--PI+), viability (Annexin--PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeablized mouse C2C12 muscle muscle cells treated DMSO or P053 (J) with quantification of basal, ATP-linked, proton leak respiration as well as spare capacity and maximal capacity linked respiration (n=4 per group). (L) Reactive oxygen production in mitochondria of mouse C2C12 muscle muscle cells treated DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in 24mo C57BL/6J mouse muscles intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size. Dot size indicates statistical significance (n=6-8 per group). 
(N) Representative confocal Proteostat® stainings, with quantifications, of DMSO- and P053-treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100 µM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group; for method details see main manuscript, page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as stated in the Methods section. We also measured grip strength in all four limbs, as well as in the forelimbs only (Author response image 1C-D). Forelimb strength did not change; only hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we only injected the AAV in the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon IM AAV delivery in the gastrocnemius (PMID: 34878822, PMID: 37118545). Given its size, the gastrocnemius likely makes the largest contribution to hindlimb grip strength, and possibly even to overall grip strength, as suggested by a trend towards reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles make the largest contribution to uphill running, as we also saw an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly in mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To improve mechanistic insight, as also suggested by Reviewer #2, we performed additional experiments to understand how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain C24 ceramides might indeed be driving the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. Gene expression analyses and immunohistochemical validation and quantification show that the reduced muscle cell maturation caused by Cers1KD is rescued by the simultaneous knockdown of Cers2. We added these data to the revised manuscript as new Fig 5D-E and new Fig S5G-I. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7), suggest that C24/dhC24 might contribute to the age-related impairments in myogenesis.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPTLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity because of the upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain the association between the reduction of Cers1 gene expression with aging (Fig. 1B) and the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway, which is overactive in aging, and therefore seems beneficial for muscle aging. While most enzymes in the ceramide pathway that we have studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that Cers1 inhibition indeed exacerbates age-related defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that Cers1-mediated inhibition of muscle differentiation depends on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might be related to reducing very long-chain ceramides. We believe that reduced Cers1 expression in skeletal muscle upon aging, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism to make up for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). 
Sphingosine is toxic, and its accumulation triggers apoptosis (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential of muscle. However, if C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is forced towards the production of other ceramides that are potentially less toxic or less detrimental to myogenesis. These data are now added to the revised manuscript (see page 7), and details were added to the discussion (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirmed previously published data (refs 21, 11, 37, 38), which also showed a decline of CERS1 in ageing and described the generation and characterization of a muscle-specific knockout mouse line, the study points out an interesting issue by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia, and opens new questions on how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15, published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness. The mechanistic insights into how the increased amount of longer ceramides (C24) and the decrease in shorter ones (C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed. At the present stage, the manuscript is descriptive and confirms the CERS1-mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is stem cell renewal, MyoD replication/differentiation, myoblast fusion or increased cell death the major culprit of the small myotubes? Minor points: Figure S1C: show the C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcripts do not necessarily mean changes in protein or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested mechanisms, we performed additional experiments, as outlined below. Inhibiting Cers1 systemically with the pharmacological Cers1 inhibitor P053, or by intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1, did not affect Pax7 transcript levels in mice (Author response image 1E). Moreover, we did not observe an effect of shRNA targeting Cers1 on Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we treated C2C12 muscle progenitors with scramble shRNA or shRNA targeting Cers1 and measured proliferation using CellTiter-Glo (Promega). Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating Cers1-deficient C2C12 myotubes by FACS analysis of Annexin V (left) and propidium iodide (right). We found no difference in early apoptosis, late apoptosis, necrosis or muscle cell viability, suggesting that cell death does not explain the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant) despite similar changes in Cer18:0, Cer24:0 and Cer24:1 concentrations in muscles. Why?

      Increases in very long chain ceramides were in fact larger upon P053 administration than upon AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% upon AAV injection. Moreover, dhC24:1 increased 6.5-fold vs 2.5-fold upon P053 vs AAV treatment, respectively. These differences might not only explain the slightly attenuated phenotypes in the AAV-treated mice but also underline the notion that very long chain ceramides might cause muscle deterioration. We believe that inhibiting the enzymatic activity of Cers1 (P053) is a more efficient strategy to reduce ceramide levels than degrading Cers1 transcripts. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details to the discussion of the revised manuscript (see page 8 of the revised manuscript).

      (c) The authors talk about a possible compensation by the CERS2 isoform, but they never showed CERS2 mRNA expression or protein levels after treatment. Is CERS2 more highly expressed when CERS1 is downregulated in skeletal muscle?

      We appreciate the suggestion of the reviewer. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be the optimal readout for enzymes, because it does not necessarily reflect enzymatic activity. Therefore, we think metabolite levels are a better proxy of enzymatic activity. It should also be pointed out that "compensation" might not be an accurate description, as the sphingoid base substrate might simply be more available upon Cers1KD, and hence more substrate might be present for Cers2 to synthesize very long chain ceramides. This "re-routing" has been previously described in the literature and hypothesized to serve to avoid toxic (dh)sphingosine accumulation (PMID: 30131496). Therefore, we changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1 downregulated muscles could be a plus for the study (assay function of contractility)

      In the current study we measured grip strength in mice, which had previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work that shows reduced contractility in muscles of Cers1 deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1? Is autophagy/mitophagy affected? How are mTOR signalling and protein synthesis affected? A recent paper showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway and an improvement in skeletal muscle glucose uptake. Could it be possible that CERS1 downregulation increases mTOR signalling and decreases the autophagy pathway? Measuring autophagic flux using colchicine in vivo would be useful to test this hypothesis.

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with their finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also examined mitochondrial bioenergetics upon blocking Cers1 with P053 using an O2k oxygraph (Author response image 1J-L). Cers1 inhibition in mouse muscle cells increased mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next looked into the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy- or mitophagy-related gene sets in mice treated with shRNA against Cers1 or with the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1 signalling and protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend towards reduced mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1-deficient mouse C2C12 myoblasts, as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting that Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). Therefore, we assessed the effect of Cers1 inhibition with the pharmacological inhibitor P053 on protein folding in muscle cells using the Proteostat dye, which intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and in human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that Cers1 deficiency might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to the impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balance of ceramides has been found to play roles in mitophagy and fission, with an impact on cell fate and metabolism. Did the authors check how mitochondrial morphology, mitophagy or mitochondrial dynamics (fission and fusion) are altered in CERS1 knockdown muscles? There is growing evidence that mitochondrial dysfunction contributes to the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduces high-fat diet-induced adiposity. Moreover, improved mitochondrial respiration, increased citrate synthase activity and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacological or a genetic approach was used to inhibit Cers1. The current manuscript describes the effect of CERS1 on muscle function and myogenesis because these were the pathways most strongly correlated with CERS1 in human skeletal muscle (Fig 1C), and the impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. For the effects of CERS1 on mitochondria and metabolism, we would therefore like to refer to the studies mentioned above.

      (2) C.elegans data:

      (a) The authors used a maternal RNAi protocol to knock down lagr-1 and showed alteration of muscle morphology at day 5. They also applied the P053 drug pharmacologically from the L4 stage. Furthermore, the authors also used a transgenic ortholog lagr-1 to perform the experiments. All of these approaches consistently showed reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing the CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knockdown the Cers1 orthologue, lagr-1, in C.elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi treated animals would also not be possible as the RNA from the overexpression would just get degraded.

      (b) The authors showed data on the distance travelled by C. elegans. It would be interesting to specify whether body bends, reversals and stillness are affected in RNAi and transgenic knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards increased stillness (Author response image 1O) in P053-treated worms on day 5 of adulthood, which is the day we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C.elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy aging muscle? Please discuss this further in the article, even if no definitive explanation is possible at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway, which is overactive in aging, and is therefore more beneficial for muscle aging. Our current work suggests that at least a significant part of the Sptlc1-KD benefits might stem from blocking very long chain ceramides. While SPTLC1 and CERS2 inhibition revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. The fact that the inhibitory effect of Cers1 knockdown on muscle differentiation depends on Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added details to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      This paper reports how mycobacterial cAMP level is increased under stressful conditions and that the increase is important in the survival of the bacterium in animal hosts.

      Strengths:

      The authors show that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of a PDE specific to cAMP is significant progress in understanding Mtb pathogenesis. An increase in cAMP apparently increases bacterial survival upon infection. On the practical side, the reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests.

      We thank the reviewers for these extremely encouraging comments.

      Weaknesses:

      Repression of the PDE promoter by binding of phosphorylated PhoP could have been shown at higher precision. The binding is currently localized only to somewhere along a roughly 500 bp region. Although the regulation of PDE is shown to be by transcriptional repression only, it has been described as a homeostatic mechanism. The latter would have required a demonstration of both repression and activation by negative feedback.

      We agree. We have now performed EMSA (Electrophoretic Mobility Shift Assay) experiments and included the data showing DNA binding of PhoP to the upstream regulatory region of rv0805 (rv0805up) as a supplemental figure (see Figure 2-figure supplement 1). The supplemental figure, figure caption, and the relevant results have been adjusted accordingly in the revised manuscript.

      Further, as recommended by the reviewer we have now removed the term ‘homeostatic mechanism’ and rephrased it with ‘maintenance of cAMP level’ in the manuscript.

      Response to Reviewers’ comments

      Reviewer #1:

      The authors have used homeostasis inappropriately. Homeostasis usually requires negative feedback (a clear example is the regulation of Lambda prm promoter). Here, there is no feedback from changes in PDE or cAMP level to their synthesis. Homeostasis does not belong to this paper anywhere.

      As recommended by the reviewer, we have now removed “homeostasis” from the manuscript and mostly replaced it with “maintenance of cAMP level” in the revised manuscript.

      The authors have frequently used adverbs at the beginning of a sentence, such as Notably (l.240, 272, 376), Importantly (l.66, 213), More importantly (l.134), Remarkably (l.264), Interestingly (l.115,301), Intriguingly (l.344), unambiguously (l.347), etc. The use of these words is generally counter-productive. The authors should scan the ms. to eliminate them as far as possible. The sentences would read more clearly and become more impactful.

      Following reviewer’s recommendation, we have now eliminated most of the adverbs, mostly used at the beginning of sentences, in the revised manuscript.

      Specific comments

      (1) L.1: "maintenance of homeostasis" or increasing cAMP level.

      As suggested by the reviewer, we have now replaced “maintenance of cAMP homeostasis” with “maintenance of cAMP level”.

      (2) L.27: mechanism or reason; varying or various.

      As recommended by the reviewer, we have now replaced “mechanism” with “reason” and the word “varying” is deleted while incorporating suggested changes in the abstract.

      (3) L.28-29: The logic of connecting PhoP to cAMP doesn't follow well. The logic is much better in l.54, l.112-5 and l.130.

      We thank the reviewer for this suggestion. We have now modified the statement within the ‘abstract’ in the revised manuscript (duplicated below):

      “cAMP is one of the most widely used second messengers which impacts on a wide range of cellular responses in microbial pathogens including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus since the major regulator plays a key role in bacterial response against numerous stress conditions.”

      (4) L.30: discovers or reveals (?). Also, in l.101.

      As recommended by the reviewer, we have now replaced ‘discovers’ with ‘reveals’ in the Abstract and ‘uncovered’ with ‘revealed’ in the Introduction section of the manuscript.

      (5) L.31: Delete "The most - - derived". It is not obvious what most fundamental means here. I suggest: We find that PhoP-dependent ---involves specific binding of the regulator---PDE gene.

      As recommended by the reviewer, we have modified the statement (duplicated below): “In keeping with these results, we find specific recruitment of the regulator within the promoter region of rv0805 PDE, and absence of phoP or ectopic expression of rv0805 independently accounts for elevated PDE synthesis leading to depletion of intra-mycobacterial cAMP level.”

      (6) L.36: --pathway decreases cAMP level, stress tolerance, and survival of the bacilli.

      As recommended by the reviewer, we have now modified the statement (duplicated below): “Thus, genetic manipulation to inactivate PhoP-Rv0805-cAMP pathway decreases cAMP level, stress tolerance, and intracellular survival of the bacilli.”

      (7) L.41: 'keeps encountering" or encounters?

      As suggested by the reviewer, we have replaced ‘keeps encountering’ with ‘encounters’ in the ‘Introduction’ section of the revised manuscript.

      (8) L.61: responds, carries.

      Our apologies for the embarrassing grammatical mistakes. We have rectified these errors in the revised manuscript.

      (9) L.67: you mean burst in synthesis level, not burst of cAMP itself.

      To improve clarity, we have now modified the statement in the revised manuscript (duplicated below): “Agarwal and colleagues had shown that burst in synthesis of bacterial cAMP upon infection of macrophages, improved bacterial survival by interfering with host signalling pathways (Agarwal et al., 2009)”

      Reference

      Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (10) L.77: Change Off to Of.

      We are sorry for the inaccuracy. The suggested change has been made to the text.

      (11) L.83: Did not discuss "degradation" earlier.

      Following reviewer’s recommendation, we have now modified the statement in the revised manuscript (duplicated below).

      “Together, these results strongly suggest that a balance between cAMP synthesis by adenylate cyclases and cAMP degradation by phosphodiesterases contributes to rapid adaptive response of mycobacteria in a hostile intracellular environment (Johnson and McDonough, 2018; McDonough and Rodriguez, 2011).”

      Reference

      Johnson RM, McDonough KA (2018) Cyclic nucleotide signaling in Mycobacterium tuberculosis: an expanding repertoire. Pathog Dis 76 (5)

      McDonough KA, Rodriguez A (2011) The myriad roles of cyclic AMP in microbial pathogens: from signal to sword. Nature reviews Microbiology 10: 27-38

      (12) L.95: Isn't PhoPR a two-component signal transduction system, the terminology that is more specific than a two-protein regulatory system?

      As recommended by the reviewer, we have replaced “two protein regulatory system” with more specific “two-component signal transduction system” in the revised manuscript.

      (13) L.124: check-point prevents things from happening. Here the mechanism you found allows growth and survival.

      We agree. As recommended by the reviewer, we have now modified the sentence in the revised manuscript (duplicated below).

      “Together, the newly identified mechanism of regulation of cAMP level allows intraphagosomal survival and growth program of mycobacteria.”

      (14) L.132: why not say directly-"---under normal, and NO and acid stress conditions (Fig. 1A).

      As recommended by the reviewer, we have now deleted the first part of the sentence and directly stated that “we compared cAMP levels………. under normal, NO and acidic stress conditions” (duplicated below).

      “We compared cAMP levels of WT and phoPR-KO (lacking both phoP and phoR), grown under normal, NO stress and acid stress conditions (Fig. 1A).”

      (15) L.134: The complementation is quite variable. Also true in Fig. 2A. If no simple answer, you can say- cAMP values increased in complemented cells, although to a variable extent, for reasons unknown.

      We agree with the reviewer. We have now incorporated new text in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress conditions (Khan et al., 2022).”

      (16) L.154: You rather not say "conclude" and "most likely" at the same time. How about replacing "we conclude" with suggests? In that case, no need to say "most likely". Also, in l.306-7 & l.322-3.

      We thank the reviewer for these suggestions. We have now modified the statements in the revised manuscript (duplicated below).

      “We suggest that lower cAMP level of the mutant is not due to its higher efficacy of cAMP secretion.”

      Following reviewer’s recommendation, we have incorporated similar changes in two other places of the ‘Results’ section of the revised manuscript.

      (17) L.161: introduce both the acronyms here and not in l.162.

      Following reviewer’s recommendation, we have made the suggested changes.

      (18) L.164: Second, (to be in line with First).

      We have made the suggested change.

      (19) Fig. 2C: There are no black and white bars. This is an important figure because the results appear in the abstract. The signal change from pH 7 to 4.5 is not much. An independent approach would have been desirable. If it were E. coli, I would have suggested a beta-gal assay or in vivo footprinting. Is a PhoP binding site recognizable in the promoter region of rv0805?

      We apologize for the inaccuracy. We have corrected it in the revised manuscript. Also, we have now carried out DNA binding assays, and included the EMSA data of rv0805 upstream regulatory region binding to phosphorylated PhoP (P~PhoP) as a supplemental figure (Figure 2-figure supplement 1A-B). In this figure, we have also incorporated our results on the likely PhoP binding site within rv0805up. The new figure, figure caption and the relevant results have been adjusted accordingly in the revised manuscript.

      (20) L.209: ORFs; also delete "of growth" from the sentence.

      The suggested changes were made to the text.

      (21) L.213: Delete Importantly and change "failed to" to 'did not' (since you did not motivate the expectation earlier, it is better to state the results in an unbiased way).

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (22) L.217: The requirement of PhoR is a new result - why say "confirm". Change it to indicate. Also, delete "indeed" here and from L.233.

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (23) L.224: Are the results in Fig 3-S1A under inducing conditions?

      The results shown in Fig 3-S1A were not obtained under inducing conditions. For better clarity, we have modified the sentence describing Figure 3-figure supplement 1A (duplicated below).

      “The rv0805 ORF was cloned into the multiple cloning site of the integrative vector pSTki (Parikh et al., 2013) between the EcoRI and HindIII sites, under the control of the Pmyc1tetO promoter, and expression of rv0805 under non-inducing conditions was verified by determining the mRNA level (Figure 3-figure supplement 1A).”

      Reference:

      Parikh et al. (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Applied and Environmental Microbiology 79: 1718-1729

      (24) L.225: ---cAMP level. Add (Fig. 3C) at the end of the next sentence.

      As recommended by the reviewer, both the suggested changes were made to the revised text.

      (25) L.231: Delete "Most importantly"- you didn't specify what are other less important results.

      We agree. We have now deleted “most importantly” from the sentence in the revised text.

      (26) L.243 & 254: Change homeostasis to level? Here you are showing mechanisms that can change cAMP level. Homeostasis here would mean how fluctuations in cAMP level are adjusted, usually requiring negative feedback.

      As recommended by the reviewer, ‘homeostasis’ was replaced with ‘level’ in both places.

      (27) L.256: stress response or stress? Also, in l.272

      We are sorry for the inaccuracy. We have corrected these errors in the revised version of the manuscript.

      (28) L.259: Change "maintenance of homeostasis" to 'repressing the rv0805 PDE gene'. It is safer to use a fact-based title. In this section, direct measurement of rv0805 mRNA, and/or cAMP levels in different genetic backgrounds seem desirable.

      We agree. As recommended by the reviewer, we have modified the title of the ‘Results’ section in the revised manuscript (duplicated below).

      “PhoP contributes to mycobacterial stress tolerance and intracellular survival by repressing rv0805 PDE expression.”

      Please note that direct measurements of rv0805 mRNA and cAMP levels are shown in Figure 3-figure supplement 1A and Fig. 3, respectively.

      (29) Fig. 4A: White and grey symbols are not easily discriminated without zooming. Use color for phoPR-KO.

      We agree. We have now indicated the phoPR-KO in blue in the revised Fig. 4.

      (30) L.264: Delete remarkable or explain what is so remarkable. Aren't the results expected- the PDE level would go up in both cases. Direct measurement of PDE /cAMP levels would take the mystery out of the results.

      As recommended by the reviewer, we have deleted ‘remarkably’ in the revised text. We have measured cAMP and PDE expression levels of the four strains in Fig. 3 and Figure 3-figure supplement 1.

      (31) L.273: --suggesting a role of ---

      We have modified this sentence in the revised version of the manuscript (duplicated below).

      “A previous study had reported that a phoP-deleted mutant strain was more sensitive to cumene hydroperoxide (CHP), suggesting a role of PhoP in regulating the mycobacterial response to oxidative stress (Walters et al., 2006).”

      Reference:

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      (32) L.275: Delete "transcriptome". CHP sensitivity alone doesn't speak for transcriptome.

      As suggested by the reviewer, we have deleted “transcriptome”. Also, please see our response to the previous comment (above).

      (33) Fig. 4D and E: % Colocalization in the Merge panels is not much different among the four strains tested (to an untrained eye). Can the results be explained to readers not used to in vivo studies?

      As recommended by the reviewer, we have now incorporated new text to explain the in vivo experiment (duplicated below).

      “In this assay, WT-H37Rv inhibits phagosome maturation, whereas phagosomes with phoPR-KO mature into phagolysosomes (Anil Kumar et al., 2016).”

      Further, for better clarity of the results shown in Fig. 4D, we have (a) increased the size of the figure to highlight the difference in the ‘merge’ panel; (b) included white arrowheads in the merge panels of Fig. 4D to indicate auramine-labeled mycobacteria that have either inhibited or facilitated trafficking into lysosomes; and (c) described the method used to calculate percent co-localization in greater detail in the ‘Materials and Methods’ section of the revised manuscript.

      Reference:

      Anil Kumar et al. (2016) EspR-dependent ESAT-6 secretion of Mycobacterium tuberculosis requires the presence of virulence regulator PhoP. J Biol Chem 291: 19018-19030

      (34) L.275-6: Delete "next" (also in l.347) and "Note that". In this paragraph, I was expecting some explanation on how phoPR-KO and WT-Rv0805 are behaving similarly. Even if the reason is not known, it should be mentioned.

      The suggested changes have been made to the text. Also, as recommended by the reviewer, we have included the following text in the revised manuscript (duplicated below):

      “Together, these results reveal similar behaviour of phoPR-KO and WT-Rv0805: both strains are more susceptible than WT bacteria to acidic pH and oxidative stress, indicating a link between intra-mycobacterial cAMP level and bacterial stress response. Collectively, these data suggest that at least one of the mechanisms by which PhoP contributes to the global stress response is attributable to maintenance of cAMP level.”

      (35) L.281: ---WT and indicate a link between cAMP level and stress response in mycobacteria. (No mention of homeostasis).

      The suggested change has been made to the revised text. Please see above our response to point # 34.

      (36) L.288, 290: No Thus and no clearly.

      Both the suggested changes have been made to the text.

      (37) L.297: Can you be more direct and state --is due to reduced cAMP level?

      As recommended by the reviewer, we have now modified the sentence to make it more direct in the revised manuscript (duplicated below):

      “Together, our findings suggest that the higher susceptibility of WT-Rv0805 to stress conditions is attributable to its reduced cAMP level.”

      (38) L.307: May delete "most likely----homeostasis". cAMP is not discussed here. The same deletion is desired in l.324.

      We agree. As recommended by the reviewer, we have now modified the relevant texts in the revised manuscript. These are duplicated below.

      “From these results, we suggest that ectopic expression of rv0805 impacts phagosome maturation, arguing in favour of a role of PhoP in influencing phagosome-lysosome fusion in macrophages.”

      “Thus, we suggest that one reason for the attenuated phenotype of phoPR-KO in both cellular and animal models is the loss of PhoP-dependent repression of rv0805, which encodes a PDE that controls mycobacterial cAMP level.”

      (39) L.342: cAMP level is regulated remains---

      The suggested change has been made to the revised text (duplicated below):

      “Although many bacterial pathogens modulate host cell cAMP level as a common strategy, the mechanism of regulation of mycobacterial cAMP level remains unknown.”

      (40) L.373: tone down "most fundamental". It is not obvious what is so profound about a stress-response system that depends on PhoP also depends on PhoR. OR justify what is most fundamental about it.

      We agree. Following the reviewer’s recommendation, we have modified the text in the revised manuscript (duplicated below):

      “In keeping with these results, we find that PhoP-dependent rv0805 expression requires PhoR (Figs. 3A-B), the cognate kinase which activates PhoP in a signal-dependent manner (Gupta et al., 2006; Singh et al., 2023).”

      References:

      Gupta et al. (2006) Transcriptional autoregulation by Mycobacterium tuberculosis PhoP involves recognition of novel direct repeat sequences in the regulatory region of the promoter. FEBS Letters 580, 5328-5338.

      Singh et al. (2023) Dual functioning by the PhoR sensor is a key determinant to Mycobacterium tuberculosis virulence. PLoS Genetics 19(12): e1011070.

      (41) L.395: delete correspondingly (?)

      The suggested change has been made to the text.

      (42) L.396: Delete "appear to" and "somewhat". The uncertainty is already implied in "suggest". The evidence that ectopic expression of rv0805 is functionally equivalent to phoP deletion is quite clear in this paper and not saying that clearly is confusing.

      We agree with the reviewer. The suggested changes have been made to the revised text (duplicated below):

      “Thus, our results suggest that ectopic expression of rv0805 is functionally equivalent to deletion of the phoP locus.”

      (43) L.401: --over-expressing bacilli, induction level of rv0805 expression was significantly different in Matange et al and our studies. The next sentence is also very wordy.

      We have made changes to the text to address the reviewer’s concern. Also, the next sentence has been rewritten (duplicated below).

      “Although both studies were performed with rv0805-over-expressing bacilli, the expression strategies used in that study (Matange et al., 2013) and in ours yielded significantly different levels of rv0805 expression, which most likely accounts for this discrepancy. While we cannot rule out the possibility that Rv0805 also cleaves other cyclic nucleotides (Keppetipola & Shuman, 2008; Shenoy et al., 2007; Shenoy et al., 2005), our results, consistent with a previous study, correlate rv0805 expression with intra-mycobacterial cAMP level (Agarwal et al., 2009).”

      References:

      Matange et al. (2013) Overexpression of the Rv0805 phosphodiesterase elicits a cAMP-independent transcriptional response. Tuberculosis (Edinb) 93: 492-500.

      Keppetipola & Shuman (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity. J Biol Chem 283: 30942-30949

      Shenoy et al. (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis. Journal of Molecular Biology 365: 211-225

      Shenoy et al. (2005) The Rv0805 gene from Mycobacterium tuberculosis encodes a 3',5'-cyclic nucleotide phosphodiesterase: biochemical and mutational analysis. Biochemistry 44: 15695-15704

      Agarwal et al. (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (44) L.409: To avoid saying "conclude" and "most likely" at the same time, can you start the sentence thus: 'We infer that Pho-----rv0805 is a---.

      We agree. We have made suggested changes to the text. The modified sentence is duplicated below:

      “We infer that PhoP-dependent regulation of Rv0805 is a critical regulator of intra-mycobacterial cAMP level.”

      (45) L.424. Delete "According to this model". In the preceding sentence, the subject is results, not model. This whole paragraph needs to be rewritten in fewer lines. The shorter the summary statement, the greater would be its impact (less is more here). I would delete the red circles from the figure- it appears that in the repressed state, you are making more products. Replace the circles with an arrow. The legend could be "Increased cAMP level and effective stress response" and "Decreased cAMP---and reduced---.

      We thank the reviewer for these suggestions. Following the reviewer’s recommendations, we have made numerous changes and rewritten the paragraph in the revised manuscript (duplicated below):

      “In summary, upon sensing acidic pH, PhoR activates PhoP, and P~PhoP binds to the rv0805 upstream regulatory region and functions as a specific repressor of rv0805. Consistent with this, we observed (a) a reproducibly lower level of cAMP in phoPR-KO relative to WT-H37Rv, (b) significantly reduced expression of rv0805 in WT-H37Rv grown under acidic pH relative to normal conditions, and (c) comparable cAMP levels in phoPR-KO and WT-Rv0805. As a result, the two strains fail to mount an appropriate stress response, most likely because dysregulation of intra-mycobacterial cAMP level prevents them from coordinating gene expression. However, without uncoupling regulatory control of PhoPR and rv0805 expression, we cannot confirm that dysregulation of cAMP level accounts for virulence attenuation of phoPR-KO. Although rv0805-depleted M. tuberculosis is growth attenuated in vivo (McDowell et al., 2023), paradoxically, ectopic expression of rv0805 leads to dysregulated metabolic adaptation, thereby resulting in reduced stress tolerance and intracellular survival.”

      Also, the suggested changes have been incorporated in Fig. 6 and the figure caption.

      Reference:

      McDowell et al. (2023) Mycobacterial phosphodiesterase Rv0805 is a virulence determinant and its cyclic nucleotide hydrolytic activity is required for propionate detoxification. Mol Microbiol 119: 401-422

      (46) L.458 & 500: ---was used to transform.

      Following the reviewer’s recommendation, the suggested changes were made to the text in the Materials and Methods section of the revised manuscript.

      (47) L.460: --- antibiotics plates.

      Both suggested changes were made to the text.

      (48) L.466-7: --they were transferred-pH 4.5) and grown for further-

      We thank the reviewer for these suggestions. The suggested changes were made to the text.

      (49) L.486: ---full-length ORFs of interest were---

      The suggested changes were incorporated in the revised manuscript.

      (50) L.497: The RNAs were 20 nt long and complementary---

      As recommended by the reviewer, we have modified the text in the revised manuscript (duplicated below).

      “The RNAs were 20 nt long and complementary to the non-template strand of the target gene.”

      Reviewer #2:

      (1) Rephrase this sentence in the abstract: “Because growing evidence connects PhoP with varying stress response, we hypothesized that the level of 3’,5’ cAMP, one of the most widely used second messengers, was regulated by the phoP locus, linking numerous stress responses with cAMP production”.

      As recommended by the reviewer, we have now rewritten the sentence. The modified text is incorporated in the revised manuscript (duplicated below):

      “cAMP, one of the most widely used second messengers, impacts a wide range of cellular responses in microbial pathogens, including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus, since this major regulator plays a key role in bacterial responses to numerous stress conditions.”

      Also, please see our response to specific comments #1-3 of Reviewer 1.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the ‘Materials and Methods’ section of the revised manuscript (duplicated below):

      “To complement phoPR expression, pSM607, containing a 3.6-kb DNA fragment of M. tuberculosis phoPR including the 200-bp phoP promoter region, a hygromycin resistance cassette, the attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006), was used to transform the phoPR mutant to integrate at the L5 attB site.”

      To address the reviewer’s other concern, we have now included the following sentence in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress condition (Khan et al., 2022).”

      Reference:

      Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 11: e80965

      (3) In Figure 1C, it is a bit confusing to see the numbers 1,2,3 and 4 and nothing is referred to these numbers in the figure legend so it's better to remove them.

      We agree with the reviewer. We have now removed the lane numbers from the figure (Fig. 1C) in the revised manuscript.

      (4) Line 852: rephrase it "insignificantly different".

      The suggested change has been made to the text. The modified text is incorporated in the manuscript (duplicated below):

      “Note that the difference in expression levels of rv0805 between WT and phoPR-KO was significant (p<0.01), whereas the fold difference in mRNA level between WT and the complemented mutant (Compl.) was not significant (not indicated).”

      (5) Lines 198-200: There are no open/black bars, they all are coloured bars. Correct the same. The significance test should be done for the same gene (suppose rv0805 up) in different pH conditions. Right now, it is not revealing anything and misleading.

      We apologize for the inaccuracy. We have now rectified the error. As recommended by the reviewer, Fig. 4C was modified, and the significance tests were carried out between samples involving identical promoter enrichments under different pH conditions. The modified figure, figure legend, and the relevant results have been adjusted accordingly in the revised manuscript.

      (6) Line 213: Is there any difference between this complementation strain (phoPR-KO::phoPphoR) and the one used in Figure 1A, 1B, and 2A? If yes, then please describe it.

      The same complemented mutant strain, which has been described in the ‘Materials and Methods’ section of the revised manuscript, was used in the experiments described in Fig. 1A, Fig.1B and Fig. 2A.

      (7) Line 223: Please mention the copy number and promoter of the vector construct.

      As recommended by the reviewer, we have now mentioned the promoter of the vector and incorporated new text with regard to copy number of the expression vector in the revised manuscript (duplicated below).

      “Although the copy number of episomal vectors with the pAL5000 origin of replication (oriM) has been reported to be 3 by Southern hybridization (Ranes et al., 1990), in this case wild-type and mutant Rv0805 proteins were expressed from single-copy chromosomal integrants (Parikh et al., 2013).”

      References:

      Ranes et al. (1990) Functional analysis of pAL5000, a plasmid from Mycobacterium fortuitum: construction of a "mini" mycobacterium-Escherichia coli shuttle vector. J Bacteriol 172: 2793-2797

      Parikh et al. (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Applied and Environmental Microbiology 79: 1718-1729

      (8) Figure 3 - Figure Supplement 1: not sure why the authors measured mRNA levels of rv1357 and rv2387? These genes were not overexpressed!

      The mRNA levels of rv1357 and rv2387 were measured to show that overexpression of either the wild-type or mutant Rv0805 did not influence expression of other PDEs like Rv1357 and Rv2387. We have now mentioned it explicitly in the revised manuscript (duplicated below).

      “In contrast, other PDE encoding genes (rv1357 and rv2387), under identical conditions, demonstrate comparable expression levels in WT-H37Rv and rv0805 over-expressing strains.”

      (9) Line 234: Wrong interpretation it should be PDE mRNA levels in WT-Rv0805 and WT-Rv0805M.

      As recommended by the reviewer, we have now modified the statement to improve clarity (duplicated below).

      “The PDE-encoding genes (wild-type and mutant rv0805) are over-expressed approximately 4.5- to 6-fold at the mRNA level relative to the genomic rv0805 level of WT-H37Rv (Figure 3-figure supplement 1A).”

      (10) Line 237: Remove the sentence "Thus, we conclude......identical expression strategy", you have already talked about why phosphodiesterase activity is crucial for cAMP concentration and it is well understood.

      Following the reviewer’s recommendation, we have now removed the sentence from the revised manuscript.

      (11) Figure 3E: Authors should comment on why the cAMP concentration is not significantly changed even though the mRNA level changes are drastic (~90%). How do you correlate that? Is it because of other PDEs?

      We agree. As suggested by the reviewer, we have now incorporated new text in the revised manuscript (duplicated below).

      “We speculate that efficient knockdown of phoP or rv0805 is not fully reflected in the extent of variation of cAMP levels, possibly because of the presence of numerous other mycobacterial PDEs.”

      (12) Line 505,506: Is it the translation start site or the transcription start site? Because mRNA level changes are reported.

      These are translational start sites; the gene-specific small guide RNAs were designed to reduce mRNA levels of the target genes.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      Also, we included text explaining why WT-Rv0805 is different compared to phoPR-KO strain in the context of cell wall metabolism (duplicated below).

      “Together, these results suggest that both strains expressing wild-type or mutant PDEs share largely similar cell-wall properties, which is consistent with (a) a recent study reporting no significant effect of cAMP dysregulation on mycobacterial cell wall structure/permeability (Wong et al., 2023), and (b) the role of PhoP in cell wall composition and complex lipid biosynthesis (Walters et al., 2006; Asensio et al., 2006; Goyal et al., 2011).”

      References:

      Wong et al. (2023) Cyclic AMP is a critical mediator of intrinsic drug resistance and fatty acid metabolism in M. tuberculosis. eLife 12: e81177

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      Asensio et al. (2006) The Virulence-associated Two-component PhoP-PhoR System Controls the Biosynthesis of Polyketide-derived Lipids in Mycobacterium tuberculosis. J Biol Chem 281: 1313-1316.

      Goyal et al. (2011) Phosphorylation of PhoP protein plays direct regulatory role in lipid biosynthesis of Mycobacterium tuberculosis. J Biol Chem 286: 45197-45208

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater detail. Also, we have modified Figure 4D to highlight the difference between samples shown in the merge panel. Please see our response to comment #33 from Reviewer 1.

      (15) General comment: There are multiple instances where writing needs to be improved.

      We are sorry for the inaccuracies. We have now done thorough editing of the manuscript and made numerous corrections throughout.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports a novel measurement for the chemotactic response to potassium by Escherichia coli. The authors convincingly demonstrate that these bacteria exhibit an attractant response to potassium and connect this to changes in intracellular pH level. However, some experimental results are incomplete, with additional controls/alternate measurements required to support the conclusions. The work will be of interest to those studying bacterial signalling and response to environmental cues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. They find that an increase in potassium concentration induces a considerable attractant response, with an amplitude larger than that of aspartate, and that cells can quickly adapt (but possibly imperfectly). The authors propose that potassium acts by modifying intracellular pH; they find both that potassium modifies pH and that other pH modifiers induce similar attractant responses. It is also shown, using Tar-only and Tsr-only mutants, that these two chemoreceptors respond to potassium differently: Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like, then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.

      Strengths:

      The finding that E. coli can sense and adapt to potassium signals and the connection to intracellular pH is quite interesting and this work should stimulate future experimental and theoretical studies regarding the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH.

      Weaknesses:

      The authors show that changes in pH impact fluorescent protein brightness and modify the FRET signal; this measurement explains the apparent imprecise adaptation they measured. However, this effect reduces confidence in the quantitative accuracy of the FRET measurements. For example, part of the potassium response curve (Fig. 4B) can be attributed to chemotactic response and part comes from the pH modifying the FRET signal. Measuring the full potassium response curve of the no-receptor mutants as a control would help quantify the true magnitude of the chemotactic response and the adaptation precision to potassium.

      Response: We thank the reviewer for the suggestion. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on the CFP and YFP channels at different concentrations of KCl and established, for both channels, the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration (Fig. S4C). The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both the CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Figs. 5B and 5C. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also reran the simulations using the corrected dose-response curve and replotted Fig. 6; the simulation results changed little.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
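      To make the correction procedure described above concrete, here is a minimal Python sketch. It assumes that the post-/pre-KCl intensity ratio of each channel, measured in the no-receptor strain at a set of KCl concentrations, is linearly interpolated for intermediate concentrations, and that the FRET readout is the YFP/CFP ratio. The function names and the calibration numbers are hypothetical illustrations, not the authors' data or analysis code.

```python
from bisect import bisect_left


def brightness_ratio(kcl_mM, calib_kcl_mM, calib_ratio):
    """Post/pre-KCl intensity ratio for one channel (CFP or YFP) of the
    no-receptor strain, linearly interpolated at kcl_mM.

    calib_kcl_mM must be sorted ascending; values outside the calibrated
    range are clamped to the nearest endpoint (an assumption).
    """
    if kcl_mM <= calib_kcl_mM[0]:
        return calib_ratio[0]
    if kcl_mM >= calib_kcl_mM[-1]:
        return calib_ratio[-1]
    i = bisect_left(calib_kcl_mM, kcl_mM)
    if calib_kcl_mM[i] == kcl_mM:
        return calib_ratio[i]
    x0, x1 = calib_kcl_mM[i - 1], calib_kcl_mM[i]
    y0, y1 = calib_ratio[i - 1], calib_ratio[i]
    return y0 + (y1 - y0) * (kcl_mM - x0) / (x1 - x0)


def ph_corrected(post_kcl_signal, kcl_mM, calib_kcl_mM, calib_ratio):
    """Divide each post-KCl intensity by the pH-induced brightness ratio."""
    r = brightness_ratio(kcl_mM, calib_kcl_mM, calib_ratio)
    return [s / r for s in post_kcl_signal]


# Hypothetical calibration from a no-receptor strain (illustrative numbers only).
CALIB_KCL = [0.1, 1.0, 10.0, 30.0]
YFP_RATIO = [1.00, 0.98, 0.92, 0.85]
CFP_RATIO = [1.00, 0.99, 0.95, 0.90]

# Correct each channel separately, then form the YFP/CFP ratio.
yfp = ph_corrected([920.0, 930.0], 10.0, CALIB_KCL, YFP_RATIO)
cfp = ph_corrected([950.0, 940.0], 10.0, CALIB_KCL, CFP_RATIO)
fret_ratio = [y / c for y, c in zip(yfp, cfp)]
print(fret_ratio)
```

      Correcting each channel before taking the ratio matters because the pH shift changes CFP and YFP brightness by different amounts, so a single correction applied to the ratio would not be equivalent.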

      The measured response may also be impacted by adaptation. For other strong attractant stimuli, the response typically shows a low plateau before it recovers (adapts). However, in the case of potassium, the FRET signal does not have an obvious plateau following the stimulus. Do the authors have an explanation for that? One possibility is that the cells have already partially adapted by the time the response reaches its minimum, which could indicate response and/or adaptation dynamics that differ from those of a regular chemoattractant. In any case, directly measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels would shed more light on the problem.

      Response: We appreciate the reviewer’s insightful questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating faster adaptation than to 0.1 mM MeAsp, which induced a response of similar magnitude. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect of adaptation on the measured response magnitude should be small (less than 10%).
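      The less-than-10% estimate can be checked with a back-of-the-envelope calculation. Assuming (our simplification, not a model stated in this response) that adaptation recovers as a single exponential with the measured half-time of about 80 s, the fraction of the response lost during a 10 s medium exchange is 1 - 2^(-10/80), roughly 8%:

```python
def fraction_adapted(elapsed_s, half_time_s):
    """Fraction of the peak response already recovered after elapsed_s,
    assuming single-exponential adaptation with the given half-time."""
    return 1.0 - 0.5 ** (elapsed_s / half_time_s)


# 10 s medium exchange vs. ~80 s adaptation half-time at 30 mM KCl
print(f"{fraction_adapted(10.0, 80.0):.3f}")  # prints 0.083
```

      Because 10 s is an upper bound on the exchange time, the loss under this assumption is at most about 8%, in line with the bound stated above.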

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild type, exhibiting higher values of K0.5 (Sourjik & Berg, PNAS 99:123, 2002), and thus require an even higher KCl concentration to reveal the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Author response image 1 below, there is no response to up to 30 mM KCl, suggesting that the sensitive range of the mutant lies above 30 mM KCl.

      The relevant text was added at lines 413-424.

      Author response image 1.

      The response of the cheRcheB mutant (HCB1382-pVS88) to different concentrations of KCl. The blue solid line denotes the original signal, while the red dots represent the pH-corrected signal. The vertical purple (green) dashed lines indicate the moment of adding (removing) 0.01 mM, 0.1 mM, 0.3 mM, 1 mM, 3 mM, 10 mM and 30 mM KCl, in chronological order.

      There seems to be an inconsistency between the FRET and bead assay measurements: the CW bias shows over-adaptation, while the FRET signal does not.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      The small Hill coefficient of the potassium response curve and the biphasic response of the Tar-only strain, while both very interesting, require further explanation, since these are quite different from responses to more conventional chemoattractants.

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5) and the biphasic response of the Tar-only strain (Fig. 5C). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the different responses of Tar and Tsr receptors to potassium.

      The Tar-only strain exhibits a repellent response to stepwise addition of potassium at low concentrations (below 10 mM) and a biphasic response at higher concentrations (Fig. 5C). This biphasic response might result from additional pH effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added a paragraph, the penultimate one in the “Discussion”, addressing the response of the Tar-only strain.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E. coli. Previously, Humphries et al. reported that potassium waves from oscillating B. subtilis biofilms attract P. aeruginosa through the chemotactic behavior of motile P. aeruginosa cells. It was proposed that K+ waves alter the PMF of P. aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al. demonstrated that motile E. coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impact of pH on the fluorescent proteins needs to be better evaluated. In its current form, the evidence is insufficient to conclude that the fluorescence intensity ratio reflects FRET; it may well be an artefact of the pH change. Nevertheless, this is an important piece of work. The text is well written, with a good balance of background information to help the reader follow the questions investigated in this research work.

      In my view, the effect of pH on the FRET between CheY-eYFP and CheZ-eCFP has not been fully examined. The authors demonstrated in Fig. S3 that CFP intensity itself changes upon KCl addition, likely due to pH, showing that CFP itself is affected by pH. This raises the question of whether the FRET data in Figs. 3-5 could result from intensity changes of the FPs rather than from FRET, in which case the measured dynamics may have nothing to do with the interaction between CheY and CheZ. It should be noted that CFP and YFP have different sensitivities to pH, so the measurement is likely confounded by the change in intracellular pH. Without further experiments to evaluate the effect of pH on CFP and YFP, the data obtained with this FRET pair are inconclusive.

      Response: We thank the reviewer for pointing this out. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on the CFP and YFP channels at different KCl concentrations and established, for both channels, the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration (Fig. S4C). For strains with receptors, the pH-corrected signal after KCl addition was obtained by dividing the original signal after KCl addition by this ratio at the corresponding KCl concentration, separately for the CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are shown as red dots in Figs. 5B and 5C. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3. We also re-ran the simulations using the corrected dose-response curve and replotted Fig. 6; the simulation results changed little.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
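      The correction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, the calibration numbers are invented placeholders rather than the Fig. S4C data, interpolation between calibrated concentrations is done in log10(concentration) (an assumption), and the final YFP/CFP ratio assumes the common ratiometric readout for this FRET pair.

```python
import numpy as np


def ph_correct(signal_after, kcl_mM, ref_kcl_mM, ref_ratio):
    """Remove the pH-induced brightness change from one fluorescence channel.

    signal_after : channel intensity after KCl addition (receptor-containing strain)
    kcl_mM       : KCl concentration of the step
    ref_kcl_mM   : increasing KCl concentrations used for the no-receptor calibration
    ref_ratio    : post-/pre-addition intensity ratio of the same channel
                   in the no-receptor strain at ref_kcl_mM
    Interpolating in log10(concentration) is an assumption of this sketch.
    """
    ratio = np.interp(np.log10(kcl_mM), np.log10(ref_kcl_mM), ref_ratio)
    return np.asarray(signal_after, dtype=float) / ratio


# Hypothetical calibration and signals, for illustration only (not measured data).
ref_kcl = np.array([1.0, 3.0, 10.0, 30.0])       # mM
cfp_ratio = np.array([0.99, 0.97, 0.94, 0.90])    # CFP post/pre, no-receptor strain
yfp_ratio = np.array([0.98, 0.95, 0.90, 0.84])    # YFP post/pre, no-receptor strain

cfp = ph_correct([900.0, 910.0], 30.0, ref_kcl, cfp_ratio)
yfp = ph_correct([840.0, 860.0], 30.0, ref_kcl, yfp_ratio)
fret_ratio = yfp / cfp   # YFP/CFP ratio computed from the corrected channels
```

      Each channel is corrected separately before the ratio is formed, which matters because, as the reviewer notes, CFP and YFP differ in their pH sensitivity.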

      The data in Figure 1 are convincing. It would be helpful to include example videos. There is also ambiguity in the Methods section for this experiment. It states that 100 mM KCl was flowed into the source channel. However, it is not clear whether the 100 mM KCl was prepared in water or in the potassium-depleted motility buffer. If the KCl was prepared in water, there would be a gradient of other buffer components, which would confound the data.

      Response: We apologize for the ambiguity. The KCl solution used in this work was prepared in the potassium-depleted motility buffer. We have now clarified this at both lines 116 and 497. We now provided an example video, Movie S1, with the relevant text added at line 123.

      The authors show FRET data for both KCl and K2SO4 and conclude that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig. 1 were also performed with K2SO4.

      Response: We thank the reviewer for the suggestion. The aim of comparing the responses to KCl and K2SO4 was to determine the role of chloride ions in the response and to prove that the chemotactic response of E. coli to KCl comes primarily from its response to potassium ions. It is more sensitive to compare the responses to KCl and K2SO4 by using the FRET assay. In contrast, the microfluidic motility assay is less sensitive in revealing the difference in the chemotactic responses, making it difficult to determine the potential role of chloride ions.

      Methods:

      • Please clarify the promoters used for the constitutive expression of FliCsticky and LacI.

      Response: The promoters used for the constitutive expression of LacIq and FliCsticky were the Iq promoter and the native promoter of fliC, respectively (ref. 57).

      Now these have been clarified at line 471.

      • Fluorescence filters and imaging conditions (exposure time, light intensity) are missing.

      Response: Thank you for the suggestion. We have now added more descriptions at lines 535-546: The FRET setup was based on a Nikon Ti-E microscope equipped with a 40× 0.60 NA objective. The illumination light was provided by a 130-W mercury lamp, attenuated by a factor of 1024 with neutral density filters, and passed through an excitation bandpass filter (FF02-438/24-25, Semrock) and a dichroic mirror (FF458-Di02-25x36, Semrock). The epifluorescent emission was split into cyan and yellow channels by a second dichroic mirror (FF509-FDi01-25x36, Semrock). The signals in the two channels were then filtered by two emission bandpass filters (FF01-483/32-25 and FF01-542/32-25, Semrock) and collected by two photon-counting photomultipliers (H7421-40, Hamamatsu, Hamamatsu City, Japan), respectively. Signals from the two photomultipliers were recorded at a sampling rate of 1 Hz using a data-acquisition card installed in a computer (USB-1901(G)-1020, ADlink, New Taipei, Taiwan).

      • Please clarify if the temperature was controlled in motility assays.

      Response: All measurements in our work were performed at 23 ℃. It was clarified at line 496.

      • L513. It is not clear how θ was selected. Was θ restricted to the range between 0 and π? If not, could P(θ) be negative?

      Response: θ was restricted to the range between 0 and π. This has now been added at line 581.
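      The reviewer's concern about negative probabilities can be made concrete. The density used in the manuscript is not reproduced in this exchange, so the sketch below assumes, purely for illustration, the tumble-angle density P(θ) = ½(1 + cos θ) sin θ: it integrates to 1 on [0, π] but becomes negative for θ in (π, 2π), which is why θ must be restricted to [0, π]. Angles are drawn by rejection sampling; all names are hypothetical.

```python
import math
import random


def p_theta(theta: float) -> float:
    """Hypothetical tumble-angle density 0.5*(1 + cos θ)*sin θ.

    Normalized to 1 on [0, π]; for θ in (π, 2π) sin θ < 0 and the expression
    turns negative, which is why the domain must be restricted.
    """
    return 0.5 * (1.0 + math.cos(theta)) * math.sin(theta)


# The density peaks at θ = π/3 (where cos θ = 1/2), giving a valid rejection envelope.
P_MAX = p_theta(math.pi / 3.0)


def sample_theta(rng: random.Random) -> float:
    """Draw one angle from p_theta by rejection sampling on [0, π]."""
    while True:
        theta = rng.uniform(0.0, math.pi)
        if rng.uniform(0.0, P_MAX) < p_theta(theta):
            return theta


rng = random.Random(1)
angles = [sample_theta(rng) for _ in range(20000)]
mean_deg = math.degrees(sum(angles) / len(angles))
```

      With this assumed density the mean tumble angle is 3π/8 (67.5°), which the sample mean should approach.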

      • Typo in L442 (and) and L519 (Koff)

      Response: Thank you. Corrected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) From the motor measurements the authors find that the CW bias over-adapts to a level larger than prestimulus, but this is not seen in the FRET measurements. What causes this inconsistency? Fig. 2D seems to rule out any change in CheY binding to the motor.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      (2) It would be useful to compare the response amplitude for potassium (Fig. 3C) to a large concentration of both MeAsp and serine. This is a fairer comparison since your work shows potassium acts on both Tar and Tsr. Alternatively, testing a much larger concentration (~10^6 micromolar) at which MeAsp also binds to Tsr would also be useful.

      Response: We thank the reviewer for pointing this out. We have now recalculated the response to potassium after correcting the pH-induced effects on the fluorescence intensity of CFP and YFP. The response to 30 mM KCl was 1.06 ± 0.10 times as large as that to 100 μM MeAsp. The aim of comparing the responses to potassium and MeAsp was to give a sense of the magnitude of the chemotactic response to potassium. A stimulus of 100 μM MeAsp is already a saturating amount of attractant and reduces the kinase activity to zero, so a stronger stimulus (adding serine or a larger concentration of MeAsp) is probably unnecessary. Moreover, a much larger MeAsp concentration (~10^6 μM) would also induce an osmotactic response.

      (3) The fitted Hill coefficient (~0.5) to the FRET response curve is quite small and the authors suggest this indicates negative cooperativity. Do they have a proposed mechanism for negative cooperativity? Have similar coefficients been measured for other responses?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the differing responses of Tar and Tsr receptors to potassium.

      (3a) The authors state a few times that the response to potassium is "very sensitive", but the low Hill coefficient indicates that the response is not very sensitive (at least compared to aspartate and serine responses).

      Response: We apologize for the confusion. We described the response to potassium as “very sensitive” due to the small value of K0.5. This has now been clarified at line 236.
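      The distinction between sensitivity (a small K0.5) and steepness (the Hill coefficient) can be seen directly from the Hill function. In the sketch below, K0.5 only shifts the curve along the concentration axis, while the fold-change in concentration needed to go from 10% to 90% response, 81^(1/n), depends on n alone. The Hill coefficients 0.88 and 1.2 are the values quoted in this exchange; the function names are hypothetical.

```python
def hill_response(c: float, k_half: float, n: float) -> float:
    """Fractional response of a Hill curve: c^n / (K0.5^n + c^n)."""
    return c**n / (k_half**n + c**n)


def span_10_to_90(n: float) -> float:
    """Fold-change in concentration between 10% and 90% response.

    Independent of K0.5: c90/c10 = 81**(1/n).
    """
    return 81.0 ** (1.0 / n)


for n in (0.88, 1.2):   # Hill coefficients quoted in the response (K+ and MeAsp)
    print(n, round(span_10_to_90(n)))
```

      A response can therefore be "very sensitive" in the K0.5 sense while still being shallower than the MeAsp response.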

      (3b) Since the measurements are performed in wild-type cells, the response amplitude following the addition of potassium may be biased if the cells have already partially adapted. This seems to be the case, since the FRET time series does not plateau after the addition of the stimulus. The response curve and Hill coefficient would be more convincing if the experiment were repeated with a cheR cheB deficient mutant.

      Response: We thank the reviewer for raising these questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors at different methylation levels. However, these mutants are typically less sensitive than the wild type, exhibiting higher values of K0.5 (ref. 46), and thus require an even higher KCl concentration to reveal the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1, there is no response to KCl concentrations up to 30 mM, suggesting that the sensitive range of the mutant lies beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      (4) The authors show that the measured imprecise adaptation can be (at least partially) attributed to pH impacting the FRET signal by changing eCFP and eYFP brightness.

      (4a) Comparing Fig. 5C and D, the chemosensing and pH response time scales look similar. Therefore, does the pH effect bias the measured response amplitude (just as it biases the adapted FRET level)?

      Response: We agree with the reviewer that the pH effect on CFP and YFP biases the measured response amplitude. We have now measured the potassium dose-response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4, and corrected the pH effects on CFP and YFP. The recalculated dose-response and adaptation curves are plotted in Fig. S5.

      (4b) It would help to measure a full response curve (at many concentrations) for the no-receptor strain as a control. This would help distinguish, as a function of concentration, how much response can be attributed to pH impacting the FRET signal versus the true chemotactic response.

      Response: We thank the reviewer for the suggestion. We have now performed the measurements for the no-receptor strain and corrected the impact of pH on CFP and YFP. The pH-corrected results, previously shown in Figs. 3-5, are now presented in Fig. 3, Fig. S5 and Fig. 5, respectively.

      (5) The biphasic response of Tar is strange and warrants further discussion. Do the authors have any proposed mechanisms that lead to this behavior? For the 10 mM and 30 mM KCl measurements, there is a repellent response followed by an attractant response both when adding and when removing the stimulus; why is this?

      Response: We thank the reviewer for pointing this out. The Tar-only strain exhibits a repellent response to stepwise addition of potassium at low concentrations (below 10 mM) and a biphasic response at higher concentrations (Fig. 5C). This biphasic response might result from additional pH effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added a paragraph, the penultimate one in the “Discussion”, addressing the response of the Tar-only strain.

      (5a) The fact that Tar and Tsr both mediate attractant responses (after the initial repellent response in Tar) appears to be inconsistent with previous work on the pH response (Ref 52, Yang and Sourjik, Molecular Microbiology (2012) 86(6), 1482-1489). That study also did not observe any biphasic response.

      Response: We thank the reviewer for pointing this out. The Tar-only strain shows a repellent response to stepwise addition of low concentrations of potassium, specifically below 10 mM. This is consistent with previous observations of the response of Tar to changes in intracellular pH (refs. 44,45) and also with the work of Yang and Sourjik (new ref. 53), although the work in ref. 53 dealt with the response to external pH changes, and bacteria are known to maintain a relatively stable intracellular pH when the external pH changes (Chen & Berg, Biophysical Journal (2000) 78:2280-2284). Interestingly, the Tar-only strain exhibits a biphasic response to high potassium concentrations of 10 mM and above. This biphasic response might result from additional pH effects on the activity of intracellular enzymes such as CheRB and CheA (ref. 56), which may have a different timescale and response from the Tar receptor. We have now added a paragraph, the penultimate one in the “Discussion”, addressing the response of the Tar-only strain.

      (5b) The response of Tar to the removal of sodium benzoate (Fig. S2) seems to be triphasic, is there any explanation for this?

      Response: We thank the reviewer for pointing this out. We have now acknowledged in the legend of Fig. S2 that this response is interesting and warrants further exploration: “The response to the removal of sodium benzoate seems to be a superposition of an attractant and a repellent response, the reason for which deserves to be further explored.”

      (6) Fitting the MWC model leads to N=0.35<1. It is fine to use this as a phenomenological parameter, but can the authors comment on what might be causing such a small effective cluster size for potassium response?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We have now refit the MWC model to the pH-corrected dose-response curve, obtaining N = 0.85. We think the small N is due partly to the fact that we are fitting the curve with four parameters (N, Kon, Koff, and fm), while only three features of the sigmoidal dose-response curve are relevant (the vertical scale, the midpoint concentration, and the slope). Future experiments may determine these parameters more accurately, but they should not significantly affect the simulation results as long as the wild-type dose-response curve is accurate.
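      For readers less familiar with the MWC description, the sketch below writes the commonly used two-state receptor-cluster form with the four parameters named above (N, Kon, Koff, fm). Whether it matches the manuscript's Eq. 3 term by term is an assumption; only N = 0.85 comes from this response, and the other parameter values are hypothetical. Note that at zero ligand the activity depends on N and fm only through their product N·fm, which illustrates why the four parameters are not all independently pinned down by one sigmoid.

```python
import math


def mwc_activity(c, n_eff, f_m, k_off, k_on):
    """Kinase activity of a two-state (MWC) receptor cluster.

    Free-energy difference F = N [f_m + ln((1 + c/K_off) / (1 + c/K_on))],
    activity A = 1 / (1 + exp(F)). With K_off < K_on the ligand binds the
    inactive state more tightly, so A falls as c rises (attractant-like).
    """
    free_energy = n_eff * (f_m + math.log((1.0 + c / k_off) / (1.0 + c / k_on)))
    return 1.0 / (1.0 + math.exp(free_energy))


# N = 0.85 is the refit value quoted above; f_m, K_off and K_on are hypothetical.
N_EFF, F_M, K_OFF, K_ON = 0.85, -1.0, 1.0, 100.0   # K in mM
curve = [mwc_activity(c, N_EFF, F_M, K_OFF, K_ON) for c in (0.0, 1.0, 10.0, 100.0)]
```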

      (7) The results of the modeling are closely related to Zhu et al. Phys. Rev. Lett. 108, 128101. Is the lag time for large T related to the adaptation time?

      Response: We thank the reviewer for pointing this out. We used a modeling framework similar to that of Zhu et al. The potassium response is also analogous to the chemotactic response to MeAsp, so the results are closely related to those of Zhu et al. We have now cited Zhu et al. (ref. 52) and noted this at line 366.

      The lag time for large T is related to the adaptation time. We have now simulated the chemotaxis to potassium for large T with different adaptation time by varying the methylation rate kR. The results are shown in Fig. S8. The simulated lag time decreases with the methylation rate kR, but levels off at high values of kR. Now this has been added at line 603.

      Minor issues:

      • Fig. 1C: should the axis label be y?

      Response: Yes, thank you. Now corrected.

      • Line 519: Koff given twice, the second should be Kon.

      Response: Thank you. Corrected.

      • When fitting the MWC model (Eq. 3 and Fig. 6B) did you fix a particular value for m?

      Response: m was treated as a fitting parameter, grouped in the parameter fm.

      Reviewer #2 (Recommendations For The Authors):

      Minor points: - I suggest explaining the acronyms when they first appear in the text (e.g., CMC, CW, CCW).

      Response: Thank you. Now they have been added.

      • L144. L242. "decrease" is ambiguous since the membrane potential is negative. I understand the authors meant less negative (which is an increase). I suggest avoiding this expression.

      Response: Thank you for the suggestion. Now they have been replaced by “The absolute value of the transmembrane electrical potential will decrease”.

      • For Fig 1b - it says the shaded area is SEM in the text, but SD in the legend. Please clarify.

      Response: Thank you. The annotation in the legend has now been revised as SEM.

      • Fig 1C label of x axis should be "y" instead of "x" to be consistent with Fig 1A.

      Response: Thank you. It has now been revised.

      • In Figure 2, the number of independent experiments as well as the number of samples should be included.

      Response: Thank you. The response in Fig. 2C is the average of 83 motors from 5 samples for wild-type strain (JY26-pKAF131). The response in Fig. 2D is the average of 22 motors from 4 samples for the chemotaxis-defective strain (HCB901-pBES38). They have now been added to the legend.

      • Regarding the attractant or repellent action of potassium and sucrose, it would be important to have a movie showing the cells' behaviours.

      Response: We thank the reviewer for the suggestion. We have now provided Movie S1 to show the cells’ behavior in response to potassium. As shown in Fig. 3B, the chemotactic response to 60 mM sucrose is very small compared to the response to 30 mM KCl. This implies that a noticeable response to sucrose would require higher stimulus concentrations. However, Rosko et al. [Rosko, J., Martinez, V. A., Poon, W. C. K. & Pilizota, T. Proc. Natl Acad. Sci. USA 114, E7969-E7976 (2017).] have shown that high concentrations of sucrose lead to a significant reduction in the speed of the flagellar motor. Thus, in a motility assay for sucrose, the osmolarity-induced effect on motility may overwhelm the minor repellent-like response.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses are the brevity of the simulations, the concomitant lack of scope of the simulations, the lack of depth in the analysis, and the incomplete relation to other relevant work.

      A 1 µs simulation of CCh (Video 1, part 2) shows that m3 (ACHA) is stable throughout. The ΔG comparisons, in silico versus in vitro, indicate that 200 ns simulations are sufficient to identify LA versus HA conformational populations. Figure 6-table supplement 1 shows distances. New citations have been added.

      Reviewer #2 (Public Review):

      Weaknesses:

      After carrying out all-atom molecular dynamics, the authors revert to a model of binding using continuum Poisson-Boltzmann, surface area, and vibrational entropy terms. The motivations for, and limitations of, this approximate model for the thermodynamics of binding, rather than modern atomistic MD free energy methods (which would fully incorporate configurational sampling of the protein, ligand, and solvent), could be provided. Despite this, the authors report a correlation between their free energy estimates and those inferred from experiment. This did, however, reveal shortcomings for two of the agonists. The authors mention their difficulty obtaining agreement with experiment for Ebt and Ebx and report errors of up to 130% in free energy. But this is far worse than a simple proportional error, because −24 vs −10 kcal/mol is a massive overestimation of free energy, as would be evident if the authors were to express results in terms of KD values instead (the error would exceed a billion-fold). The MD analysis could be improved with better measures of convergence, as well as a more careful discussion of free energy maps as a function of the identified principal components, as described below. Overall, however, the study has provided useful observations and interpretations of agonist binding that will help in understanding pentameric ligand-gated ion channel activation.
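      The reviewer's "billion-fold" figure follows from ΔG = RT ln KD: a free-energy error ΔΔG corresponds to a fold error of exp(ΔΔG/RT) in KD. The sketch below evaluates this for the two values quoted (−24 and −10 kcal/mol); the temperature of 298.15 K is an assumption, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1


def kd_fold_error(dg_calc_kcal: float, dg_exp_kcal: float, temp_k: float = 298.15) -> float:
    """Fold error in K_D implied by a binding free-energy error.

    Uses ΔG = RT ln K_D, so K_calc / K_exp = exp(ΔΔG / RT).
    """
    return math.exp(abs(dg_calc_kcal - dg_exp_kcal) / (R_KCAL * temp_k))


fold = kd_fold_error(-24.0, -10.0)   # the two values quoted by the reviewer
```

      With these inputs the fold error is on the order of 10^10, consistent with the reviewer's statement.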

      The objective of the calculations was to identify structural populations, not to estimate binding free energies. We knew the actual LA and HA binding free energies (for all 4 agonists) from real-world electrophysiology experiments. We conclude that the simple PBSA method worked as a tool for identification because the calculated efficiencies match those from experiments (Figure 4B, Figure 4-Source Data 1). We discuss the mismatches in absolute ΔG in the Results and Discussion. Methods for estimating experimental binding free energies are described in a cited companion eLife paper. The ΔG ratio relates to agonist efficiency.
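      The authors' argument rests on the fact that a proportional error cancels in a ratio. The sketch below assumes the efficiency definition used in this line of work, η = 1 − ΔG_LA/ΔG_HA, and uses hypothetical energies: scaling both ΔG values by the same factor leaves η unchanged, whereas a constant additive offset, the kind of error the reviewer is concerned about, does not.

```python
def efficiency(dg_la: float, dg_ha: float) -> float:
    """Agonist efficiency, assumed here to be eta = 1 - ΔG_LA / ΔG_HA."""
    return 1.0 - dg_la / dg_ha


# Hypothetical energies (kcal/mol), for illustration only.
dg_la, dg_ha = -6.0, -12.0
scale = 1.45   # a uniform 45% overestimate applied to both states

eta_true = efficiency(dg_la, dg_ha)
eta_scaled = efficiency(scale * dg_la, scale * dg_ha)     # proportional error cancels
eta_offset = efficiency(dg_la - 5.0, dg_ha - 5.0)         # additive error does not
```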

      Main points:

      Regarding the choice of model, some further justification of the reduced 2 subunit ECD-only model could be given. On page 5 the authors argue that, because binding free energies are independent of energy changes outside the binding pocket, they could remove the TMD and study only an ECD subunit dimer. While the assumption of distant interactions being small seems somewhat reasonable, provided conformational changes are limited and localised, how do we know the packing of TMD onto the ECD does not alter the ability of the alpha-delta interface to rearrange during weak or strong binding? They further write that "fluctuations observed at the base of the ECD were anticipated because the TMD that offers stability here was absent.". As the TMD-ECD interface is the "gating interface" that is reshaped by agonist binding, surely the TMD-ECD interface structure must affect binding. It seems a little dangerous to completely separate the agonist binding and gating infrastructure, based on some assumption of independence. Given the model was only the alpha and delta subunits and not the pentamer with TMD, I am surprised such a model was stable without some heavy restraints. The authors state that "as a further control we carried out MD simulation of a pentamer docked with ACh and found similar structural changes at the binding pocket compared to the dimer." Is this sufficient proof of the accuracy of the simplified model? How similar was the model itself with and without agonist in terms of overall RMSD and RMSD for the subunit interface and the agonist binding site, as well as the free energy of binding to each model to compare?

      The statement that distant interactions are small is not an "assumption" but rather a conclusion based on data. Mutant cycle analysis of 83 pairs shows that, with a few exceptions, non-additivity of free energy changes occurs only at separations <~15 Å (Fig. 3 in Gupta et al. 2017). Regardless, the adequacy of dimers and convergence by 200 ns are supported by the match between calculated and experimental agonist efficiencies (Figure 4B) and by the 1 µs simulation (Video 1 part 2). An apo 200 ns simulation of the ECD dimer has now been added (Figure 2-figure supplement 2), and the dimer interface appears to be adequate (stable).

      Although the authors repeatedly state that they have good convergence with their MD, I believe the analysis could be improved to convince us. On page 8 the authors write that the RMSD of the system converged in under 200 ns of MD. However, I note that the graph is of the entire ECD dimer, not a measure for the local binding site region. An additional RMSD of local binding site would be much more telling. You could have a structural isomerisation in the site and not even notice it in the existing graph. On page 9 the authors write that the RMSF in Figure S2 showed instability mainly in loops C and F around the pocket. Given this flexibility at the alpha-delta interface, this is why collecting those regions into one group for the calculation of RMSD convergence analysis would have been useful. They then state "the final MD configuration (with CCh) was well-aligned with the CCh-bound cryo-EM desensitized structure (7QL6)... further demonstrating that the simulation had converged." That may suggest a change occurred that is in common with the global minimum seen in cryo EM, which is good, but does not prove the MD has "converged". I would also rename Figure S3 accordingly.

      The description has now been changed to “aligns well with the desensitized structure (7QL6.PDB)”. By RMSD, not just the binding pocket but the whole ECD dimer aligns well with the first apo state (m1) and with the desensitized state (m3).
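      The reviewer's suggestion of a binding-site RMSD, superposed on the pocket itself rather than on the whole ECD dimer, can be sketched with the Kabsch algorithm. The sketch assumes coordinates are already extracted as NumPy arrays; which residues form the pocket (for example, loops C and F) is left to the caller, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T       # rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def pocket_rmsd(frame: np.ndarray, ref: np.ndarray, pocket_idx) -> float:
    """RMSD of a binding-pocket subset, superposed on that subset only."""
    return kabsch_rmsd(frame[pocket_idx], ref[pocket_idx])


# Hypothetical example: 30 atoms, of which the first 10 form the "pocket".
rng = np.random.default_rng(0)
ref = rng.normal(size=(30, 3))
frame = ref.copy()
frame[10:] += rng.normal(scale=1.0, size=(20, 3))   # perturb only non-pocket atoms
print(kabsch_rmsd(frame, ref), pocket_rmsd(frame, ref, list(range(10))))
```

      A global RMSD can look converged while the pocket rearranges, or vice versa; superposing on the pocket alone isolates local changes at the subunit interface.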

      The authors draw conclusions about the dominant states and pathways from their PCA free energy projections that need clarification. It is important first to show data demonstrating that the two chosen principal components were dominant and accounted for most of the variance. Then, when mapping free energy as a function of those two components, it is important to show that the maps are sufficiently converged to be interpreted. Moreover, if the free energies themselves cannot be used to measure state stability (as seems to be the case), the limitations should be carefully explained. First, was PCA done on all MD trajectories combined to find a common PC1 and PC2, or was it done separately on each simulation? If the latter, how similar are the components? The authors write "the first two principal components (PC-1 and PC-2) that capture the most pronounced Cα displacements". How much of the total variance did these two components capture? The authors write that the changes mostly concern loop C and loop F, but which data show this? For example, a plot of PC1 and PC2 versus residue number might help.

      The PCA analyses have been enriched. Figure 3-Source Data 1. shows the dominance of PC1 and PC2. Because the binding energy match was sufficient to identify affinity states, we did not explore additional PCs. Residue-wise PC1 and PC2 analysis and comparison with RMSF are in Figure 2-figure supplement 2. PC1 and PC2 both correlate with fluctuations in loops C and F. Overlap analysis in different runs is shown in Figure 3-figure supplement 1. Lower variance in a particular region of the PCA landscape indicates that the system frequently visits these states, suggesting stability (a preference for these conformations).
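      The two quantities the reviewer asked for, the fraction of variance captured by each principal component and a per-residue profile of PC1 and PC2, both come from a single SVD of the centered Cα coordinate matrix. The sketch below assumes the frames have already been superposed onto a common reference; the function name and the synthetic trajectory are hypothetical.

```python
import numpy as np


def pca_summary(coords: np.ndarray):
    """PCA of a superposed Cα trajectory.

    coords : array (n_frames, n_atoms, 3), already aligned to a common reference.
    Returns (fraction of variance per PC, per-atom magnitude of each PC vector).
    """
    n_frames, n_atoms, _ = coords.shape
    x = coords.reshape(n_frames, n_atoms * 3)
    x = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    frac = s**2 / np.sum(s**2)                    # variance fractions, descending
    per_atom = np.linalg.norm(vt.reshape(vt.shape[0], n_atoms, 3), axis=2)
    return frac, per_atom


# Synthetic trajectory for illustration: noise plus one collective motion of atoms 10-19.
rng = np.random.default_rng(0)
traj = rng.normal(scale=0.1, size=(200, 50, 3))
traj[:, 10:20, 0] += rng.normal(scale=2.0, size=(200, 1))
frac, per_atom = pca_summary(traj)
print(frac[:2])          # variance fractions of PC1 and PC2
```

      Plotting per_atom[0] and per_atom[1] against residue number gives the residue-wise PC profile; in this synthetic case PC1 loads on atoms 10-19.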

      The authors map −kT ln ρ as a free energy for each simulation as a function of PC1 and PC2. It is important to reveal how well the PC1-PC2 space was sampled and how those maps converged over time. The shapes of the maps and the relative depths of the wells look very different for each agonist. If the maps were well sampled and converged, the free energies themselves would tell us the stabilities of each state. Instead, the authors do not mention this and instead talk about "variance" being the indicator of stability, stating that m3 is most stable in all cases. While I can believe that 200 ns could not converge a PC1-PC2 map and that meaningful ΔG values might not be obtained from it, the issue of insufficient sampling must be dealt with. On page 12 they write "Although the bottom of the well for 3 energy minima from PCA represent the most stable overall conformation of the protein, they do not convey direct information regarding agonist stability or orientation". The reasons why not must be explained, as the maps should convey exactly that if the two order parameters PC1 and PC2 captured the slowest degrees of freedom for binding and sampling was sufficient. The authors write that "For all agonists and trajectories, m3 had the least variance (was most stable), again supporting convergence by 200 ns." Again, the issue of the actual free energy values in the maps needs to be dealt with. The probabilities expressed as −kT ln ρ in kcal/mol might suggest that m2 is the most stable. Instead, the authors base stability only on variance (I guess the breadth of the well?), where m3 may be more localised in the chosen PC space despite apparently being less preferred during the MD (not the lowest free energy in the maps).
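      A −kT ln ρ map and a simple convergence check can be sketched as follows. The check compares maps built from the first and second halves of a trajectory on the same grid; a large mean difference over bins visited in both halves suggests the PC1-PC2 map is not converged. kT = 0.593 kcal/mol assumes roughly 298 K, and the function names are hypothetical; this is one possible diagnostic, not the analysis used in the manuscript.

```python
import numpy as np

KT_KCAL = 0.593   # kT in kcal/mol near 298 K (assumed temperature)


def free_energy_map(pc1, pc2, edges):
    """-kT ln(rho) on a fixed 2D grid; empty bins are +inf."""
    hist, _, _ = np.histogram2d(pc1, pc2, bins=edges, density=True)
    with np.errstate(divide="ignore"):
        return -KT_KCAL * np.log(hist)


def half_split_check(pc1, pc2, bins=30):
    """Mean |ΔG| (kcal/mol) between first- and second-half maps on common bins."""
    pc1, pc2 = np.asarray(pc1), np.asarray(pc2)
    edges = [np.histogram_bin_edges(pc1, bins), np.histogram_bin_edges(pc2, bins)]
    half = len(pc1) // 2
    g1 = free_energy_map(pc1[:half], pc2[:half], edges)
    g2 = free_energy_map(pc1[half:], pc2[half:], edges)
    both = np.isfinite(g1) & np.isfinite(g2)      # bins visited in both halves
    return float(np.mean(np.abs(g1[both] - g2[both])))


# Synthetic PC projections for illustration: a stationary run vs one that drifts halfway.
rng = np.random.default_rng(0)
pc1, pc2 = rng.normal(size=20000), rng.normal(size=20000)
drifting = pc1.copy()
drifting[10000:] += 2.0
print(half_split_check(pc1, pc2), half_split_check(drifting, pc2))
```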

      The motivations and justifications for the use of approximate PBSA energetics instead of atomistic MD free energies should be dealt with in the manuscript, with limitations more clearly discussed. Rather than using modern all-atom MD free energy methods for relative or absolute binding free energies, the authors select clusters from their identified states and make Poisson-Boltzmann estimates (electrostatic, vdW, surface area, vibrational entropy). I do not believe the following sentence begins to deal with the limitations of that method: "there are limitations with regard to MM-PBSA accurately predicting absolute binding free energies (Genheden & Ryde, 2015; Hou et al., 2011) that depends on the parameterization of the ligand (Oostenbrink et al., 2004)." What are the assumptions and limitations in taking continuum electrostatics (presumably with parameters for dielectric constants and their assignments to regions after discarding solvent), surface area (with its assumptions and limitations), and of course in assuming that normal-mode vibrations can capture entropy? On page 30, regarding their vibrational entropy estimate, they write that the "entropy term provides insights into the disorder within the system, as well as how this disorder changes during the binding process". The extent of disorder captured by the vibrational estimate should be discussed. It is not obvious that the estimate captures entropy involving multiple minima on the system's true 3N-dimensional energy surface, and in particular the contribution from solvent disorder in bound vs. dissociated states.
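
      For reference, the single-trajectory MM-PBSA bookkeeping behind such estimates can be sketched as follows. Each species gets G = E_MM + G_PB + G_SA - TS, and ΔG_bind averages G_complex - G_receptor - G_ligand over frames. The class and field names are hypothetical. The sketch makes the reviewer's concerns visible in two ways:

      - Solvent enters only through the continuum terms g_pb and g_sa.
      - ts is a harmonic normal-mode estimate around a single minimum, so neither explicit solvent entropy nor hopping between minima is represented.

      The reported spread reflects frame-to-frame variation only, not model error.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Terms:
    """Energy terms for one species in one snapshot (kcal/mol); field names are hypothetical."""
    e_mm: float  # gas-phase molecular mechanics energy (bonded + vdW + Coulomb)
    g_pb: float  # polar solvation free energy from the continuum Poisson-Boltzmann model
    g_sa: float  # nonpolar solvation, typically proportional to solvent-accessible surface area
    ts: float    # T*S from a harmonic normal-mode (vibrational) estimate

    def g(self) -> float:
        return self.e_mm + self.g_pb + self.g_sa - self.ts


def mmpbsa_delta_g(frames):
    """frames: list of (complex, receptor, ligand) Terms, all extracted from the same bound snapshot.

    Returns (mean, standard deviation) of per-frame binding free energies.
    """
    dg = [c.g() - r.g() - l.g() for c, r, l in frames]
    return mean(dg), (stdev(dg) if len(dg) > 1 else 0.0)
```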

      As discussed above, errors in the free energy estimates need to be more faithfully represented, as fractional errors are not meaningful. On page 21 the authors write "The match improved when free energy ratios rather than absolute values were compared." But a ratio of free energies is not a typical or expected measure of error in ΔG. They also write "For ACh and CCh, there is good agreement between ΔGm1 and ΔGLA and between ΔGm3 and ΔGHA. For these agonists, in silico values overestimated experimental ones only by ~8% and ~25%. The agreement was not as good for the other 2 agonists, as calculated values overestimated experimental ones by ~45% (Ebt) and ~130% (Ebt). However, the fractional overestimation was approximately the same for ΔGLA and ΔGHA." See the above comment on how this may misrepresent the error. On page 21 they write, in relation to their large fractional errors, that they "do not know the origin of this factor but speculate that it could be caused by errors in ligand parameterization". However, the estimates from the PBSA approach are, by design, only approximate. Both errors in parameterisation (and their likely origin) and the approximate model used need discussion.
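
      The reviewer's point about fractional errors can be illustrated with a short sketch. The helper is hypothetical, and the temperature is assumed to be 298.15 K. The same 1 kcal/mol absolute error gives a 25% or a 10% fractional error depending only on the size of the experimental ΔG. The implied error in the equilibrium constant (about 5.4-fold) is the same in both cases.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def error_report(dg_calc, dg_exp, temp_k=298.15):
    """Compare a computed and an experimental binding free energy (both kcal/mol).

    Returns (absolute error, fractional error, implied fold-error in K).
    temp_k = 298.15 K is an assumption.
    """
    ddg = dg_calc - dg_exp                              # absolute error, kcal/mol
    fractional = ddg / dg_exp                           # scales with the size of dg_exp itself
    fold_k = math.exp(abs(ddg) / (R_KCAL * temp_k))     # error expressed in the equilibrium constant
    return ddg, fractional, fold_k


# Same 1 kcal/mol absolute error, different fractional errors (0.25 vs 0.1):
weak = error_report(-5.0, -4.0)
strong = error_report(-11.0, -10.0)
```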

      Again, the goal of calculating binding free energy was to identify structural correspondence to LA and HA, not to obtain absolute binding free energy values. In addition to having the least variance (narrowest distribution) along the principal components, m3 also had the highest binding free energy. m1 was associated with LA and m3 with HA after comparing them to experimental values (efficiencies). This comparison not only validates our approach but also underscores the utility of PBSA in supplementing MD and PCA analyses with a broader energetic perspective.

      Reviewer #3 (Public Review):

      Weaknesses:

      Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for two other ligands were significantly different than the experiment. It is unclear to what extent the choice of method for the energy calculations influenced the results. See above.

      A control simulation, such as for an apo site, is lacking. Figure 2-figure supplement 2 shows the results of 200 ns MD simulations of the apo structure (n=2).

      Reviewer #4 (Public Review):

      Weaknesses:

      Timescales (200 ns) do not capture global rearrangements of the extracellular domain, let alone gating transitions of the channel pore, though this work may provide a launching point for more extended simulations. A more general concern is the reproducibility of the simulations, and how representative states are defined. It is not clear whether replicates were included in principal component analysis or subsequent binding energy calculations, nor how simulation intervals were associated with specific states.

      We are interested eventually in using MD to study the full isomerization, but these investigations are for the future and will likely involve full-length pentamers and longer timescales. However, in response to this query, we now raise this issue in the Discussion and offer speculations. As noted above, PCA has been compared between replicates (Figure 3-figure supplement 1).

      Structural analysis largely focuses on snapshots, with limited direct evidence of consistency across replicates or clusters. Figure legends and tables could be clarified.

      Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories. Incorporated in the legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study gives interesting insights into the possible dynamics of ligand binding in ACh receptors and establishes some prerequisites for necessary and urgent further work. The broad interest in this receptor class means this work will have some reach.

      Suggestions:

      (1) I found the citation of relevant literature to be rather limited. In the following paper, the agonist glutamate was shown to bind in two different orientations, and also to convert. These are much longer simulations than what is presented here (nearly 50 µs), which allowed a richer view of conformational changes and ligand binding dynamics in the AMPA Receptor. Albert Lau has published similar work on NMDA, delta, and kainate receptors, including some of it in eLife. Perhaps the authors could draw some helpful comparisons with this work.

      Yu A et al. (2018) Neurotransmitter Funneling Optimizes Glutamate Receptor Kinetics. Neuron

      Likewise, the comparison to a similar piece of work on glycine receptors (not cited, https://pubs.acs.org/doi/10.1021/bi500815f) could be instructive. Several similar computational techniques were used, and interactions observed (in the simulations) between the agonist and the receptor were tested in the context of wet experiments. In the absence of an equivalent process in this paper (no findings were tested using an orthogonal approach, only compared against known results, from perhaps a narrow spectrum of papers), we have to view the major findings of the paper (docking in cis that leads to a ligand somersault) with some hesitancy.

      The Gharpure 2019 paper is cited in the context of the delta subunit but this paper was about α3β4 neuronal nicotinic receptors. This could be tidied up. Also, the simulations from that paper could be used as an index of the stability of the HA state (if ligand orientation is being cited as transferrable, other observations could be too).

      New citations have been added. It is difficult to generalize from Yu A et al. and Yu R et al., because in neither study was the ligand orientation associated with LA versus HA binding energy.

      (2) "To start, we associated the agonist orientation in the hold end states as cis in AC-LA versus trans in AC-HA."

      I think this is a valid start, but one is left with the feeling that this is all we have and the validity of the starting state is not tested. What was really shown here? Is the docking reliable? What evidence can the authors summon for the ligand orientation that they use as a starting structure? In addition to docking energies, the match between PBSA and electrophysiology ΔG values and the temporal sequence (m1-m2-m3) support the assignment.

      Given that these simulations cover a circumscribed part of the binding process, I think the limitations should be acknowledged. Indeed the authors do mention a number of remaining open questions.

      Paragraphs regarding 'catch' have been added to the Discussion.

      (3) Results around line 90. Hypothetical structures and states that were determined from Markov analyses are discussed as if they are well understood and identified. Plausible though these are, I think the text should underline at least the source of such information. In these simulations, a further intermediate has been identified.

      The model in Figure 1B was first published in 2012 and has been used and extended over the intervening years. In our lab, catch-and-hold is standard. We have published many papers (in top journals), plus reviews, regarding this scheme. We made presentations that are on Youtube. Here, at the end of the Introduction we now cite a new review article (Biophysical Journal, 2024). I am not sure what more we can do to raise awareness regarding catch and hold.

      (4) The figures are dense and could be better organised. Figure 2 is key but has a muddled organization. The placement of the panel label (C) makes it look like the top row (0 ns) is part of (A). Panel B- what is shown in the oval inset (not labeled or in legend). Why not show more than one view, perhaps a sequence of time points? It is confusing to change the colour of the loops in (C). Please show the individual values in D.

      Figure 2 has been redone.

      (5) A lot is made of the aK145 salt bridge with aD200 and the distances - but I didn't see any measurements, or time course. This part is vague to the point of having no meaning ("bridge tightening").

      We present a Table of distance measurements in the SI (Figure 6-table supplement 1).

      Reviewer #2 (Recommendations For The Authors):

      All main comments have been given in the above review. There are a few other minor comments below.

      The 4 agonists examined were acetylcholine (ACh), carbamylcholine (CCh), epibatidine (Ebt), and epiboxidine (Ebx). Could the choices be motivated for the reader?

      New in Methods: the agonists are about the same size yet represent different efficiency classes (citation to companion eLife paper). One of our (unmet) objectives was to understand the structural correlates of agonist efficiency.

      The authors write that state structures generated in the MD simulation were identified by aligning free energy values with those from experiments. It would be good to explain to the reader, in the introduction, how LA and HA free energies were extracted from experiments, rather than relying on them to read older papers.

      In the Introduction, we say that to get ΔG, just measure an equilibrium constant and take the log. We think it is excessive to explain in detail in this paper how to measure the equilibrium binding constants (several methods suffice). However, we have added in Methods our basic approach: measure KLA and L2 by using electrophysiology, and compute KHA from the thermodynamic cycle using L0. We think this paper is best understood in the context of its companion, also in eLife.
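
      The relations mentioned here can be sketched as follows. The sketch rests on several assumptions: dissociation constants, a 1 M standard state, two equivalent and independent binding sites, and T = 298.15 K. The helper names are hypothetical. ΔG = RT ln K_d, and the closed cycle L2/L0 = (K_LA/K_HA)^2 gives K_HA from K_LA, L0 and L2.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def delta_g(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from an equilibrium dissociation constant (M).

    Assumes a 1 M standard state; temp_k = 298.15 K is an assumption.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def k_ha_from_cycle(k_la, l0, l2, n_sites=2):
    """Hypothetical helper: K_HA from the closed cycle L2 / L0 = (K_LA / K_HA) ** n_sites.

    Assumes n_sites equivalent, independent binding sites and dissociation constants,
    so a larger L2 / L0 implies tighter binding (smaller K_HA) in the active state.
    """
    return k_la / (l2 / l0) ** (1.0 / n_sites)


# Example with made-up values: K_LA = 100 uM, L0 = 1e-6, L2 = 1
k_ha = k_ha_from_cycle(1e-4, 1e-6, 1.0)
```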

      In all equilibrium equations of the type A to B (e.g. on page 5), rather than using "=" signs it would be much better to use equilibrium reversible arrow symbols.

      It is incorporated.

      Reviewer #3 (Recommendations For The Authors):

      (1) Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for Ebt and Ebx were significantly different than the experiment. Are there any alternative methods for calculating binding energies from the MD simulations that could be readily compared to?

      See above. We did not use more sophisticated energy calculations because we already knew the answers. Our objective was to identify states, not to calculate energies.

      (2) It would be nice to see control simulations of an apo site to ensure that the conformational changes during the MD are due to the ligands and not an artifact of the way the system is set up. I am primarily asking about this as the simulation of the isolated ECDs for the binding site interface seems like it may be unhappy without the neighboring domains that would normally surround it. On that note, was the protein constrained in any way during the MD?

      Apo simulation results are presented in Figure 2-figure supplement 2. The dimer interface seems to be adequate (stable).

      (3) Figure 4A-B: Should the colors for m1 and m3 be reversed?

      Colors have been changed and a bar chart has been added.

      Reviewer #4 (Recommendations For The Authors):

      (1) Although simulations are commendably run in triplicate, it is difficult in some places to discern their consistency.

      (1a) Table S1 provides important quantification of deviations in different replicates and with different agonists. Please confirm that the reported values are accurate. All values reported for the epibatidine system are identical to those reported for carbamylcholine, which seems statistically improbable. Similarly, runs 1 and 3 with epiboxidine seem identical to one another, and runs 1 and 2 with acetylcholine are nearly the same.

      Figure 2-Source Data 1 has been corrected.

      (1b) In reference to Figure S3, the authors comment that the simulated system (one replicate with carbamylcholine) converges within 0.5 Å RMSD of a desensitized experimental structure. This seems amazing; please specify over what atoms this deviation was calculated and with reference to what alignment. It would be interesting to know the reproducibility of this remarkable convergence in additional replicates or with other ligands; for example, Figure 5 indicates that loop C transitions to a lesser extent in the context of epibatidine than other agonists.

      The comparison was for the entire dimer ECD; 0.5 Å is the result. It may be worthwhile to pursue this remarkable convergence, but not in this paper. Here, we are concerned with identifying AC-LA and AC-HA. Similarity between AC-HA and AD structures is for a different study.

      (1c) For principal-component and subsequent analyses, it appears that only one trajectory was considered for each system. Please clarify whether this is the case; if so, a rationale for the selection would be helpful, and some indication of how reproducible other replicates are expected to be.

      We have added new PCA results (Results, Figure 3-figure supplement 1) that show comparable principal components in other replicates.
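
      One common way to quantify agreement between replicate PCA results is the root-mean-square inner product (RMSIP) between the leading eigenvectors of two runs. The text does not state whether the overlap analysis in Figure 3-figure supplement 1 uses this measure, so the sketch below is an assumption. It expects k orthonormal eigenvectors per run, computed on the same atom selection after alignment to a common reference.

```python
import math


def rmsip(vecs_a, vecs_b):
    """Root-mean-square inner product between two sets of k orthonormal eigenvectors.

    Each eigenvector is a list of 3N floats. Returns 1.0 for identical subspaces
    and 0.0 for orthogonal ones; both sets must contain the same number k of vectors.
    """
    k = len(vecs_a)
    total = 0.0
    for a in vecs_a:
        for b in vecs_b:
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return math.sqrt(total / k)
```

Because RMSIP sums over all pairs, two runs whose PC1 and PC2 are mixed or swapped still score near 1 if they span the same subspace.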

      (2) Figure 3 shows free energy landscapes defined by principal components of fluctuation in Cα positions.

      (2a) Do experimental structures (e.g. PDB IDs 6UWZ, 7QL6u) project onto any of these landscapes in informative ways?

      6UWZ.pdb matches well with the apo structure (7QKO.pdb), comparable to m1, and 7QL6.pdb matches m3.

      (2b) Please indicate the meaning of colored regions in the righthand panels.

      The color key in the top-left panel also applies to the colored regions in the right-hand panels; it indicates the direction and magnitude of changes along PC1 and PC2.

      (2c) Please also check the legend; do the porcupine plots really "indicate the direction and magnitude of changes between PC1 and PC2," or rather between negative and positive values of each principal component?

      It indicates the direction and magnitude of changes with PC1 and PC2.

      (3) It would be helpful to clarify how trajectory segments were assigned to specific minima, particularly m2 and m3.

      (3a) Please verify the timeframes associated with the m2 minima, reported as "20-50 ns [with acetylcholine], 50-60 ns [with carbamylcholine], 60-100 ns [with epibatidine, and] 100-120 ns [with epiboxidine]." It seems improbable that these intervals would interleave so precisely in independent systems. Furthermore, the intervals associated with acetylcholine and epiboxidine do not appear to correspond to the m2 regions indicated in Figure S8.

      Times are given in Figure 4-Source Data 1 and Figure 3-figure supplement 2. The m2 classification is based on loop displacement as well as agonist orientation. For all agonists, the selection was strictly from PCA and cluster analysis.
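
      The sketch below turns per-frame assignments into reported time windows. It assumes each frame has already been labeled with a minimum (for example by cluster membership in PC1-PC2 space) and that frames are evenly spaced. The function name and the min_frames filter for brief crossings are hypothetical choices. Listing intervals this way makes each reported m2 window directly checkable against the per-frame labels.

```python
def minimum_intervals(labels, dt_ns, min_frames=1):
    """Collapse per-frame minimum labels into contiguous (label, start_ns, end_ns) intervals.

    labels: one label per frame, e.g. "m1", "m2", "m3" (hypothetical labeling).
    dt_ns: time between saved frames in ns.
    Runs shorter than min_frames are dropped as transient crossings.
    """
    intervals = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the last frame or where the label changes
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_frames:
                intervals.append((labels[start], start * dt_ns, i * dt_ns))
            start = i
    return intervals
```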

      (3b) The text (and legend to Figure 3) indicate that 180+ ns of each trajectory was assigned to m3, which seems surprisingly consistent. However, Figure S5 indicates this minimum is more variable, appearing at 160 ns with acetylcholine but at 186 ns with carbamylcholine. Please clarify.

      See above: the selection was from PCA and cluster analysis. Times are in Figure 3-figure supplement 2 and in Figure 4-Source Data 1 (they are not given in the Fig. 3 legend).

      (3c) Figures 5, 6, S6, and S7 illustrate structural features of free-energy minima in each ligand system. Please clarify what is shown, e.g. a representative snapshot, centroid, or average structure from a particular prominent cluster associated with a given minimum.

      They are all representative snapshots (now in Methods). Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories.

      (4) Figure S4 helpfully shows the behavior of a pentameric control system; however, some elements are unclear.

      (4a) The 2.5-6.5 Å jump in RMSD at ~40 ns seems abrupt; can it be clarified whether this corresponds to a transition to either m2 or m3 poses, or to another feature of e.g. alignment?

      Figure 2-figure supplement 4 left bottom is just the ligand. The jump is the flip, m1 to m2.

      (4b) It seems difficult to reconcile the apparently bimodal distribution of states with the proposed 3-state model. Into which RMSD peak would the m2 intermediate fall?

      The simulations extend only to 100 ns, within which we observed a complete flip of the agonist, as represented in the histograms. This confirmed that the dimer showed a similar pattern to the pentamer. In-depth analysis was done only on dimers.

      (4c) The top panel is labeled "Com" with a graphical legend indicating "ACh." Does this indicate the ligand or, as described in the text legend, "the pentamer" (i.e. the receptor)? For both panels, please verify whether they are calculated on the basis of center-of-mass, heavy atoms, Cα, etc.

      "Com" (for complex) has been changed to system (protein+ligand).

      (5) Minor concerns:

      (5a) In Figures 1 and S3, correct the PDB references (6UWX and 7QL7 are not nAChRs).

      They are now corrected.

      (5b) In Figure 4, do all panels represent mean ± standard deviation calculated across all cluster-frames reported in Table 1?

      Yes.

      Also check the graphical legend in panel A: presumably the red bars correspond to m1/LA, and the blue to m3/HA?

      Corrected

      (5c) In the legend to Figure S1, please clarify that panel B is reproduced from Indurthi & Auerbach 2023.

      This figure has been deleted.

      (5d) As indicated in Figure S2, it seems surprising that the RMSF is so apparently low at the periphery, where the subunits should contact neighbors in the extracellular domain; how might the authors account for this? Specify whether these results apply to all replicates of each system.

      The redness in the periphery for all four systems indicates the magnitude of fluctuation. As we focus on the orthosteric site, we highlight the loops around the agonist binding pocket and keep other regions 75% transparent. We now include apo simulations, and the dimer appears to be stable even without an agonist present.

      (5e) Within each minimum in Figure S5, three "prominent" clusters appear to be colored (by heteroatom) with carbons in cyan, pink, and yellow respectively. If this is correct, note these colors in the text legend.

      Colors have been added to the legend.

      (5f) In Figure S6, note in the legend that key receptor sidechains are shown as spheres, with the ligand as balls-and-sticks, and that ligand conformations in both low- and high-affinity complexes are shown in both receptor states for comparison.

      This is now added in the legend.

      (5g) The legend to Figure S6 also notes "The agonists are as in Fig S4," but that figure contains a single replicate of a different system; please check this reference.

      This has been updated to Figure 5.

      (5h) In Figure S8, the colors in the epibatidine system appear different from the others.

      The colors are the same for m1, m2 and m3 in all systems including epibatidine.

      (5i) In Table 1, does "n clusters" indicate the number of simulation frames included in the three prominent clusters chosen for MM-PBSA analysis? Perhaps "n frames" would be more clear.

      It was a good suggestion. It has now been changed to ‘n frames’

      (5j) Pg 24-ln 453 presumably should read "...that separate it from m1 and m3..."

      This sentence is now changed in the discussion.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We thank you very much for the positive evaluation of our manuscript and your encouragement to continue in this fascinating topic. In this version we made minor changes in the text to address the comments and suggestion of the second reviewer and increase the clarity of the text.

      Reviewer #2 Recommendation to the authors

      We thank the reviewer for the sharp comments that help us improve the clarity of the paper. Below we list the changes we made to correct and revise the paper in accordance with the reviewer’s comments.

      (1) Line 90. Isn't the genus Paracentrotus?

      Yes, it is; thank you. We corrected the typo.

      (2) Figure 1 and supplementary figure 2. To this reviewer supplementary Figure 2 doesn't really help the story as written in the paragraph from line 96-110. You want to report expression of ROCK in skeletogenic cells. You do that quite well in Figure 1. Since Fig. S2 reports whole embryo expression of ROCK when only 5% of the cells in the embryo are the subject of interest here, and the Axitinib is selective, presumably for skeletogenic cells, the relative lack of effect in Fig. S2 is not surprising and again, doesn't really help the theme you wish to establish by focusing on the role of ROCK in skeletogenic cells over time. If anything, the data reported in Fig. S2 shows that perturbation of VEGF signaling has very little effect embryo-wide, while Fig. 1 shows that perturbation of VEGF signaling has a noticeable effect on ROCK expression in skeletogenic cells. If you choose to keep Fig. S2, I recommend that you indicate that embryo-wide vs skeletogenic cell difference more succinctly than given at present. It will also strengthen your paragraph in lines 110-127.

      The importance of the western blot presented in Fig. S2 is that it validates that the antibody recognizes a protein of the expected size. This strengthens the credibility of this commercial antibody for detecting the sea urchin ROCK protein. We agree with the reviewer that the fact that the skeletogenic cells are less than 5% of the embryonic cells is important to explain why we didn’t see an effect of VEGFR inhibition in the western blot, and we changed the text to express it (lines 108-111): “Yet, this measurement was done on proteins extracted from whole embryos, of which the skeletogenic cells, where VEGFR is active, are less than 5% of the total cell mass (42). We therefore wanted to study the spatial expression of ROCK and specifically, its regulation in the skeletogenic cells.”

      (3) Comparison of Fig. 2 and Fig. S3. To me the reader is confused when Fig. S3 is 33hpf as reported in the text (but not in the figure legend), and Fig. 2 shows 2 day old embryos - on the figure and figure legend but not in the text. So, the reader sees the text indicating 33hpf and looks around and the figure 2 says 2dpf. Does that mean 33hpf = 2dpf, the reader is thinking. To clarify, I suggest including the 2dpf in the text or simply drop the time in the text and report it in the two figures. Further, in the middle of the paragraph 130-143 you switch from reporting on Fig.S3 to Fig. 2, yet the reader doesn't know that. The reader is still looking at Fig. S3. The problem here is that at 33hpf the skeleton doesn't yet show the reduction or abnormalities that are shown later at 2dpf in Fig. 2. In clarifying this paragraph both the reduction in ROCK expression and the subsequent alterations in growth and patterning of the skeleton will be clear to the reader.

      Thank you for raising this point. We added to the caption of Fig. S3 that the measurements were done at 33hpf. We also added to the text that the observations of the skeletogenic phenotypes were done at 2dpf (48hpf). We made a break between the first paragraph discussing Fig. S3 and the paragraph discussing Fig. 2.

      (4) The experiment with Y27632, an inhibitor of ROCK, is significantly improved in this revision. The concern earlier was the possibility that at the concentration used there might be off-target effects since other kinases are affected by higher concentrations of this selective inhibitor. The authors have modified this component of the paper and performed experiments at lower concentrations where other reports indicate the inhibitor is highly selective for ROCK, and they still demonstrate an inhibition of skeletal production. This, plus the added citations greatly increases confidence that this inhibition is selective for ROCK, thus enabling a stronger conclusion that ROCK has a role in skeletal growth and patterning.

      Thank you for asking us to test this lower concentration which improved the credibility of our findings.

      Line 239 - should be: indicating instead of indicting.

      We corrected that.

      (5) Line 402-403."The first step in generating the sea urchin spicules is the construction of the spicule cavity, a membrane filled with calcium carbonate and coated with F-actin (Fig. 8A)". I suggest more precise language. The way this now reads (above) is that somehow the spicule cavity is a membrane and that membrane is filled with CaCO3. And further the membrane is coated with F-actin. Isn't the spicule cavity what is filled with CaCO3? And isn't that cavity surrounded by a membrane? And the F-actin must be in the cortex of the cell since there is very little cytoplasm associated with the pseudopodial extensions that surround the spicule.

      We changed this sentence to: “The first step in generating the sea urchin spicules is the construction of the spicule cavity where the mineral is engulfed in a membrane coated with F-actin” (lines 403-404). Our observations show that F-actin is enriched around the spicule cavity. It could be an extension of the cell cortex, but we did not prove it, so we prefer to simply describe what we saw.

      Line 405-408. Thank you for putting in this unknown. It is important to point out that while you've shown that ROCK contributes to regulation of actomyosin, it is not clear whether this is direct or indirect. You have also shown that ROCK somehow contributes to regulation of the GRN that leads to skeletogenesis. Thus, your data are consistent in showing that ROCK perturbation cripples normal skeletogenesis both via morpholino and with a selective inhibitor. Your last part of the discussion then offers speculation as to what might be affected specifically. That discussion sets the stage for digging even deeper to identify specific targets of ROCK activity.

      Thank you, we agree with you that there is an exciting road ahead of us!

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We will make the suggested change of adding “sequence”. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. And the example given is not helpful to ascertain what other changes may be necessary, since we cannot see any problem with the sentence “we assembled a chromosome-scale genome of” as this phrase is widely used in many similar publications.

      As for panel E of figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      • Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue ?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced; individual tissues derived from juvenile fish were used for the genome annotation and whole larval fish for the developmental analysis. We will specify in the figures and text that the results shown are from whole larvae, and add more detail to the material and methods section about which type of sample was analysed in which way.

      • The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. These can be made available through data store services, and such documents can even receive DOIs. This provides others with more information to evaluate the resolution of this work. There is no doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We will upload the code used to assemble and annotate this genome to a public repository or add it to the supplementary material.

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, of which a detailed description provided by Phase Genomics can be found in the supplementary material. The annotation workflow has been described in a previous publication already, but an in-depth description can also be found in the Material and methods section, including parameters used for specific steps. The RNA-seq mapping and analysis part has also been described in the Material and Methods section, including parameters and models for DESeq2.

      • Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      We will add a comment on this in the manuscript as suggested. Basically, the T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to differences in fish tissue structure and therefore in hormone extraction efficiency, to different hormone measurement protocols, to differences in fish physiology, or to differences in fish size (e.g., weighing tiny grouper larvae is more difficult and less precise than weighing convict tang). What matters is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. Our data are therefore internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." eLife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      • Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      As the reviewer notes, there is a large number of differentially expressed genes, because this is a larval developmental transcriptome spanning one-day-old larvae to fully metamorphosed juveniles at around day 60.

      While DESeq2 does assume that most genes are not differentially expressed, this assumption affects normalisation but not hypothesis testing (Wald test, likelihood ratio test or ANOVA), and normalisation in DESeq2 is fairly robust to it. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio method for normalisation, and as long as the numbers of up- and down-regulated genes are relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we inspected the MA plots, which do not show any over-represented up- or down-regulation pattern. See also Michael Love's comment on comparing different tissues, which also applies here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/): “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
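      DESeq2 itself is an R/Bioconductor package; the NumPy sketch below is not its code but a simplified re-implementation of the median-of-ratios idea, written only to illustrate why a balanced set of changes does not bias normalisation even when most genes change. The counts are hypothetical toy values, and `up_down_balance` (a hypothetical helper name) is a numeric stand-in for reading an MA plot.

```python
import numpy as np


def median_ratio_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero count in any sample have no finite log geometric mean,
    # so they are left out of the reference.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Pseudo-reference sample: per-gene geometric mean across samples (log scale).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor of each sample = median ratio of its counts to the reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))


def up_down_balance(log2_fold_change, padj, alpha=0.05):
    """Count significant up- and down-regulated genes (what an MA plot shows visually)."""
    lfc = np.asarray(log2_fold_change, dtype=float)
    significant = np.asarray(padj, dtype=float) < alpha  # NaN p-values count as not significant
    return int(np.sum(significant & (lfc > 0))), int(np.sum(significant & (lfc < 0)))


# Hypothetical toy counts: 11 genes, two samples sequenced to the same depth.
baseline = np.full(11, 100.0)

balanced = baseline.copy()
balanced[:4] *= 8    # 4 genes up 8-fold
balanced[4:8] /= 8   # 4 genes down 8-fold; the last 3 genes are unchanged

one_sided = baseline.copy()
one_sided[:8] *= 8   # the same 8 genes all go up

sf_balanced = median_ratio_size_factors(np.column_stack([baseline, balanced]))
sf_one_sided = median_ratio_size_factors(np.column_stack([baseline, one_sided]))
print(sf_balanced, sf_one_sided)
```

      Even though 8 of the 11 toy genes (73%) change, the balanced case returns size factors of 1 for both samples, because the median falls on an unchanged gene. In the one-sided case the two size factors differ 8-fold although sequencing depth is identical, which is the failure mode described in the quoted comment and the reason an up/down balance check on the MA plots matters.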

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors draw substantial conclusions that are not supported by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments. Despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and the juvenile stage. These data are interesting as they call for further examination of 1) the possible interaction of corticoids and TH during metamorphosis, a question that is certainly not settled yet in teleost fishes, and 2) the existence of an early larval developmental step also involving TH and corticosteroids.

      Point 2) is of particular interest and importance, since this early activation (unique, to our knowledge, among the teleost fishes studied so far) raises many new questions and will certainly be scrutinised by other groups in the years to come, thereby ensuring a good citation impact for our study. We hope that the reviewer, while disagreeing with some of our statements, will recognise that our study will be stimulating at that level, and that this is what scientific studies should do.

      The claim that cortisol is involved in metamorphosis in teleosts has never been demonstrated, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in the Senegalese sole (Solea senegalensis), the number of preoptic CRH neurons decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis, as the reviewer suggests, but simply that cortisol may be involved together with TH. We stand by this claim, as we did observe an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study as it would require setting up a full grouper life cycle under lab conditions, which is impossible within the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue.

      We wrote that “there is contrasting evidence of communication between these two pathways [TH and corticosteroids] in teleost fish, with some data suggesting a synergistic and others an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitant with an increase in TH levels has been observed in flatfish (ref 19), golden sea bream (ref 100) and silver sea bream (ref 101). Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (a phenomenon occurring during flatfish metamorphosis) in flounder (ref 20). TH exposure increases MR and GR gene expression in zebrafish embryos (ref 55). It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner (ref 56). On the antagonistic side, it has been shown that experimentally induced hyperthyroidism decreases cortisol levels in common carp (ref 57), whereas cortisol exposure decreases TH levels in European eel (ref 58). Given this scattered evidence, the existence of a crosstalk active during teleost metamorphosis has never been formally demonstrated. The results we obtained in grouper clearly indicate that the HPI axis and cortisol synthesis are activated (i) during early development and (ii) during metamorphosis. This may suggest that in some respects cortisol synthesis can work in concert with TH, as has been shown in several different contexts in amphibians (ref 17).” In the revised manuscript, we will also add the interesting case of the Senegalese sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field.

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression observed for many genes were very intriguing to us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that, because we cannot establish a full grouper life cycle under lab conditions, we carried out these experiments within a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system suitable for both larval survival and the economic constraints of drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So even though we strongly agree with the necessity of conducting such experiments, we think that they are beyond the scope of the present paper and are something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raises the question of the role of TH so early in larval development, outside of the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. We will make this clear in the revised manuscript.

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering 1) that TH levels have already been investigated in groupers, coinciding with pigmentation changes and fin ray resorption, 2) that there is evidence in numerous fish species that increases in TH levels are concomitant with increases in the expression of TH-related genes, and 3) that we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience with fish metamorphosis and on the literature, we can say confidently that these observations indicate that metamorphosis occurs between D32 and the juvenile stage. To reinforce our point, we plan to add a figure to the revised manuscript that puts our data in the context of earlier studies of grouper. This will clearly show that our inference is correct. Additionally, we would like to point out that, in our experience with several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering whether gene activation was accompanied by a hormonal increase, the TH and cortisol measurements we made between D1 and D10 are relevant. We will make sure to improve the clarity of the revised manuscript to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and the juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. Given the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down, and it should be made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of the biology argue against it. Moreover, at D10, the authors show the highest cortisol level and the lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here.

      (1) We clearly observed an increase of TSHb (occurring between D18 and the juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All of this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998), as they occur during the increase in TH levels, and they also happen to be under the control of TH in grouper (De Jesus et al 1998). Based on this study, and on studies in many other teleost species showing that the increase of TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We will improve the clarity of the manuscript to make sure it is clear that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10), with a surge of trhrs, tg and tpo at D3. As this activation was very unexpected for us, we decided to focus the analysis of TH levels on the period between D1 and D10, and, very interestingly, we observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the malabar grouper, which has never been shown before. We stated at line 195 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. We agree, however, that we could have explained more clearly that this early activation was very intriguing to us and that we wanted to investigate hormonal levels around that period. Nevertheless, we never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring, and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction.

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of corticoid pathway genes around metamorphosis (between D32 and the juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test this. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and the juvenile stage. We will correct this in the revised version.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). We are planning to add a figure bringing all these data together. However, the main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine whether they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of the fin rays needed to enhance floatability during oceanic larval dispersal. As we may have arrived at this hypothesis too quickly without setting up the context well enough, we will also improve that part.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of our methods and the quality of our results. We agree that our message was unclear in several places, which we will address in the revised version of the manuscript to make sure there is no further confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We will also make sure that our claims about TH/corticoid interaction during both periods remain hypothetical, as we cannot yet, despite our attempts, support them with functional experiments.

    1. Author Response

      We provide here a provisional response to the Public Comments and main issues raised by the reviewers. We appreciate the opportunity to submit a revision and will give all of the reviewers’ comments careful consideration when modifying the manuscript.

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November, 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was at 2 kHz, 3.5 kHz, or 5.0 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to the sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz resulted in an increase in cortical excitability in Experiment 1. This effect was replicated in Experiment 2.

      In the original posting, we reported that there was an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, in re-examining the results, we recognized that the 20 Hz AM condition included an outlier that was pulling the group mean higher. We should have caught this outlier in the initial submission given that the resultant percent change for this individual is 3 standard deviations above the mean. Given the skew in the distribution, we also performed a log transform on the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even here the participant’s results remained well outside the distribution. As such, we removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and nonmodulated conditions in Experiment 2. Indeed, all three true stimulation conditions (nonmodulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing, in three new conditions, the efficacy of kHz stimulation on cortical excitability. But the results fail to provide evidence of an additional boost from amplitude modulation.
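      The screening step described above can be sketched in a few lines of Python. This is a minimal illustration assuming a z-score rule on per-participant changes computed with the sample standard deviation; the function names are hypothetical, and this is not the analysis code used for the paper.

```python
import numpy as np


def percent_change(pre_mep, post_mep):
    """Percent change of each participant's mean MEP amplitude from baseline."""
    pre = np.asarray(pre_mep, dtype=float)
    post = np.asarray(post_mep, dtype=float)
    return 100.0 * (post - pre) / pre


def log_change(pre_trials, post_trials):
    """Change in mean log MEP amplitude for one participant (log applied per trial)."""
    return np.mean(np.log(post_trials)) - np.mean(np.log(pre_trials))


def sd_outliers(values, threshold=3.0):
    """Flag values lying more than `threshold` sample SDs from the group mean."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return np.abs(z) > threshold
```

      One property of this rule is worth keeping in mind: with n participants, the largest attainable |z| (sample SD) is (n − 1)/√n, so a 3-SD criterion cannot flag anyone in groups smaller than 11. For example, `sd_outliers([10.0] * 19 + [100.0])` flags only the last value, whereas `sd_outliers([0.0] * 9 + [100.0])` flags nothing.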

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment prior to submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the paper that was submitted and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments. We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect on corticospinal excitability of different kTMP waveforms (carrier frequency and amplitude-modulation frequency) matched at the same estimated cortical E-field (2 V/m). Our core comparison was of the active conditions relative to a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline from which we could assess whether amplitude modulation produced a measurable difference from that observed with non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This provided an opportunity to examine the effect of kTMP with a relatively large sample, as well as to assess how well the effects replicate, and resulted in the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including the individual participant data) for all three experiments in Figure 4. We used a linear mixed effect model to examine if there was an effect of Experiment (Exps 1, 2, 3) and observed no significant difference within each condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once data were pooled, we examined the effect of the carrier frequency and amplitude modulated frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have impacted the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the noninvasive brain stimulation literature (e.g., Nitsche 2005, Huang et al., 2005, López-Alonso 2014, Batsikadze et al., 2013), and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and blinding of kTMP stimulation. We do think that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity that produces a 2 V/m E-field. For this reason, the participants and the experimenter wore earplugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured the participants’ subjective ratings of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed effect model, we found no difference between active and sham for any of these ratings, suggesting that sensation was similar for active and sham (Fig 8). This matches our experience that kHz stimulation in the range used here induces no perceptible sensation from the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired with a stimulation condition, with the pairing varying across participants in a manner unknown to the experimenter.
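      A randomised code table of the kind described above can be sketched as follows. This is an illustrative assumption about how such a table might be generated, not the procedure or software used in the study; the function name, the 3-digit code range and the condition labels are hypothetical.

```python
import random


def make_blinding_table(participant_ids, conditions, seed=None):
    """Pair fresh random 3-digit codes with the conditions, separately per participant.

    The experimenter only types in codes; whoever holds this table can decode them.
    """
    rng = random.Random(seed)
    table = {}
    for pid in participant_ids:
        # Distinct codes drawn anew for every participant, so knowing one
        # participant's codes does not reveal another participant's mapping.
        codes = rng.sample(range(100, 1000), k=len(conditions))
        table[pid] = dict(zip(codes, conditions))
    return table


# Hypothetical condition labels (e.g., the four sessions of one experiment)
conditions = ["sham", "3.5 kHz", "AM 20 Hz", "AM 140 Hz"]
table = make_blinding_table(["P01", "P02"], conditions, seed=1)
```

      Seeding makes the table reproducible for the person who holds the key; in such a scheme that person would be someone other than the experimenter running the sessions.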

      Reviewer 1 asked why we did not explicitly ask participants if they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them of the presence of a sham condition, preferring to simply describe the study as one testing a new method of non-invasive brain stimulation. Thus, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session.

    1. Author Response

      Provisional Response to Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretical framework is well established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test the allelopathic effects. However, our inoculation of sterile litter or soil indicated a potential allelopathic role in germination and seedling mortality. Interestingly, such allelopathic effects are elicited by leaf litter but not by soil, and include delayed germination (see Fig. 1a) and the death of some seedlings (see Fig. 1c). Nonetheless, microbial effects are significantly more adverse than allelopathic effects (also see Fig. 1e). We will discuss this point in the resubmitted version.

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thanks for raising this concern. We have not isolated fungi from healthy seedlings for a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS sequences as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals; some seedling-killing Alternaria also occur in healthy seedlings inoculated with leaf litter. We thus assumed that these seedling-killing fungi, e.g., Allophoma and Alternaria, likely exist in mature A. adenophora individuals and switch lifestyle from endophytic to pathogenic, and that they can kill seedlings only at a very early life stage of A. adenophora.

      Thus, we discussed this point as: “We did not isolate fungi from healthy seedlings in this study. However, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). Based on these results, these fungal genera likely exist in A. adenophora by a lifestyle switch from endophytic to pathogenic. The virulence of these strains for seedling survival under certain conditions may play an essential role in limiting the population density of A. adenophora monocultures.” See Lines 416-435.

      We will also consider adding the following sentence to the resubmitted version to address your concern: “It is worth exploring the dynamics of these strains over the course of seedling development and determining whether these strains kill seedlings only at a very early stage.”

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thank you. The response index (RI) was calculated as RI = (variable in the non-sterile treatment − variable in the sterile treatment) / variable in the sterile treatment. Because the mortality rates of some sterile groups were zero, their RIs could not be calculated. In addition, microbes rarely affected seed germination time (GT) or germination rate (GR) (see Fig. 1a,b). We therefore preferred to compare these values directly between non-sterile and sterile treatments (see also Figure S2), and we correlated these values, rather than RIs, with microbial composition (see Fig. 4).

      We will emphasize this point in the Materials and Methods when we resubmit our revision.
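      To make the zero-denominator problem concrete, here is a minimal Python sketch of the RI formula above. The function name `response_index` and the example numbers are illustrative only and are not taken from the study.

```python
from typing import Optional


def response_index(non_sterile: float, sterile: float) -> Optional[float]:
    """Response index RI = (non_sterile - sterile) / sterile.

    Returns None when the sterile value is zero, because the ratio is then
    undefined (the case described above for some mortality rates).
    """
    if sterile == 0:
        return None
    return (non_sterile - sterile) / sterile


# Illustrative numbers only, not data from the study.
print(response_index(1.5, 2.0))  # -0.25
print(response_index(0.3, 0.0))  # None
```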

      (4) The language of the manuscript could be improved to increase clarity.

      We will improve this in the resubmitted version.

      Reviewer #2 (Public Review):

      Summary:

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics.

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviewer Comments

      We again thank the reviewers for the time and effort they clearly put into reviewing our manuscript. We have revised our manuscript to take into account the majority of their suggestions, primary among them being refinements of our model and classification approach, detailed sensitivity analysis of our model, and several new simulations. Their very constructive feedback has resulted in what we feel is a much-improved paper. In what follows, we respond to each of their points.

      Reviewer #1:

      COMMENT: The reviewer suggested that our control policy classification thresholds should be increased, especially if the behavioral labels are to be subsequently used to guide analyses of neural data which “is messy enough, but having trials being incorrectly labeled will make it even messier when trying to quantify differences in neural processing between strategies.”

      REPLY: We appreciate the observation and agree with the suggestion. In the revised manuscript, we simplified the model (as another reviewer suggested), which allowed for better training of the classifier. This enabled us to raise the threshold to 95%, giving us more confidence in the identified control strategies. Figures 7 and 8 were regenerated using the new threshold.
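      As an illustration of how such a confidence threshold can be applied, the sketch below labels a trial only when the classifier's probability for one control strategy reaches 95% and leaves it unclassified otherwise. It assumes the classifier outputs a per-trial probability of Position Control; the function `label_trials` is hypothetical and is not the code used in the manuscript.

```python
import numpy as np


def label_trials(p_position: np.ndarray, threshold: float = 0.95) -> list:
    """Assign a control-strategy label only to confidently classified trials.

    p_position: per-trial probability (from some trained classifier) that the
    trial was performed under Position Control; 1 - p is taken as the
    probability of Velocity Control.
    """
    labels = []
    for p in p_position:
        if p >= threshold:
            labels.append("position")
        elif 1.0 - p >= threshold:
            labels.append("velocity")
        else:
            labels.append("unclassified")  # too ambiguous for downstream neural analyses
    return labels


probs = np.array([0.99, 0.60, 0.02, 0.94])
print(label_trials(probs))
```

Raising the threshold trades coverage for purity: fewer trials receive a label, but those that do are less likely to be mislabeled.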

      COMMENT: The reviewer asked if we could discuss what one might expect to observe neurally under the different control policies, and also suggested that an extension of this work could be to explore perturbation trials, which might further distinguish between the two control policies.

      REPLY: It is indeed interesting to speculate what neural activity could underlie these different behavioral signatures. As this task is novel to the field, it is difficult to predict what we might observe once we examine neural activity through the lens of these control regimes. We hope this will be the topic of future studies, and one aspect worthy of investigation is how neural activity prior to the start of the movement may reflect two different control objectives. Previous work has shown that motor cortex is highly active and specific as monkeys prepare for a cued movement and that this preparatory activity can take place without an imposed delay period (Ames et al., 2014; Cisek & Kalaska, 2005; Dekleva et al., 2018; Elsayed et al., 2016; Kaufman et al., 2014; Lara et al., 2018; Perich et al., 2018; Vyas et al., 2018; Zimnik & Churchland, 2021). It seems possible that the control strategies we observed correspond to different preparatory activity in the motor cortex. We added these speculations to the discussion.

      The reviewer’s suggestion to introduce perturbations to probe sensory processing is very good and was also suggested by another reviewer. We therefore conducted additional simulations in which we introduced perturbations (Supplementary Material; Figure S10). Indeed, in these model simulations the two control objectives separated more. However, testing these predictions via experiments must await future work.

      COMMENT: “It seems like a mix of lambda values are presented in Figure 5 and beyond. There needs to be some sort of analysis to verify that all strategies were equally used across lambda levels. Otherwise, apparent differences between control strategies may simply reflect changes in the difficulty of the task. It would also be useful to know if there were any trends across time?”

      REPLY: We appreciate and agree with the reviewer’s suggestion. We have added a complementary analysis of control objectives with respect to task difficulty, presented in the Supplementary Material (Figures S7 and S8). We demonstrate that, overall, the control objectives remain generally consistent across trials and difficulty levels. Therefore, the difference in behavior associated with different control objectives does not depend on the trial sequence or on the difficulty of the task. A statement to this effect was added to the main text.

      COMMENT: “Figure 2 highlights key features of performance as a function of task difficulty. …However, there is a curious difference in hand/cursor Gain for Monkey J. Any insight as to the basis for this difference?”

      REPLY: The apparently different behavior of Monkey J in the hand/cursor RMS ratio could be due to subject-to-subject variability. Because we have data from only two monkeys, we examined inter-individual variation among human subjects by presenting the hand/cursor gain data for each human subject in the Supplementary Material (Figure S1). As can be seen, there was indeed variability, with some subjects not exhibiting the same clear trend with task difficulty. However, on average, the RMS ratio decreases slightly as trials grow more difficult, as shown earlier in Figure 2. We added a sentence about possible inter-individual variation to address the difference in the behavior of Monkey J, with reference to the Supplementary Material.

      Reviewer #2:

      (Reviewer #2's original review is with the first version of the Reviewed Preprint. Below is the authors' summary of those comments.)

      COMMENT: The reviewer commends the care and effort taken to characterize control policies that may be used to perform the CST, via dual human and monkey experiments and model simulations, noting the importance of doing so as a precursor to future neural recordings or BMI experiments. But the reviewer also wondered if it is all that surprising that different subjects might choose different strategies: “... it makes sense that different subjects might choose to favor different objectives, and also that they can do so when instructed. But has this taught us something about motor control or simply that there is a natural ambiguity built into the task?”

      REPLY: The redundancy in the task, which allows different solutions, was deliberate and was our motivation for choosing this task for this study. We therefore did not regard the resulting subject-to-subject variability as a finding of our study. Rather, redundancy and inter-individual variability are ubiquitous features of everyday actions, and we explicitly wanted to examine behavior that is closer to such actions. As the reviewers noted, the CST is a rich task that extends our research beyond the conventional, highly constrained reaching task. The goal of our study was to develop a computational account to identify and classify such differences in order to better leverage future neural analyses of these more complex behaviors. This choice of task is now better motivated in the Introduction of the revised manuscript.

      COMMENT: The reviewer asks about our premise that subjects may use different control objectives in different trials, and whether instead a single policy may be a more parsimonious account for the different behavioral patterns in the data, given noise and instability in the system. In support of this view, the reviewer implemented a simple fixed controller and shared their own simulations to demonstrate its ability to generate different behavioral patterns simply by changing the gain of the controller. The reviewer concludes that our data “are potentially compatible with any of these interpretations, depending on which control-style model one prefers.”

      REPLY: We first address the reviewer’s concern that a simple “fixed” controller can account for the two types of behavioral patterns observed in Experiment 2 (instructed groups) by a small change in the control gain. We note that our controller is also fixed in terms of the plant, the actuator, and the sensory feedback loop; the only change we explore is in the relative weights of position vs. velocity in the Q matrix. This determines whether it is deviations in position or in velocity that predominate in the cost function. This, in turn, generates changes in the gain vector L in our model, since the optimal solution (i.e. the gains L that minimize the cost function) depends on the Q matrix as well as the dynamics of the plant (specifically, the lambda value). Hence, one could interpret the differences arising from changes in the control objective (the Q matrix) as changes in the gains of our “fixed” controller.
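      The dependence of the gains L on both the Q matrix and lambda can be illustrated with a deliberately simplified plant. The Python sketch below is a toy, not the model in the manuscript: it assumes the cursor obeys dx/dt = λ(x + p), that the hand position p is driven directly by the command u, a forward-Euler discretization, and an infinite-horizon discrete-time LQR solved by Riccati iteration. The weights `q_pos` and `q_vel` are hypothetical stand-ins for the relative position and velocity entries of Q.

```python
import numpy as np


def toy_cst_model(lam: float, dt: float = 0.01):
    """Toy plant with state s = [x, p] (cursor position, hand position).

    Assumed dynamics: dx/dt = lam * (x + p) and dp/dt = u, i.e. the hand is
    driven directly by the command u (far simpler than the manuscript's model).
    Forward-Euler discretization: A_d = I + A*dt, B_d = B*dt.
    """
    A = np.array([[lam, lam], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    return np.eye(2) + A * dt, B * dt


def cost_matrix(lam: float, q_pos: float, q_vel: float) -> np.ndarray:
    """State cost q_pos * x**2 + q_vel * (dx/dt)**2, with dx/dt = lam * (x + p)."""
    c_pos = np.array([[1.0, 0.0]])        # row vector selecting cursor position
    c_vel = lam * np.array([[1.0, 1.0]])  # row vector giving cursor velocity
    return q_pos * c_pos.T @ c_pos + q_vel * c_vel.T @ c_vel


def lqr_gain(A, B, Q, R, tol=1e-10, max_iter=200_000):
    """Infinite-horizon discrete-time LQR gain L (u = -L s) by Riccati iteration."""
    P = Q.copy()
    for _ in range(max_iter):
        L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ L)
        if np.max(np.abs(P_next - P)) < tol * max(1.0, np.max(np.abs(P))):
            return L
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")


lam = 3.0
R = np.array([[1e-3]])  # effort cost on u
A, B = toy_cst_model(lam)
# Small non-zero weights on the de-emphasized term keep Q positive definite.
L_pos = lqr_gain(A, B, cost_matrix(lam, q_pos=1.0, q_vel=0.01), R)
L_vel = lqr_gain(A, B, cost_matrix(lam, q_pos=0.01, q_vel=1.0), R)
print("Position-weighted gains:", L_pos)
print("Velocity-weighted gains:", L_vel)
```

Changing only the weights, or only `lam`, changes L; this is the sense in which a change of control objective can be read as a change in the gains of an otherwise fixed controller.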

      More importantly, while the noise and instability in the system may indeed occasionally result in distinct behavioral patterns (and we have observed such cases in our simulations as well), these factors are far from giving an alternative account for the structural differences in the behavior that we attribute to the control objective. To substantiate this point, we performed additional simulations that are provided in the Supplementary Material (Figures S4–S6). These simulations show that neither a change in noise nor in the relative cost of effort can account for the two distinct types of behavior. These differences are more consistently attributed to a change in the control objective.

      In addition, our approach provides a normative account of the control gains needed to simulate the observed data, as well as the control objectives that underlie those gains. As such, the two control policies in our model (Position and Velocity Control) resulted in control gains that captured the differences in the experimental groups (Experiment 2), both at the single trial and aggregate levels and across different task difficulties. Figure S9 in the Supplementary Material shows how the control gains differ between Position and Velocity Control in our model across different difficulty levels.

      We agree with the reviewer’s overall point that there are no doubt many models that can exhibit the variability observed in our experimental data, our simulations, or the reviewer’s simulations. Our study aimed to explore in detail not only the model’s ability to generate the variable behavior observed in the experimental data, but also its ability to match experimental results in terms of performance levels, gains, lags, and correlations across a wide range of lambda values, where the only changes in the model were the lambda value and the control objective. Without the details of the reviewer’s model, we are unable to perform a detailed analysis of it. Even so, we do not claim that our model is the ‘ground truth,’ only that it is certainly a reasonable model, adopted from the literature, that provides an intuitive and normative explanation of the performance of humans and monkeys over a range of metrics, system dynamics, and experimental conditions.

      Finally, we understand the reviewer’s concern regarding whether the trial-by-trial identification of control strategy in Figure 8 suggests that (uninstructed) subjects constantly switch control objectives between Position and Velocity. Although it is not unreasonable to imagine that individuals would intuitively try different strategies between ‘keeping the cursor still’ and ‘keeping the cursor at the center’ across trials, we agree that it is generally difficult to determine such trial-to-trial changes, especially when the behavior lies somewhere in between the two control objectives. In such cases, as we originally discussed in the manuscript, an alternative explanation could be a mixed control objective that generates behavior at the intersection of Position and Velocity Control, i.e., between the two slopes in Figure 8. We believe, however, that our modeling approach is still helpful in cases where performance is predominantly based on Position or Velocity Control. After all, the motivation for this study was to parse neural data into two classes associated with each control objective to potentially better identify structure underlying these behaviors.

      We clarified these points in the main text by adding further explanation in the Discussion section.

      COMMENT: The reviewer suggested additional experiments, such as perturbation trials, that might be useful to further explore the separability of control objectives. They also suggested that we temper our conclusion that our approach can reliably discriminate amongst different control policies on individual trials. Finally, the reviewer suggested that we modify our Introduction and/or Discussion to note past human/monkey research as well as investigations of minimization of velocity-error versus position-error in the smooth pursuit system.

      REPLY: We have expanded our simulations to investigate the effects of perturbation on the separability of different control objectives (Figure S10 in Supplementary Materials). We demonstrated that introducing perturbations more clearly differentiated between Position and Velocity Control. These results provide a good basis for further experimental verifications of the control objectives, but we defer these for future work.

      We also appreciate the additional past work that bridges human and monkey research that the reviewer highlights, including the related discussions in the eye movement literature on position versus velocity control. We have modified our Introduction and Discussion accordingly.

      Reviewer #3:

      COMMENT: The reviewer asked whether the observed differences in behavior might be due to some other factors besides the control policy, such as motor noise or effort cost, and suggested that we more systematically ruled out that possibility.

      REPLY: We appreciate and have heeded the reviewer’s suggestion. The revised manuscript now includes additional simulations in which the control objective was fixed to either Position or Velocity Control while other parameters were systematically varied. Specifically, we examined the influence of the relative effort cost, the sensory delay, and motor noise on performance. The results of these sensitivity analyses are presented in the Supplementary Material, Figures S4–S6. In brief, we found that changing the relative effort cost, delay, or noise level mainly affected the success rate (as expected) but did not affect the behavioral features originally associated with the control objectives. We include a statement about this result in the main text with reference to the details provided in the Supplementary Material.

      COMMENT: The reviewer questioned our choice of classification features (RMS position and velocity) and wondered if other features might yield better class separation, such as the hand/cursor gain. In a similar vein, reviewer 2 suggested in their recommendations that we examine the width of the autocorrelation function as a potentially better feature.

      REPLY: We note first that our choice of cursor velocity and position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. However, we also explored other features as suggested. We found that they, too, exhibited overlap between the two different control objectives, and did not provide any significant improvement in classification performance (Figures S2 and S3; Supplementary Materials). Of course, that is not to say that a more exhaustive examination of features may not find ones that yield better classification performance than those we investigated, but that is beyond the scope of our study. We refer to this consideration of alternative metrics in the discussion.

      COMMENT: The reviewer notes that “It seems that the classification problem cannot be solved perfectly, at least on a single-trial level.” To address this point, the reviewer suggested that we conduct additional simulations under the two different control objectives, and quantify the misclassifications.

      REPLY: We appreciate the reviewer’s suggestion, and have conducted the additional simulations as suggested, the results of which are included in the revised manuscript.

      COMMENT: “The problem of inferring the control objective is framed as a dichotomy between position control and velocity control. In reality, however, it may be a continuum of possible objectives, based on the relative cost for position and velocity. How would the problem differ if the cost function is framed as estimating a parameter, rather than as a classification problem?”

      REPLY: A blended control strategy, formulated as a cost function that is a weighted combination of position and velocity costs, is indeed a possibility that we briefly discussed in the original manuscript. This possibility arises particularly for individuals whose performance metrics lie somewhere between the purely Position or purely Velocity Control. While our model allows for a weighted cost function, which we will explore in future work, we felt in this initial study that it was important to first identify the behavioral features unique to each control objective.
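      One way to write the weighted cost function mentioned above, assuming Q_pos and Q_vel denote the state-cost matrices of pure Position and pure Velocity Control (notation introduced here only for illustration), is:

```latex
Q(w) = w\,Q_{\mathrm{pos}} + (1 - w)\,Q_{\mathrm{vel}}, \qquad 0 \le w \le 1
```

      Here w = 1 recovers Position Control and w = 0 recovers Velocity Control; fitting w per subject or per trial would turn the classification into the parameter-estimation problem the reviewer describes, with intermediate values capturing mixed behavior.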

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      None beyond those stated above.

      Reviewer #2 (Recommendations For The Authors):

      COMMENT: Line 166 states "According to equation (1), this behavior was equivalent to reducing the sum (𝑝 + 𝑥) when 𝜆 increased, so as to prevent rapid changes in cursor velocity". This doesn't seem right. In equation 1, velocity (not acceleration) depends on p+x. So a large p+x doesn't create a "rapid change in cursor velocity", but rather a rapid change in cursor position.

      REPLY: The reviewer is correct and we have corrected this misworded sentence; thank you for catching that.
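      For readers following this correction, a short derivation, assuming equation (1) has the critical-tracking form in which cursor velocity is proportional to the sum of cursor position x and hand position p with gain λ (the manuscript's exact notation may differ):

```latex
\dot{x}(t) = \lambda\,\bigl(x(t) + p(t)\bigr)
\qquad\Longrightarrow\qquad
\ddot{x}(t) = \lambda\,\bigl(\dot{x}(t) + \dot{p}(t)\bigr)
```

      A large sum x + p therefore means a large cursor velocity, i.e., a rapid change in cursor position, whereas preventing rapid changes in cursor velocity would require keeping the rate of change of x + p small.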

      COMMENT: The reviewer points out the potential confusion readers may have, given our unclear use of ‘control strategy’ vs. ‘control policy’ vs. ‘control objective’. The reviewer suggests that “It would be helpful if this could be spelled out early and explicitly. 'Control strategy' seems perilously close to 'control policy', and it would be good to avoid that confusion. The authors might prefer to use the term 'cost function', which is really what is meant. Or they might prefer 'control objective', a term that they introduce as synonymous with 'control strategy'.”

      REPLY: We thank the reviewer for noting this ambiguity. We have clarified the language in the Introduction to note explicitly that by strategy we mean the objective or cost function that subjects attempt to optimize. We now use ‘control objective’ consistently and have removed the term ‘policy’ from the paper to avoid confusion. We also now use Position Control and Velocity Control as the labels for our two control objectives.

      COMMENT: The reviewer notes that in Figure 2B and the accompanying text in the manuscript, we need to be clearer about what is being correlated; namely, cursor and hand position.

      REPLY: Thank you for pointing out this lack of clarity, which we have corrected as suggested.

      COMMENT: The reviewer questions our attribution of decreasing lag with task difficulty as a consequence of subjects becoming more attentive/responsive when the task is harder, and points out that our model doesn’t include this possible influence yet the model reproduces the change in lag. The reviewer suggests that a more likely cause is due to phase lead in velocity compared to position, with velocity likely increasing with task difficulty, resulting in a phase advance in the response.

      REPLY: Our attribution of the decrease in lag with task difficulty to attention/motivation recapitulated a point made by Quick et al. (2018). But, as the reviewer notes, this potential influence on lag is not included in our model. Accordingly, the change in lag more likely reflects the phase response of the closed-loop system, which does change with task difficulty because the optimal gains depend on the plant dynamics (i.e., the value of lambda). We have therefore deleted the text in question.
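      The reviewer's phase-lead argument can be stated in one line for a single sinusoidal component of frequency ω (an idealization, since the closed-loop responses are not pure sinusoids):

```latex
x(t) = a\sin(\omega t)
\quad\Longrightarrow\quad
\dot{x}(t) = a\,\omega\cos(\omega t) = a\,\omega\,\sin\!\left(\omega t + \tfrac{\pi}{2}\right)
```

      Velocity leads position by a quarter cycle, so a response that weights velocity more heavily is advanced in phase, which shortens the measured lag.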

      COMMENT: “The Methods tell us rather a lot about the dynamics of the actual system, and the cost functions are also well defined. However, how they got from the cost function to the controller is not described. I was also a bit confused about the controller itself. Is the 50 ms delay assumed when deriving the controller or only when simulating it (the text seems to imply the latter, which might make sense given that it is hard to derive optimal controllers with a hard delay)? How similar (or dissimilar) are the controllers for the two objectives? Is the control policy (the matrix that multiplies state to get u) quite different, or only subtly?”

      REPLY: Thanks for pointing this out. For brevity, we had omitted the details and referred readers to the original paper (Todorov, 2005). However, we have now revised the manuscript to include all the details in the Methods section; the entire section on the model is therefore new. This also necessitated updating all data figures (Figures 3, 4, 5, 6, 7, and 8), as they contain modeling results.

      COMMENT: “Along similar lines, I had some minor to moderate confusions regarding the OFC model as described in the main text. Fig 3 shows a model with a state estimator, but it isn't explained how this works. …Here it isn't clear whether there is sensory noise, or a delay. The methods say a delay was included in the simulation (but perhaps not when deriving the controller?). Noise appears to have been added to u, but I'm guessing not to x or x'? The figure legend indicates that sensory feedback contains only some state variables, and that state estimation is used to estimate the rest. Presumably this uses a Kalman filter? Does it also use efference copy, as would be typical? My apologies if this was stated somewhere and I missed it. Either way, it would be good to add a bit more detail to the figure and/or figure legend.”

      REPLY: As the lack of detail evidently led to some confusion, we now more clearly spell out the details of the model in the Methods, including the state estimation procedure.
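      For readers unfamiliar with the estimator, the sketch below shows one predict/update cycle of a standard discrete-time Kalman filter in which the prediction uses an efference copy of the motor command (the B u term), as the reviewer describes. It is a generic textbook form, not necessarily the exact estimator in the revised Methods, and it ignores the sensory delay; all names and numbers are illustrative.

```python
import numpy as np


def kalman_step(s_hat, P, u, y, A, B, H, W, V):
    """One predict/update cycle of a discrete-time Kalman filter.

    s_hat, P : previous state estimate and its covariance
    u        : motor command; its efference copy enters the prediction via B @ u
    y        : sensory observation, modeled as y = H s + sensory noise
    W, V     : process-noise and sensory-noise covariances
    """
    # Predict with the internal model of the plant and the efference copy.
    s_pred = A @ s_hat + B @ u
    P_pred = A @ P @ A.T + W
    # Correct the prediction using the sensory innovation y - H s_pred.
    S = H @ P_pred @ H.T + V
    K = np.linalg.solve(S, H @ P_pred).T  # equals P_pred H^T S^-1 (P_pred, S symmetric)
    s_new = s_pred + K @ (y - H @ s_pred)
    P_new = (np.eye(len(s_hat)) - K @ H) @ P_pred
    return s_new, P_new


# Illustrative 2-state example in which only the first state is observed.
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
H = np.array([[1.0, 0.0]])
W = 1e-4 * np.eye(2)
V = np.array([[1e-2]])
s_hat, P = kalman_step(np.zeros(2), np.eye(2), np.array([1.0]), np.array([0.5]), A, B, H, W, V)
print(s_hat, np.diag(P))
```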

      COMMENT: The reviewer wondered why we chose to plot mean velocity vs. mean position as in Figure 5, noting that, “ignoring scale, all scatter plots would be identical if the vertical axis were final position (because mean velocity determines final position). So what this plot is really examining is the correlation between final position and average position. Under position control, the autocorrelation of position is short, and thus final position tends to have little to do with average position. Under velocity control, the autocorrelation of position is long, and thus final position tends to agree with average position. Given this, why not just analyze this in terms of the autocorrelation of position? This is expected to be much broader under velocity control (where they are not corrected) than under position control (where they are, and thus disappear or reverse quickly). To me, thinking of the result in terms of autocorrelation is more natural.”

      REPLY: The reviewer is correct that the scatter plots in Fig. 5 would be the same (to within a scale factor of the vertical axis) had we plotted final position vs. mean position instead of mean velocity vs. mean position as we did. Our preference for mean velocity vs. mean position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. We now mention these perspectives in the revised manuscript for the benefit of the reader.
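      The equivalence follows from the definition of mean velocity over a trial of duration T, assuming each trial starts with the cursor at the center, x(0) = 0:

```latex
\bar{v} \;=\; \frac{1}{T}\int_{0}^{T}\dot{x}(t)\,dt \;=\; \frac{x(T) - x(0)}{T} \;=\; \frac{x(T)}{T}
```

      With a fixed trial duration (6 s), mean velocity is simply final position divided by a constant, so the two choices of vertical axis differ only by a scale factor.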

      As suggested, we also investigated the width of the (temporal) autocorrelation function (acf) of cursor position for 200 simulated position control trials and 200 simulated velocity control trials, at four different lambda values (50 simulated trials per lambda). Figs. S2A and B (Supplementary Materials) show example trials and histograms of the acf width, respectively. As the reviewer surmised, velocity control trials tend to have wider acfs than position control trials. However, as with the metrics we chose to analyze, there is overlap and there is no visible benefit for the classification.
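      A minimal sketch of one way to compute the acf width follows. The 0.5-crossing definition and the function name `acf_width` are assumptions, since the exact definition used for Fig. S2 is not restated here. As caricatures, a random walk stands in for drift that is left uncorrected (velocity-control-like) and white noise for errors that are corrected quickly (position-control-like).

```python
import numpy as np


def acf_width(x: np.ndarray, dt: float, level: float = 0.5) -> float:
    """Lag (s) at which the normalized autocorrelation of x first drops below `level`.

    The 0.5 crossing is an illustrative choice of width; a 1/e crossing or an
    integrated width would serve equally well.
    """
    x = x - np.mean(x)
    full = np.correlate(x, x, mode="full")
    acf = full[len(x) - 1:] / full[len(x) - 1]  # non-negative lags, acf[0] == 1
    below = np.nonzero(acf < level)[0]
    return float(below[0] * dt) if below.size else float(len(x) * dt)


rng = np.random.default_rng(0)
dt = 0.01                       # 100 Hz sampling (illustrative)
noise = rng.normal(size=600)    # 6-s trial: errors corrected immediately
walk = np.cumsum(noise)         # 6-s trial: errors left to accumulate
print(acf_width(noise, dt), acf_width(walk, dt))
```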

      COMMENT: “I think equation ten is incorrect, but would be correct if the identity matrix were added? Also, why is the last term of B set to 1/(Tau*M). What is M? Is it mass (which above was lowercase m)? If so, mass should also be included in A (it would be needed in two places in the last column). Or if we assume m = 1, then just ignore mass everywhere, including here and equation 5. Or perhaps I'm confused, and M is something else?”

      REPLY: Thanks for pointing this out. The matrix A shown in the paper is for the continuous-time representation of the model. However, as the reviewer correctly noted, the discrete-time implementation of the model requires a modification (adding the identity matrix), which was included in our simulations. We have now clarified this in the Methods section of the revised manuscript. Also, as correctly pointed out, M is the mass of the hand; depending on whether the hand acceleration (d²p/dt²) or the hand force (F) is taken as a state, M can be included in the A matrix. In our case, the A matrix is modified according to the state vector, and the B matrix is modified similarly. This is now clarified in the Methods section of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      COMMENT: “Equations 4-8 are written in continuous time, but Equation 9 is written in discrete time. Then Equation 10 is in discrete time. This needs to be tidied up. … I would suggest being more detailed and systematic, perhaps formulating the control problem in continuous time and then converting to discrete time.”

      REPLY: Thank you for this helpful suggestion. The model section in the Methods has been expanded to provide further details of the equation of motion, the discretization process, the control law calculation and the state estimation process.

      COMMENT: “It seems slightly odd for the observation to include only position and velocity of the cursor. Presumably participants can also observe the state of their own hand through proprioception (even if it were occluded). How would it affect the model predictions if the other states were observable?”

      REPLY: Thanks for pointing this out. We initially included only cursor position and velocity because we felt those were the most prominent state feedback, and the system is observable in that case. Nevertheless, we revised the manuscript and repeated all simulations using a full observability matrix. Our findings and conclusions remain unchanged. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, and 8).

      COMMENT: “It seems unnecessary to include the acceleration of the cursor in the formulation of the model. …the acceleration is not even part of the observed state according to line 668… I think the model could therefore be simplified by omitting cursor acceleration from the state vector.”

      REPLY: We agree. We have simplified the model, and generated new simulations and figures. Our results and conclusions were unchanged by this modification. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, and 8).

      COMMENT: “In the cost function, it's not clear why any states other than position and velocity of the cursor need to have non-zero values. …The choice to have the cost coefficient for these other states be 1 is completely arbitrary… If the point is that the contribution of these other costs should be negligible, then why not just set them to 0?”

      REPLY: We agree, and have made this change in the Methods section. Our findings and conclusions were unaffected.

      COMMENT: “It seems that the cost matrices were specified after transforming to discrete-time. It is possible however (and perhaps recommended) to formulate in continuous time and convert to discrete time. This can be done cleanly and quite straightforwardly using matrix exponentials. Depending on the discretization timestep, this can also naturally lead to non-zero costs for other states in the discrete-time formulation even if they were zero under continuous time. … A similar comment applies to discretization of the noise.”

      REPLY: Thanks for the suggestion. We have expanded on the discretization process in our Methods section, which uses a common approximation of the matrix exponentiation method.
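      As an illustration of the difference between exact zero-order-hold discretization and its first-order approximation (A_d ≈ I + AΔt, B_d ≈ BΔt, which is where the identity matrix enters), the sketch below computes both with NumPy only. The exact version exponentiates the augmented matrix [[A, B], [0, 0]]Δt; the truncated Taylor series used for the exponential is a simplification that is adequate only when the matrix norm is small. Cost and noise matrices can be discretized in a similar spirit (e.g., Van Loan's method), which this sketch does not cover.

```python
import numpy as np


def expm_taylor(M: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential by a truncated Taylor series (adequate only for small ||M||)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result


def discretize_zoh(A: np.ndarray, B: np.ndarray, dt: float):
    """Zero-order-hold discretization from exp([[A, B], [0, 0]] * dt)."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm_taylor(aug * dt)
    return E[:n, :n], E[:n, n:]


def discretize_euler(A: np.ndarray, B: np.ndarray, dt: float):
    """First-order approximation: A_d = I + A*dt, B_d = B*dt."""
    return np.eye(A.shape[0]) + A * dt, B * dt


# Example: a unit-mass double integrator (position, velocity) driven by a force input.
dt = 0.01
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Ad_zoh, Bd_zoh = discretize_zoh(A, B, dt)
Ad_eul, Bd_eul = discretize_euler(A, B, dt)
print(Bd_zoh.ravel(), Bd_eul.ravel())  # ZOH gives [dt**2/2, dt]; Euler drops the dt**2 term
```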

      COMMENT: “Most of the parameters of the model seem to be chosen arbitrarily. I think this is okay as the point is to illustrate that the kinds of behaviors observed are within the scope of the model. However, it would be helpful to provide some rationale as to how the parameters were chosen. e.g. Were they taken directly from prior literature, or were they hand-tuned to approximately match observed behavior?”

      REPLY: We have revised the manuscript to note more clearly that the noise parameters, as well as the parameters of the mechanical system (mass, muscle force, time scale, etc.), were taken from previous publications (Todorov, 2005; Cluff et al., 2019). As described in the manuscript, the parameter values of the cost function (Q matrix) were tuned so that the model achieved a range of success rates similar to that observed in the experimental data. This is now clarified in the Methods section.

      COMMENT: “The ‘true’ cost function for this task is actually a 'well' in position space - zero cost within the screen and very high cost elsewhere. In principle, it might be possible to derive the optimal control policy for this more veridical cost function. It would be interesting to consider whether or not this model might reproduce the observed behaviors.”

      REPLY: This is indeed a very interesting suggestion, but it would be difficult to implement within the current optimal feedback control framework. We will consider it in future work.

      Minor Comments:

      COMMENT: “In Figs 4 and 5, the data points are drawn from different conditions with varying values of lambda. How did the structure of this data depend on lambda? Might it be possible to illustrate in the figure (e.g. the shade/color of each dot) what the difficulty was for each trial?”

      REPLY: We performed additional analyses to show the effects of task difficulty on the choice of control objective. Overall, we found that the main behavioral characteristics of the control objective remained fairly unchanged across different task difficulties or across time. The results of this analysis are included in Fig. S7 and S8 of the Supplementary Materials.

      COMMENT: “Should mention trial duration (6s) in the main narrative of the intro/results.”

      REPLY: We now mention this detail when we describe the task for the first time.

      COMMENT: “As an alternative to training on synthetic data (which might not match behavior that precisely, and was also presumably fitted to subject data at some level) it might be worth considering to do a cross-validation analysis, i.e. train the classifier on subsets of the data with one participant removed each time, and classify on the held-out participant.”

      REPLY: This is indeed a valid point. We trained the classifier on model simulations for two reasons. First, we wanted confidence in the training data: the experimental data were limited and noisy, which would have made the classifications less reliable. Second, model simulations are available for contexts and conditions for which experimental data are not necessarily available. The second reason is practical: it lets us identify the control objective of any subject (who received no instructions) without collecting training data from matched control subjects who received explicit instructions. Nonetheless, we appreciate the reviewer's recommendation and will consider it in future studies.
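
      For reference, the leave-one-participant-out scheme the reviewer suggests can be sketched as follows. The nearest-centroid classifier, the toy features and labels, and the function names are hypothetical placeholders, not the classifier used in the study. Any model with separate fit and predict steps could be substituted.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one participant at a time."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: mean feature vector per label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        sums[yi] = [s + v for s, v in zip(sums.get(yi, [0.0] * len(xi)), xi)]
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids, key=lambda lbl: sum((c - v) ** 2 for c, v in zip(centroids[lbl], x)))


def leave_one_participant_out(X, y, groups):
    """Per-participant accuracy when training on all other participants."""
    scores = {}
    for held_out, train, test in leave_one_group_out(groups):
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        scores[held_out] = correct / len(test)
    return scores


# Toy data: two features per trial, a control-objective label, and a participant ID.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [0.0, 0.5], [5.0, 5.5]]
y = ["A", "A", "B", "B", "A", "B"]
groups = ["p1", "p1", "p2", "p2", "p3", "p3"]
print(leave_one_participant_out(X, y, groups))  # {'p1': 1.0, 'p2': 1.0, 'p3': 1.0}
```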

      COMMENT: “line 690 - Presumably the optimal policy was calculated without factoring in any delay (this would be tricky to do), but the 50ms delay was incorporated at the time of simulation?”

      REPLY: The discretization of the system equations allowed us to incorporate the delay in the system dynamics and to solve for the optimal controller with the delay present. This was done by system augmentation (e.g., Crevecoeur et al., 2019): the state of the system at the current time step was augmented with the states from the d = 5 preceding time steps to form the new state vector $x_{\text{aug}}(t) = [x(t), x(t-1), \dots, x(t-d)]$. The matrices A, B, and H of the system dynamics were expanded accordingly to form the new dynamical system:

      $$x_{\text{aug}}(t+1) = A_{\text{aug}}\, x_{\text{aug}}(t) + B_{\text{aug}}\, u(t)$$

      Then, the optimal control was implemented on the new (augmented) system dynamics.

      We have revised the manuscript (Methods) to clarify this issue.
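
      The state augmentation described above can be sketched in plain Python. This minimal sketch assumes two things: the delayed feedback reads the oldest stored state, y(t) = H x(t-d), and the control enters only the current-state block. The function names are hypothetical. With d = 5, as in the reply, the augmented state is six times the size of the original.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def augment_delay(A, B, H, d):
    """Build A_aug, B_aug, H_aug for x_aug(t) = [x(t), x(t-1), ..., x(t-d)]."""
    n, m, p = len(A), len(B[0]), len(H)
    size = n * (d + 1)
    A_aug, B_aug, H_aug = zeros(size, size), zeros(size, m), zeros(p, size)
    # First block row: x(t+1) = A x(t) + B u(t).
    for i in range(n):
        A_aug[i][:n] = A[i][:]
        B_aug[i][:] = B[i][:]
    # Shift register: block k of x_aug(t+1) copies block k-1 of x_aug(t).
    for k in range(1, d + 1):
        for i in range(n):
            A_aug[k * n + i][(k - 1) * n + i] = 1.0
    # Delayed feedback (assumed): y(t) = H x(t-d), i.e. the last block.
    for i in range(p):
        H_aug[i][d * n:] = H[i][:]
    return A_aug, B_aug, H_aug


def step(A_aug, B_aug, x_aug, u):
    """One step of x_aug(t+1) = A_aug x_aug(t) + B_aug u(t)."""
    return [sum(a * x for a, x in zip(a_row, x_aug)) + sum(b * v for b, v in zip(b_row, u))
            for a_row, b_row in zip(A_aug, B_aug)]


# Scalar example with d = 2: the state history slides down one block per step.
A_aug, B_aug, H_aug = augment_delay([[1.0]], [[1.0]], [[1.0]], d=2)
print(step(A_aug, B_aug, [3.0, 2.0, 1.0], [0.5]))  # [3.5, 3.0, 2.0]
```

      A standard (non-delayed) optimal controller and estimator can then be computed for the augmented matrices, which is what makes the delay tractable.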

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The present study's main aim is to investigate how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetic, transcriptomic, proteomic, ultrastructural, and biochemical methods. Several observations were made linking VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, suggesting that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier to the entry of antibiotics, targeting VirR might increase the permeability of the pathogen while also stimulating the immune system through enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study addresses an essential area of TB biology.

      Strengths:

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of cryo-EM confirmed the increased thickness and altered arrangement of the CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species.

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins.

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant.

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain.

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      Weaknesses:

      The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate the enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division, when PG is most fragile. However, this has yet to be tested in the manuscript: according to the AFM data, the VirR mutant, which displays enhanced vesiculogenesis, produces thicker PG with higher pore density. No evidence was presented to show that the PG of the mutant is fragile, or that differences in cell division explain the increased vesiculogenesis. These observations, which run counter to the authors' hypothesis, need detailed experimental verification.

      Response: We thank the reviewer for this comment. We would like to convince the reviewer that the VirR mutant truly carries a more fragile PG, and we will perform additional experiments to support this notion. We will determine the degree of PG release into the extracellular space and acquire additional mass spectrometry data on isolated PG.

      The transcriptomic data add little of substance and do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. TLCs of lipids are not quantitative; for example, the TLC image of PDIM is poor, and quantitative estimation requires metabolic labeling of lipids with radioactive precursors. Further, a change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl-CoA).

      Response: We agree with the reviewer that TLC analysis is not quantitative. We will run additional TLCs to investigate other lipids that share common precursors. At present, we cannot run radioactive experiments in our lab.

      The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and the import/export of drugs. The authors should investigate whether the restoration of normal permeability and EV release upon cholesterol exposure of the VirR mutant is instead due to the maintenance of cell wall lipid balance.

      Response: We concur with the reviewer that cholesterol as the sole carbon source introduces many changes in Mtb cells besides permeability. Our central hypothesis regarding these data is that cholesterol makes the Mtb cell membrane less fluid, which in turn reduces EV release. We will try to measure membrane fluidity in the presence and absence of cholesterol. However, permeability changes in Mtb cells can manifest at different levels of the cell envelope. This suggests that the increased permeability observed in the VirR mutant could differ from that observed upon TRZ treatment. The main point is that vesiculogenesis could be a general process responding to changes in permeability, regardless of the cell envelope compartment affected. We still need to define the experiments, but we will try to demonstrate this.

      Finally, the protein interaction data are based on experiments done once, without statistical analysis. If the interaction between VirR and the LCP proteins is expected to occur at the mycobacterial membrane, how is the split-GFP system expressed in the cytoplasm physiologically relevant? No explanation was provided as to why VirR interacts with the truncated versions of the LCP proteins and not with the full-length proteins.

      Response: Split-GFP has previously been used successfully with cell membrane proteins. However, we will repeat the experiments and perform statistical analyses.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, the authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis.

      Strengths:

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays, such as targeted metabolomics, PG isolation, analysis of the diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with the canonical PG-AG-ligating LCP proteins, the authors have established that VirR is a central scaffold in the cell envelope remodelling process, which is critical for MEV production.

      Response: We thank the reviewer for these kind words.

      Weaknesses:

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study.

      Response: We thank the reviewer for bringing up this issue. Contrary to predictions, we believe that virR is an essential gene, as we have tried to delete it several times without success. In this study we used a transposon mutant and its complemented strain, since they were the basis of previous studies establishing their genetic implications in vesiculogenesis in Mtb. We chose CRISPRi to run similar experiments in a genetic background different from transposon mutagenesis. Our data support similar phenotypes in terms of vesicle release.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      More details about the classification and how it is trained

      We included a sentence in the introduction to clarify which data we are using: "In order to demonstrate this improvement, we apply our methods to two classification datasets: a synthetic dataset and a public clinical dataset where the predicted outcome is the survival of the patient"

      And about how the classifier is trained in the "Results" section: "we used the default parameters of the classifier, since our focus is comparing the different imputation methods"

      Availability of the code

      Now the code is publicly available in a github repository https://github.com/AstraZeneca/dpp_imp/ (see Availability of Data and Code section)

      Reviewer #2

      Clarifying that Determinantal Point Processes and their deterministic version have been introduced before but are applied for the first time for data imputation in this work:

      We added an explanation in the sixth paragraph of the introduction stating that we use pre-existing DPP and deterministic-DPP algorithms for our imputation methods, and we included the references to avoid confusion.

      We also added a paragraph at the end of the introduction to summarize this work's contribution

      Explaining the claim about the computational advantage of using quantum determinantal point processes for the imputation methods:

      In the fourth paragraph of the "Discussion" section (page 8), we give an imputation example by numerically comparing the classical and quantum algorithms running time for DPP sampling, which shows the advantage of using the quantum algorithm.

      Regarding running time for classical DPP and quantum DPP sampling algorithms:

      We included Table VIII (page 13), which compares the preprocessing and sampling complexities of the classical and quantum DPP algorithms. We consider the case where we sample d rows from an (n, d) matrix with n = O(d), which is usually the case for our DPP-Random Forest algorithm.

      We added some details regarding the quantum advantage in the first paragraph of page 12

      Regarding the comment about the modest improvement of the DPP methods and questions about their practical benefit:

      As mentioned in the third paragraph of the "Discussion" section, the consistency of the improvement and the removal of variance resulting from the DPP and deterministic DPP methods make our methods very beneficial to use on clinical data. Further exploration with different datasets could lead to a more complete understanding of the practical advantages of the methods.

      Algorithmic complexity of the deterministic DPP algorithm:

      Detailed in the last sentence of the "Determinantal Point Processes" subsection of the "Methods" section: O(N^2 d) for the preprocessing step and O(Nd^3) for the sampling step
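
      For intuition about what the deterministic variant selects, the brute-force sketch below scores every subset S of k rows of a data matrix X by det(L_S), where L = X X^T. This score is proportional to the subset's probability under a k-DPP with that kernel, and the sketch returns the highest-scoring subset. It is an illustration under stated assumptions, not the authors' algorithm: enumeration is exponential in k, unlike the O(N^2 d) and O(Nd^3) costs quoted above, and the toy matrix is invented.

```python
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def gram(rows):
    """L_S = X_S X_S^T for the selected rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def most_probable_subset(X, k):
    """Brute force: the k-row subset maximizing det(L_S), i.e. the mode of the k-DPP."""
    best, best_score = None, -1.0
    for S in combinations(range(len(X)), k):
        score = det(gram([X[i] for i in S]))
        if score > best_score:
            best, best_score = S, score
    return best, best_score


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]  # toy data matrix with 3 rows
print(most_probable_subset(X, 2))  # ((0, 2), 4.0)
```

      Rows that are more linearly independent span a larger volume and so receive a larger determinant. This is why DPP-based selection favours diverse rows.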

      Running time for the quantum deterministic DPP sampling and how it is done in practice:

      While it is difficult to assess the real running time for the quantum detDPP algorithm for large circuits (100 or more qubits), due to the unavailability of such devices, we give more details about our practical implementation in the last paragraph of the "Methods" section. In our case (up to 10 qubits) we used 1000 shots to sample the highest probability elements.

      On which quantum simulator was used

      We point out in the first paragraph of page 5 that we employ the Qiskit noiseless simulator.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study identifies the gene mamo as a new regulator of pigmentation in the silkworm Bombyx mori, a function that was previously unsuspected based on extensive work on Drosophila, where the mamo gene is involved in gamete production. The evidence supporting the role of Bm-mamo in pigmentation is convincing, including high-resolution linkage mapping of two mutant strains, expression profiling, and reproduction of the mutant phenotypes with state-of-the-art RNAi and CRISPR knock-out assays. While the discussion about genetic changes being guided or accelerated by the environment is extremely speculative and has little relevance for the findings presented, the work will be of interest to evolutionary biologists and geneticists studying color patterns and the evolution of gene networks.

      Response: Thank you very much for your careful work. In the revised version, we conducted a comparative genomic analysis of the upstream regions of the Bm-mamo gene in 51 wild silkworms and 171 domesticated local silkworms. We also analyzed the nucleotide diversity (π) and the fixation index (FST) of the Bm-mamo genomic sequences in the wild and domesticated silkworm populations. The results showed that the Bm-mamo genomic sequence of local silkworms was relatively conserved, while the upstream sequence of wild silkworms exhibited high nucleotide diversity. This finding suggests a high degree of variability in the regulatory region of the Bm-mamo gene in wild strains, and that the sequence in this region may have been fixed by domestication selection. We have optimized the description in the Discussion section.
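
      As an illustration of the two statistics, the sketch below computes π as the mean pairwise per-site difference within a population. It computes FST with Hudson's estimator, 1 - H_within / H_between. The response does not state which FST estimator or software was used, so Hudson's form is an assumption, as are the toy sequences. The code also assumes aligned, equal-length sequences with no gaps or missing data, and it applies no sample-size correction.

```python
from itertools import combinations, product


def per_site_difference(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """pi: mean per-site difference over all pairs of sequences in one population."""
    pairs = list(combinations(seqs, 2))
    return sum(per_site_difference(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Hudson-style FST = 1 - H_within / H_between."""
    h_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between = [per_site_difference(a, b) for a, b in product(pop1, pop2)]
    h_between = sum(between) / len(between)
    return 1 - h_within / h_between


# Toy aligned sequences, not data from the study.
wild = ["AAAA", "AAAT"]
domestic = ["TTTT", "TTTA"]
print(nucleotide_diversity(wild))  # 0.25
print(round(hudson_fst(wild, domestic), 3))  # 0.714
```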

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing down the causal intervals to a small interval of a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation are extremely convincing.

      This revision is a much improved manuscript, and I commend the authors for many of their edits.

      Response: Thank you very much for your careful work. With the help of reviewers and editors, we have revised the manuscript to improve its readability.

      I find the last part of the discussion, starting at "It is generally believed that changes in gene expression patterns are the result of the evolution of CREs", to be confusing.

      In this section, I believe the authors sequentially:

      • emphasize the role of CRE in morphological evolution (I agree)

      • emphasize that TFs, and in particular their own CREs, are themselves important mutational targets of evolution (I agree, but the phrasing needs to make clear that the authors are talking here about the CREs found at the TF locus, not the CREs bound by the TF).

      • use the stickleback Pel enhancer as an example, which I think is a good case study, but the authors also then make an argument about DNA fragility sites, which is hard to connect with the present study.

      • then continue on "DNA fragility" using the peppered moth and butterfly cortex locus. There is no evidence of DNA fragility at these loci, so the connection does not work. "The cortex gene locus is frequently mutated in Lepidoptera", the authors say. But a more accurate picture would be that the cortex locus is repeatedly involved in the generation of color pattern variants. Unlike for the fragile Pel enhancer, we don't know if the causal mutations at this locus are repeatedly the same, and the haplotypes that have been described could be collateral rather than causal. Overall, it is important to clarify that mutation bias is one possible factor explaining "genetic hotspots of evolution" (or genetic parallelism sensu 10.1038/nrg3483), but it is also possible that many genetic hotspots are repeated mutational targets because of their "optimal pleiotropy" (e.g. a hub position in GRNs, as mamo might have), or because of particularly modular CRE regions that allow fine-tuning. Thus, I find the "fragility" argument misleading here. In fact, the finding that the "bd" and "bdf" alleles are different in nature argues against the idea of a fragility bias (unless the authors can show increased mutation rates at this locus in a wild silkmoth species?). These alleles were also artificially selected, i.e. they increased in frequency through breeding rather than natural selection in the wild, so while they are interesting for our understanding of the genotype-phenotype map, they are not necessarily representative of the mutations that may underlie evolution in the wild.

      Response: Thank you very much for your careful work. DNA fragility is an interesting topic, but some explanations of DNA fragility are confusing. One study measured the rate of DNA double-strand breaks (DSBs) in yeast artificial chromosomes (YACs) and found that YACs containing the marine Pel enhancer broke ~25 to 50 times more frequently than controls. These authors attributed the increased mutation rate to features of the DNA sequence, particularly TG-dinucleotide repeats. Moreover, they found that fragility depended on the orientation of Pel relative to the replication origin: in one orientation the sequence was stable, and in the reverse orientation it was fragile. Thus, Pel fragility also depends on the direction of DNA replication. In summary, they suggested that the special DNA sequence is the cause of DNA fragility. In addition, the sequence features associated with DNA fragility in the Pel region are also found at thousands of other positions in the stickleback and human genomes (Xie KT et al., 2019, Science).

      In YACs, characteristics of the DNA sequence, such as TG-dinucleotide repeats, may be important causes of DNA fragility, and these breaks occur during DNA replication. However, sequences inserted into YACs often undergo deletion or recombination during cultivation and passage. In addition, yeast is a single-celled organism, so results in yeast cannot represent the situation in multicellular organisms. If the same held true for multicellular organisms, several issues would arise:

      (1) DNA replication occurs separately in each cell of a multicellular organism. Because DNA breakage and repair would then occur independently in each cell, different cells could carry different alleles, potentially producing extensive chimerism. However, we have not found such a situation in the genome sequencing of many multicellular organisms.

      (2) If the DNA sequence (TG-dinucleotide repeats) were the determining factor, mutations near the sequence would lose their strong correlation with environmental changes. The researchers conducted the YAC experiments in a single environment and found that the frequency of DNA breaks in constructs containing TG-dinucleotide repeats was 25 to 50 times greater than in the control group. This would mean that this part of the stickleback genome undergoes frequent mutation in both marine and lake populations. However, according to related research, lake populations, rather than marine populations, often exhibit pelvic reduction.

      (3) Researchers have found thousands of loci in the stickleback and human genomes that contain such sequences (TG-dinucleotide repeats). This would mean that thousands of sites undergo frequent mutation during DNA replication. Unless these sites are nonfunctional, such mutations would affect the organism and could even cause damage. Even if they are not functional sequences, frequent mutation would gradually erode or replace them, rather than leaving them present in large numbers in the genome.

      Therefore, the study of DNA fragility in yeast cannot explain the situation in multicellular organisms.

      As you noted, what we intended to convey is that frequent variation at the cortex locus may be subject to targeted regulation involving the GRN in Lepidoptera. In addition, studies of epigenetic modifications at the fragile sites we referenced suggest that DNA fragility is determined not by the DNA sequence itself (Ji F et al., 2020, Cell Res) but by other factors, such as epigenetic ones. In this view, the sequence features found at fragile sites are traces of frequent mutation, not its cause.

      In this revision, we analyzed the nucleotide diversity of the mamo genomic region in 51 wild and 171 domestic silkworms. We found high nucleotide diversity from the third exon to the upstream region of this gene in wild silkworms. We randomly selected 12 wild and 12 domestic silkworms and compared approximately 1 kb of their upstream sequences. In wild silkworms, the upstream sequences are highly diverse. In domestic silkworms, the sequences are highly conserved, although a long interspersed nuclear element (LINE) is inserted in some individuals. This finding suggests frequent sequence variation in this region in wild silkworms and fixation in domesticated silkworms. These genomic data are sourced from the silkworm pangenome (Tong X et al., 2022, Nat Commun), which comprises 1078 strains (205 local strains, 194 improved strains, 632 mutant strains, and 47 wild silkworms), including 545 genomes sequenced with third-generation technology. An online website was built to make these data available (http://silkmeta.org.cn/). We warmly welcome you to use these data.

      In summary, for clearer expression, we have rewritten this section.

      Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl ADC, Schluter D, Bell MA, Vasquez KM, Kingsley DM. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science. 2019 Jan 4;363(6422):81-84. doi: 10.1126/science.aan1425.

      Ji F, Liao H, Pan S, Ouyang L, Jia F, Fu Z, Zhang F, Geng X, Wang X, Li T, Liu S, Syeda MZ, Chen H, Li W, Chen Z, Shen H, Ying S. Genome-wide high-resolution mapping of mitotic DNA synthesis sites and common fragile sites by direct sequencing. Cell Res. 2020 Nov;30(11):1009-1023. doi: 10.1038/s41422-020-0357-y.

      Tong X, Han MJ, Lu K, Tai S, Liang S, Liu Y, Hu H, Shen J, Long A, Zhan C, Ding X, Liu S, Gao Q, Zhang B, Zhou L, Tan D, Yuan Y, Guo N, Li YH, Wu Z, Liu L, Li C, Lu Y, Gai T, Zhang Y, Yang R, Qian H, Liu Y, Luo J, Zheng L, Lou J, Peng Y, Zuo W, Song J, He S, Wu S, Zou Y, Zhou L, Cheng L, Tang Y, Cheng G, Yuan L, He W, Xu J, Fu T, Xiao Y, Lei T, Xu A, Yin Y, Wang J, Monteiro A, Westhof E, Lu C, Tian Z, Wang W, Xiang Z, Dai F. High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation. Nat Commun. 2022 Sep 24;13(1):5619. doi: 10.1038/s41467-022-33366-x.

      Lu K, Pan Y, Shen J, Yang L, Zhan C, Liang S, Tai S, Wan L, Li T, Cheng T, Ma B, Pan G, He N, Lu C, Westhof E, Xiang Z, Han MJ, Tong X, Dai F. SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data. Nucleic Acids Res. 2024 Jan 5;52(D1):D1024-D1032. doi: 10.1093/nar/gkad956.

      Curiously, the last paragraph ("Some research suggests that common fragile sites...") elaborates on the idea that some sites of the genome are prone to mutation. The connection with mamo and the current article is extremely thin. There is an attempt here to connect meiotic and mitotic breaks to Bm-mamo, but this is confusing: it seems to propose Bm-mamo as a recruiter of epigenetic modulators that may drive higher mutation rates elsewhere. Not only am I unconvinced by this argument without actual data, but it would not explain how the mutations at Bm-mamo itself evolved.

      Response: Thank you very much for your careful work. This section mainly illustrates that DNA fragility in animals is not determined by sequence but is regulated by other factors. In fruit flies, mamo was found to be an important candidate gene for setting recombination hotspots in meiosis. We first discussed PRDM9, which plays an important role in setting recombination hotspots during meiosis. Our purpose in mentioning this information is to illustrate that chromosomal recombination is a process of programmed double-strand breaks and to answer another reviewer's question about programmed events in the genome. In summary, we suggest that some variations in DNA sequences are programmed outcomes. We have improved the description of this section in this version.

      On a more positive note, I find it fascinating that the authors identified a TF that clearly articulates or orchestrates larval pattern development and that, when deleted, still yields healthy individuals. In other words, while it is a TF with many targets, it is not too pleiotropic. This idea, that the genetically causal modulators of developmental evolution are regulatory genes, has been described elsewhere (e.g. Fig 4c in 10.1038/s41576-020-0234-z, and associated refs). To me, the beautiful findings about Bm-mamo make sense in the general, existing framework that developmental processes and regulatory networks "shape" the evolutionary potential and trajectories of organisms. There is a degree of "programmability" in genomes, because some loci are particularly prone to modulating a given type of trait. Here, Bm-mamo, as a potential regulator of both CPs and melanin pathway genes, appears to be a potent modulator of epithelial traits. Claiming that there are inherent mutational biases behind this is unwarranted.

      Response: Thank you very much for your careful work. I completely agree with your statement that the genome exhibits a certain degree of programmability. On the one hand, some transcription factors can precisely control the spatiotemporal expression levels of structural genes (such as pigment synthesis genes). On the other hand, these transcription factors are themselves subject to strict expression regulation. Because color patterns are complex, changes in one or a few structural genes result in incomplete or imprecise changes in coloring patterns. In contrast, some regulatory factors control multiple downstream target genes, so changes in their expression patterns can lead to holistic and significant changes in color patterns. Many important transcription factors have long intergenic regions upstream, ranging from tens to hundreds of kilobases (kb), which may contain many different regulatory elements for finer control of their expression patterns. Therefore, gene regulatory networks can directly regulate transcription factors to modulate a given type of trait. A transcription factor and its downstream target genes can form a functional module, similar to a functional module in software or an operating system. Regulating such a module through its transcription factor takes few steps, much like a single-click switch. That gene regulatory networks regulate these modules in response to environmental changes is widely recognized.

      Some people do not agree that genetic variation can also be regulated; they claim that it is completely random. The infinite monkey theorem (Félix-Édouard-Justin-Émile Borel, 1909) states that if an infinite number of monkeys were given typewriters and an infinite amount of time, they would eventually produce the complete works of Shakespeare. Although this theory advocates randomness on the surface, its conclusion is one of inevitability (a tail event). In nature, some things we observe show no obvious regularity because they involve relatively complex factors, and the underlying logic is obscure and difficult to understand. We often call them random. However, as we gradually understand the logic behind such complex events, we can also recognize the programmed nature of this randomness.

      Previously, chromosomal recombination during meiosis was believed to be a random event; it is now considered a programmed process. The occurrence of meiotic recombination mentioned earlier indicates that the genome can itself set the positions of double-strand breaks to form new allelic forms. Because meiotic recombination is programmed, and because transcription factors that recognize DNA sites, enzymes that cleave double-stranded DNA, and DNA repair systems all exist, programmed processes could also introduce genetic variation. A study in plants has provided insights into such programmed mutation (Monroe JG et al., 2022, Nature). Frequent changes in the expression patterns of some transcription factors occur between and/or within species. In this article, we discuss the possible reasons for variation in the expression patterns of some transcription factors only in general terms and with simple reasoning. We have added an analysis of wild silkworms and improved the relevance of the discussion.

      Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D, Weng ML, Imbert E, Ågren J, Rutter MT, Fenster CB, Weigel D. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105. doi: 10.1038/s41586-021-04269-6. Epub 2022 Jan 12. Erratum in: Nature. 2023 Aug;620(7973):

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Please structure your Discussion with section headers.

      Response: Thank you very much for your careful work. We have added relevant section headers.

      • As explained in my public review, I found the last two sections of the Discussion to be dispersed and confusing. I also must say that I carefully read the Response to Reviewers on this, which helped me to better understand the authors' intentions here. Please consider revising this part of the Discussion, as it feels extremely speculative and difficult to connect with Bm-mamo.

      Response: Thank you very much for your careful work. We have rewritten this part of the content.

      • typo: were found near the TTS of yellow --> TSS

      Response: Thank you very much for your careful work. We have made these modifications.

      • l. 234 :"expression level of the 18 CP genes in the integument". Consider adding a mention of Figure 7 here, as only Fig. S10 is cited here.

      Response: Thank you very much for your careful work. We have made these modifications.

      • Editorial comment on the second half of the Abstract:

      Wu et al : "We found that Bm-mamo can comprehensively regulate the expression of related pigment synthesis and cuticular protein genes to form color patterns. This indicates that insects have a genetic basis for coordinate regulation of the structure and shape of the cuticle, as well as color patterns. This genetic basis provides the possibility for constructing the complex appearances of some insects. This study provides new insight into the regulation of color patterns."

      I respectfully suggest a more accurate rephrasing, where the methods are mentioned, and where the logical argument is more straightforward. For example:

      "Using RNAi and CRISPR we show that Bm-mamo is a repressor of dark melanin patterns in the larval epithelium. Using in-vitro binding assays and gene expression profiling in wild-type and mutant larvae, we also show that Bm-mamo likely regulates the expression of related pigment synthesis and cuticular protein genes in a coordinated manner to mediate its role in color pattern formation. This mechanism is consistent with a dual role of this transcription factor in regulating both the structure and shape of the cuticle and the pigments that are embedded within it. This study provides new insight into the regulation of color patterns as well as into the construction of more complex epithelial features in some insects."

      I hope this lets the ideas of the original version come through as the authors intended.

      Response: Thank you very much for your careful work. We have made these modifications.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach they tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants × 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies have examined. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution, they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8). And in our revised manuscript, we will confirm them with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separated and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods.

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a Gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that the idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset.

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutant’s fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to re-confirm that bar-seq itself is a valid way to infer fitness. This already seems to be accepted as a standard in our field.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed various reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see figure panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low flu and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach the highest carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 𝜇g/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with barseq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 𝜇g/ml flu) and light blue (32 𝜇g/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 𝜇g/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates that are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, FLU being the first-line drug of choice, but many (yeast) infections have acquired resistance to it and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Author Response

      We would like to thank the reviewers for their thoughtful feedback on our work. One important point that they bring up is a potential issue with our method for accounting for excess NCO events that are detected due to increased marker resolution in the introgressed regions. The method we chose was to simulate average-sized NCO tracts over both introgressed and non-introgressed windows to determine the expected increase in NCO detection due to marker density. We then used that expected increase to correct our per-window NCO counts in all windows. We used these corrections for all results and analyses involving genomic windows (maps and genome-wide comparisons) but did not include them when focusing on introgression-specific characteristics (e.g. analyzing fine-scale sequence differences around NCO tracts in introgressed regions). We chose this method based on previous work in the field and after some additional analyses of our own data that we did not include in the final manuscript. In our revised manuscript, we will attempt to better communicate our decision-making process and include some of the exploratory results that guided us. We look forward to responding to all comments and highlighting additional aspects of our findings that we think are of interest to the evolution and recombination communities, including significant changes to the recombination landscape between closely related strains and the impact of introgression on allelic shuffling.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the role of cylicin-1 (CYLC1) in sperm acrosome-nucleus connections and its clinical relevance to male infertility. Using mouse models, the researchers demonstrate that cylicin-1 is specifically expressed in the post acrosomal sheath-like region in spermatids and plays a crucial role in mediating acrosome-nucleus connections. Loss of CYLC1 results in severe male subfertility, characterized by acrosome detachment and aberrant head morphology in sperm. Further analysis of a large cohort of infertile men reveals CYLC1 variants in patients with sperm head deformities. The study provides valuable insights into the role of CYLC1 in male fertility and proposes CYLC1 variants as potential risk factors for human male infertility, emphasizing the importance of mouse models in understanding the pathogenicity of such variants.

      We appreciate the comprehensive summary of reviewer 1.

      Strengths:

      This article demonstrates notable strengths in various aspects. Firstly, the clarity and excellent writing style contribute to the accessibility of the content. Secondly, the employed techniques are not only relevant but also complementary, enhancing the robustness of the study. The precision in their experimental design and the meticulous interpretation of results reflect the scientific rigor maintained throughout the study. Furthermore, the decision to create a second mouse model with the exact CYLC1 mutation found in humans adds significant qualitative value to the research. This approach not only validates the clinical relevance of the identified variant but also strengthens the translational impact of the findings.

      We appreciate the positive comment of reviewer 1.

      Weaknesses:

      There are no obvious weaknesses. While a few minor refinements, as suggested in the recommendations to authors, could enhance the overall support for the data and the authors' messages, these suggested improvements in no way diminish the robustness of the already presented data.

      In the recommendations for the authors, reviewer 1 mentioned a recent study (Schneider et al., eLife, 2023) showing that Cylc1-KO mice exhibit a reduced sperm count, an observation not made in our current study. We would like to note that the main and most important phenotypes of Cylc1-KO mice in both studies are quite similar, including male subfertility and abnormal head morphology. We think the different targeting strategies and mouse strains may account for this discrepancy. Neither Schneider's study nor ours observed an abnormality in total motility in Cylc1-KO mice. We appreciate the suggestion of reviewer 1 to further examine detailed motility parameters such as VCL, VSL, and ALH. Given that head deformation is the most obvious phenotype of Cylc1-KO mice and the focus of our study, we regret that this detailed analysis of sperm motility was not performed at this stage. Reviewer 1 also asked whether Cylc1-KO female mice are fertile. Given that Cylc1 is an X-linked gene and Cylc1-KO (Cylc1-/Y) mice are severely subfertile, we could not obtain enough Cylc1-KO female mice to examine their fecundity. We would also like to thank reviewer 1 for pointing out several inaccurate descriptions.

      Reviewer #2 (Public Review):

      Summary:

      To verify the function of PT-associated protein CYLC1, the authors generated a Cylc1-KO mouse model and revealed that loss of cylicin-1 leads to severe male subfertility as a result of sperm head deformities and acrosome detachment. Then they also identified a CYLC1 variant by WES analysis from 19 infertile males with sperm head deformities. To prove the pathogenicity of the identified mutation site, they further generated Cylc1-mutant mice that carried a single amino acid change equivalent to the variant in human CYLC1. The Cylc1-mutant mice also exhibited male subfertility with detached acrosomes of sperm cells.

      We appreciate the comprehensive summary of reviewer 2.

      Strengths:

      The phenotypes observed in the Cylc1-KO mice provide strong evidence for the function of CYLC1 as a PT-associated protein in spermatogenesis and male infertility. Further mechanistic studies indicate that loss of cylicin-1 in mice may disrupt the connections between the inner acrosomal membrane and acroplaxome, leading to detached acrosomes of sperm cells.

      We appreciate the positive comment of reviewer 2.

      Weaknesses:

      The authors identified a missense mutation (c.1377G>T/p. K459N) from 19 infertile males with sperm head deformities. The information for the variant in Table 1 is insufficient to determine the pathogenicity and reliability of the mutation site. More information should be added, including all individuals in gnomAD, East Asians in gnomAD, 1000 Genomes Project for allele frequency in the human population; MutationTaster, M-CAP, FATHMM, and more other tools for function prediction. Then, the expression of CYLC1 in the spermatozoa from men with CYLC1 mutation should be explored by qPCR, Western blot, or IF staining analyses. Although 19 infertile males were found carrying the same missense mutation (c.1377G>T/p. K459N), their phenotypes are somewhat different. For example, sperm concentrations for individuals AAX765, BBA344, and 3086 are extremely low but this is not observed in other infertile males. Then, progressive motility for individuals AAT812, 3165, 3172, 3203, and 3209 are extremely low but this is also not observed in other infertile males. It is worth considering why different phenotypes are observed in probands carrying the same mutation.

      We appreciate the suggestion of reviewer 2. First, Table 1 shows information on the variant identified in the CYLC1 gene, including its allele frequency in gnomAD and functional predictions by SIFT, PolyPhen-2, and CADD. Given that mutant mice are the gold standard for confirming the pathogenicity of a variant, we generated Cylc1-mutant mice, and these mice exhibit male subfertility with sperm acrosome detachment. This animal evidence is much more solid than bioinformatic prediction for confirming the pathogenicity of the identified variant in the CYLC1 gene. Second, the expression of CYLC1 in spermatozoa from the patients has been examined by IF staining (Fig. 5B). Unfortunately, the patients declined to continue in the project and donate more semen for qPCR and Western blot analyses. Third, reviewer 2 asks why not all patients carrying the CYLC1 mutation show an identical phenotype. Although some patients exhibit low sperm count or reduced motility, sperm head deformities are the phenotype shared by all 19 patients. Many factors, such as lifestyle, may affect sperm quality. A perfectly identical phenotype in all 19 patients carrying the CYLC1 mutation is an idealized expectation that is not always met in clinical diagnosis. We also appreciate the other suggestions from reviewer 2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major points:

      R1C1: I appreciate that the data are aligned, in some points, with related studies of this niche. However, it would help the reader to have this alignment explored more extensively in the Discussion as well.

      Answer: We acknowledge that the discussion would benefit from additional comparisons to the available datasets. We have thus added the following comment after the first paragraph of the discussion: “Previous studies of the different sub-populations of SVZ progenitors were carried out using transcriptomic approaches based on the expression of various more or less specific markers. These approaches have made it possible to identify quiescent and activated neural stem cells as well as mature neuroblasts, but have been confronted with the strong influence of the cell cycle on cell clustering. Indeed, cycling neural progenitors in these studies have been grouped in either “mitotic” clusters (Llorens et al. 2015, Zywitza et al. 2018, Cebrian et al. 2021) or “neural progenitor cells” clusters (Dulken et al. 2017) that had no clear biological significance, hindering the identification of subtypes of SVZ cycling progenitors. Our study, combining for the first time the characterization of FACS-isolated cells with an irradiation-based model of sequential regeneration, allowed us to clearly distinguish the molecular profiles of TAP and iNB among cycling progenitors, reflecting differences in their respective in vitro and in vivo potentials”.

      R1C2: The data on multilineage differentiation, both in culture and upon engraftment, would be greatly strengthened by quantification. What is the relative yield of TUJ1/DCX-positive cells versus the other marker combinations? Specifically regarding the multilineage differentiation in vitro - because different media conditions are used to generate each lineage, it may be difficult to determine relative yield. Could a differentiation system that allows production of all 3 lineages be used instead?

      If the fraction of non-DCX/TUJ1-labeled progeny is low, particularly in vivo, this might suggest that while multilineage differentiation is possible, it is a much less likely cellular state outcome than production of mature neuroblasts. Some suggested references with examples of the culture conditions, experimental conditions, and discussions highlighted in the public review: Culture conditions that allow simultaneous trilineage differentiation. PMID: 17615304 Influence of culture conditions on potency: similar to issues covered in PMID: 21549325.

      Answer: We agree with the reviewer that quantification of multilineage differentiation in vitro would improve the characterization of the relative potencies of the different SVZ progenitors.

      According to PMID: 17615304 and PMID: 21549325, and in agreement with our own experience, the only culture condition that allows neurosphere-derived neural progenitors to differentiate in vitro into the three lineages is the removal of mitogens from the culture medium. However, this does not work on freshly isolated SVZ cells, which remain in an undifferentiated state in this condition.

      This is why we chose to use specific differentiation media for each of the 3 lineages, as in Figure 1C. It is also for this reason that we performed as many experiments as possible in vivo rather than in vitro, as in Figure S2. In the new version, we have added a quantitative analysis of staining with antibodies against GFAP, CNPase or DCX in GFP-positive cells persisting at the injection site (IS), where a high number of grafted cells were found in Figure S2B. This was performed using the NIS software to measure eGFP-, GFAP-, CNPase- and DCX-positive areas. The intersection between each marker and the eGFP areas was then expressed as a percentage of staining (Figure S2C). The results showed that approximately one third of GFP+ cells expressed GFAP or DCX. The quantitative analysis of CNPase expression was complicated by CNPase-positive host cells, but the stronger CNPase staining in eGFP-positive areas clearly revealed the expression of CNPase by a significant proportion of eGFP-positive cells.

      R1C3: Additionally, for claims similar to what is currently made in the text, it would be extremely valuable to confirm the purity of the sort for each population - for example by fixing and staining the sorted fraction with additional antibodies that confirm cell identity.

      Answer: We have previously shown in Daynac et al. 2013 that s-iNB express the neuroblast markers CD24 and DCX, but also markers of neural progenitors such as Mash1, a basic helix-loop-helix transcription factor. As suggested by the reviewer, we have further investigated the expression of other neural progenitor markers by sorted cells. The results showed that the proportion of cells expressing DLX2, a marker of proliferating progenitors (Doetsch et al. 2002), was very high in aNSC/TAP (98%) and progressively decreased in iNB (82%) and mNB (25%). Similarly, the transcription factor SOX2, which plays an essential role in the maintenance of neural progenitors (PMID: 25126380), was expressed by 78% of aNSC/TAP, 70% of iNB and 17% of mNB.

      Altogether, these new data confirmed the identity of the different cell populations and particularly that of iNB. They are commented at the beginning of the Results and shown in Figure S1.

      R1C4: Line 125: GFAP alone doesn't necessarily indicate a "conversion to NSCs" - this conclusion could be greatly strengthened by inclusion of more markers, particularly at the protein level, or cyto-architectural studies.

      Answer: We agree with the reviewer that GFAP expression alone is not sufficient to evidence the presence of NSC in the SVZ. We have thus modified the text accordingly: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with eGFP+s-iNB and eGFP+s-NSC/TAP (Fig. 1Db, Fig. 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R1C5: Could these cellular states be reflective of preferential translation of DCX? It would be very helpful to see the flow cytometry sort data for iNBs / mNBs used in Figure 6, particularly if these cells were also fixed and stained directly for DCX protein.

      Answer: As suggested by the reviewer, freshly FAC-sorted iNB and mNB were fixed, permeabilized and labelled with an anti-DCX monoclonal antibody. As shown in the figure below, we found a higher level of DCX expression in mNB than in iNB. This result therefore suggests that proliferation capacity may be related to the level of DCX expression. However, because of the relatively minor importance of this result, we decided not to include it in the manuscript.

      Author response image 1.

      Modal histogram representation of DCX expression level in unstained, iNB and mNB cells determined by flow cytometry (FlowJo).

      R1C6: Figure S8 is all zeroes, showing the GFP+Dcxhigh NBs do not retain proliferative capacity. But we don't get a direct experimental comparison to EGFPnegative/lowDcxlow iNB engraftment, which would strengthen the conclusions of the paper.

      Answer: Unfortunately, there is no method available to analyse the eGFPnegative/lowDcxlow iNB engraftment: by definition, these cells do not express eGFP and the use of a tracker is not appropriate for long periods of time — and thus a high number of cell divisions — after engraftment. However, to us, this control is not needed to conclude that GFP+Dcxhigh iNB have no (or at least a lower) stem cell potential in vivo considering that we have shown in Figure 1 and Table 1 that the whole iNB population is able to generate the different types of neural cells.

      R1C7: Transplant data in Table 1 - a relatively small proportion of transplant derived cells are in OB, etc. Given that A cells are thought to cycle at least once in vivo, is this expected?

      Answer: The reviewer is right that a relatively small proportion of transplant-derived cells were found in the OB. However, we should consider that we used immunocompetent mice as recipients, which could have significantly reduced both the engraftment efficiency and the migration of engrafted cells outside the injection site.

      R1C8: A caveat is that there is not much functional testing of the proposed model, especially for the interconversion of iNB states suggested by the diagram in Figure 7. The text is relatively restrained in proposing this model, so it is reasonable to keep - but perhaps should be noted that this part of the model will need additional testing.

      Answer: Data presented in Figure 6 clearly suggest that Dcxhigh iNB have an in vitro potential similar to that of Dcxlow iNB, whereas they do not have such potential in vivo (Figure S10). This suggests that, provided they are in appropriate conditions, Dcxhigh iNB could reacquire stem/progenitor properties. However, we agree that this hypothesis requires further investigation. Therefore, as suggested by the reviewer, we have added to the Figure 7 legend: “Possible interconversion of iNB states would require further experimental confirmation.”

      Additional minor points:

      R1C9: Introduction: the SVZ is described as "the lateral wall" - however, several works in the mouse have also examined the medial wall and callosal roof, as cited later in the intro. Suggest rephrasing the second sentence (line 48) and later sentence (line 66) to clarify that "the SVZ" encompasses all of these subregions; they are not necessarily separate niches.

      Answer: As indicated by the reviewer, the SVZ encompasses distinct subdomains, with NSCs having a regional identity based on their location in the lateral or septal wall of the ventricle and generating different types of neuronal and glial progeny (PMID: 34259628). To address the reviewer's concern about possible confusion and to indicate clearly that the SVZ encompasses several subdomains, we have modified the sentence at line 66 as follows: “Since then, single-cell RNA sequencing has revolutionized the field and has made it possible to precisely elucidate the transcriptome of SVZ cells present in the LW and in the septal wall, which also harbors NSC niches”.

      However, we did not modify line 48, since this sentence only indicates that the largest neurogenic niche in the adult brain resides in the LW of the SVZ.

      R1C10: Line 77: "exposure" not "exposition"

      Answer: The error has been corrected in the revised manuscript.

      R1C11: As noted in the Public Review - the use of the term "D1/D2" cells seems likely to confuse readers who are also versed in dentate gyrus neurogenesis. Recommend removing this term from the manuscript.

      Answer: We agree that the D1/D2 terminology could cause confusion, as D cells refer to tanycytes in the hypothalamus. In the revised manuscript, we now use iNB1 for DcxLow iNB and iNB2 for DcxHigh iNB.

      Reviewer 2

      Major comments:

      Lack of rigor

      R2C1: There is a lack of appropriate normalization controls for the microarray data. As there is a decreased level of transcription in quiescent NSCs, there needs to be a cell number control (spike-ins based on cell numbers). Without this normalization, the readout can be greatly skewed.

      Answer: We agree that qNSC are marked by a decreased level of transcription due to quiescence. To overcome this problem in the Clariom assays, we chose to calibrate each population with a fixed amount of cRNA and cDNA, using HeLa cells as an internal control. We fully agree that this method is not optimal, but it appears to be effective in the end. Indeed, it has been adopted, with the same rigor, in other microarray studies published in the field (PMID: 24811379) and also on skeletal muscle cells (PMID: 29273087). Moreover, the transcriptomic signature of qNSC matches perfectly those from other studies, particularly those of related clusters in single-cell experiments (including ours, Figure S5). This is probably because, more than cell number, the main characteristic of these cells is the lack of expression of genes involved in cell proliferation and metabolism. In any case, these data, which confirm previously published findings, are not the main information of our manuscript, which is mainly dedicated to the characterization of proliferating cells, a characterization that is not impaired by our choice of normalization.

      R2C2: The absolute segregation of clusters in the single-cell analysis is currently entirely in agreement with the cell cycle stage. This suggests that in the author's analysis, the clustering in 3F is entirely shaped by the cell cycle, making that the defining characteristic of the author's definitions for their cell types. Has an analysis been done that regresses out cell cycle-associated genes to see if there are clusters for different cell states/types that are identified in the absence of cell cycle stage being the defining factor? (Barron and Li, 2016). For example, just as you would see a difference in cluster if you are a quiescent or activated NSC as compared to a neuroblast for example, even without the contribution of cell cycle. These are different cell types.

      Answer: We agree that cell cycle regression would theoretically allow further discrimination between cycling cells along successive neurogenic stages. We have already performed regression using several methods, including regression of S and G2/M scores as indicated in the Seurat workflow, removal of cell cycle-related PCs from the UMAP calculation as used in the Cebrian-Sylla study, and the use of alternative gene sets such as those provided by the tricycle method (PMID: 35101061). These regression methods have all been applied to our datasets, to the original Cebrian-Sylla datasets, and to a combination of our datasets with the original Cebrian-Sylla datasets to increase cell number and clustering resolution. However, none of these methods modified the clustering of cycling cells.

      In fact, the strong influence of the cell cycle over clustering highlights the relevance of our depletion/replenishment approaches to decipher the molecular changes masked by the cell cycle, as discussed below.

      R2C3: The use of the DCX-CreERT2 line is a lineage tracing line. Once DCX is expressed, Cre recombines the DNA to allow for fluorescence. It is binary, on or off associated with DCX expression. And once on, it is always on, whether the cell is currently expressing DCX or not. As the authors had previously described a DCXlow condition, the eGFP- cells would not reflect DCXlow, but no DCX at all. And the eGFP+ cells may not be currently expressing DCX anymore. The authors should have used a system where the DCX promoter itself drives fluorescence.

      Answer: We took advantage of the DCX-CreERT2 line to demonstrate that some neural cells that have recently acquired DCX expression (i.e. eGFP+ iNB) could keep (or recover) the potential of neural progenitors in vitro. Of course, some of these GFP+ cells could have stopped expressing DCX. This is probably the case when they differentiate into astrocytes and oligodendrocytes in vitro, as shown in Figure 6.

      In any case, using the Dcx promoter to drive eGFP fluorescence directly would have completely prevented us from demonstrating such changes in cell fate in vivo, because oligodendrocytes or astrocytes derived from iNB lose Dcx expression and therefore could not have been tracked.

      R2C4: The lack of analysis of images (differentiation, for example) limits the conclusions of the in-vitro data, and the images with unclear staining, limit the conclusions of the in-vivo experiments.

      Answer: This comment is similar to that of R1C2. We have now added a quantification in Figure S2.

      R2C5: The cited splicing differences between cell types were interesting (though they did not show up in the transcriptome enrichment analyses, Fig S2) and would be something to pursue further; however, this was a very limited analysis. There was no further study of these splicing mediators beyond the single-cell data.

      Answer: We now show enrichments of GO terms corresponding to mRNA splicing isoforms in the different types of sorted SVZ cells (Figure S4). This analysis clearly revealed that spliced genes in SVZ cells are mainly involved in neuron development and neurogenesis. Interestingly this also showed that qNSC logically differed from the other cell types by splicing concerning genes involved in mitosis and cell cycle, consistently with their quiescent state. More importantly, GO annotations of differentially spliced isoforms further confirmed that s-TAP and s-iNB have distinct features. We agree with the reviewer that further analysis of splicing mediators would be very important for understanding molecular changes involved in neurogenesis. However, we think that it is largely beyond the scope of this study.

      R2C6: Fig 1C - Show values, not just pictures. You may need to shift your current differentiation paradigm to do so by removing growth factors instead of unique differentiation conditions.

      Answer: See the answer to R1C2.

      R2C7: Fig S1A - Stainings for GFAP and DCX are not clear. It is very hard to distinguish which cells are associated with these signals.

      Answer: This figure (now Figure S2A) shows an eGFP+iNB cell (white arrow) that has reached the rostral migratory stream and expressed DCX (inset a3), but not GFAP (inset a2). This is now indicated in the figure legend. We have also moved the arrow for more clarity.

      R2C8: Fig S1B2 - There is red staining everywhere, so it is very hard to see a specific CNPase signal.

      Answer: We have added a new figure (Fig S2B) distinguishing eGFP+CNPase+ cells (yellow arrows) from eGFP+CNPase- cells (white arrow).

      R2C9: Line 174 - It is the mRNA that you are detecting as downregulated; be more specific, as you are not showing protein downregulation.

      Answer: We added "encoding" before "a major splicing repressor" in the Line 174 text to specify that we refer to the mRNA: “Interestingly, Ptbp1, encoding a major splicing repressor”.

      R2C10: Line 189 - The text in this line mentions clusters not shown in the figure (clusters 6 and 15, DCX+ Ki67+ neuroblasts), which would be important to visualize. As currently presented, the authors only show that iNBs are similar to mitotic TAPs.

      Answer: Clusters 6 and 15 have been added to Figure S5.

      R2C11: Fig 3D-E - Why is cluster 17 called aNSCs (3E) when it has the highest GFAP expression (Fig 3D)? Typically, the cells with the highest GFAP are qNSCs or astrocytes, not aNSCs.

      Answer: We previously reported that the level of gfap mRNA expression in neural stem cells (quiescent and activated) did not exactly reflect the amount of protein in these cells. This is the reason why we also used the Slc1a3 marker (Glast), which is highly expressed both at the RNA and protein levels in quiescent NSCs (Daynac et al. 2013).

      R2C12: Line 216 - You said in line 216 cluster 13 were astrocytes, then you said in line 227 that cluster 13 was s-qNSC. Which is it?

      Answer: This is due to the fact that we performed two distinct analyses.

      In the first one (line 216), cells were scored based on datasets provided by Cebrian et al. with one dataset containing genes enriched in astrocytes, and another one, genes enriched in quiescent B-cells. Therefore, cluster 13 was shown to contain 73% cells expressing astrocyte markers, whereas cluster 4 gathered cells expressing both qNSC (B-cells, 48%) and astrocyte (52%) genes.

      In the second one (line 227), cells were scored using our transcriptomic signatures of FAC-sorted SVZ cells, which do not include differentiated astrocytes. We demonstrated that the cluster 13 cells only expressed s-qNSC genes.

      R2C13: Line 214 - All the other clusters further discussed in lines 227-230 were first named in lines 214-221, but clusters 15 and 19 were not. You associate both of those clusters with s-iNB; what were they associated with in the above section?

      Answer: Lines 219-221 have been reworded as follows: Clusters 10, 5, 15, 12, and 8 were defined as cycling progenitors based on the expression of proliferative markers such as Top2a, Mki67 and Ascl1. Clusters 1, 3, 7 and 9 were identified as mNB due to the loss of Mki67, Top2a and Ascl1 expression and the expression of Robo2 and Dcx. Cluster 19, which has lost Ascl1 but still expresses Top2a and Mki67 together with Robo2 and Dcx, appears at the transition between iNB and mNB.

      R2C14: Fig 3I-J - 5 days after irradiation, I would like to see from tissue slices how many cells are dividing compared to 1 day post-irradiation and controls. In other paradigms, such as temozolomide experiments (Kalamakis et al), by 5 days we should see fewer cells in quiescence and more of those quiescent cells exiting quiescence into the cell cycle. Why would there be more cells in quiescence in the irradiated brain? Even if they are radiation resistant, the base number should be comparable between controls and irradiated brains, which is not what you show in Fig 3I-J.

      Lines 234-235 - The text says the data are normalized to numbers of qNSCs, which are supposed to be the same (and I agree they should be the same). However, your graphs in 3I and J show more qNSCs in irradiated conditions, which would greatly influence the results and is currently hard to interpret.

      Answer: As stated by the reviewer, there is no increase in the absolute number of quiescent cells in the irradiated SVZ. The reconstitution of SVZ cell populations after 4Gy irradiation has already been studied by our group (Daynac et al. 2013, see Fig. 3F), showing that s-iNB and s-mNB are still under-represented after 5 days, while qNSC are in similar numbers as in the unirradiated SVZ. This leads to an over-representation of quiescent cells and early SVZ progenitors in Figure 3J compared with Figure 3I.

      R2C15: Fig 6A - the authors show a significant difference in neurospheres between eGFP- (DCX-) and eGFP+ (DCX+) iNBs - as would be expected as DCX suggests a further commitment towards neurogenic fates, yet your population doubling is the same.

      Answer: To determine the population doublings, the medium was changed and cells were counted every 7 days. This condition masked the differences between two cell populations reaching the plateau phase at different times, explaining why eGFP- iNB and eGFP+ iNB could not be clearly distinguished by this technique.

      R2C16: Fig 6C - Differentiation data (in-vitro) should be quantified in 6C, just as was mentioned for 1C. These values should be done for both of the populations (eGFP-iNB, and eGFP+iNB) and not just compared to the previous pictures which were on total iNB. Again, numbers are required, not just picture examples.

      Answer: Quantitative data are given in Figure 6D, showing that approximately 60-80% of eGFP+ iNB cells are able to differentiate into neurons, oligodendrocytes or astrocytes. We did not analyze the differentiation of eGFP- iNB since it would not add any supplementary information.

      R2C17: Fig S8 - The authors did not show whether the lack of engraftment of eGFP+ cells is due to the transplant itself (previously you showed that only 2/3 worked in a similar paradigm). It would be helpful if the authors had some means to visualize the DCXlow cells to confirm that they worked as before in the transplantation (another color? Another type of mouse (Thy1 antigen differences)?)

      Answer: Unfortunately, the Thy1 antigen has not been documented in mouse subventricular zone progenitors, but only in neurons (PMID: 10813783). The Thy1 antigen has also been described in bipotent glial progenitor cells (GCP) from the developing human brain that give rise to oligodendrocytes (PMID: 36931245).

      As shown in Figure S10, we performed 5 grafts with s-iNB eGFP+ cells, 2 alone and 3 mixed with eGFP- cells, and never found any eGFP+ cells 5 weeks after grafting. Moreover, we did not find any eGFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (these data have been added to Figure S10). Compared with the results described in Figure 1, this clearly shows that, like mNB, DCXhigh iNB are not able to generate persistent cells in the grafted brains.

      R2C18: Fig S8 - Why were there no eGFP cells even at the injection site? DCX expression promotes migration; indeed, DCX expression becomes very high in SVZ cells as they begin to exit towards the migratory stream. Even without migration, one would still expect the cells to survive. Currently, the authors show no cells at 5 weeks; however, they would need to show earlier timepoints as well to determine what is happening with these cells. It is possible these GFP+ cells are not even expressing DCX anymore (see above).

      Answer: As stated above, we did not find any GFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (see Figure S10).

      R2C19: Line 320 - the authors suggest a subpopulation of NEURONS continues to divide and cite 2 works from the 1990s showing proliferating SVZ cells can differentiate. Our knowledge of this system has come dramatically forward since the 1990s as well as technologically, and to date, neurons have not been shown to divide.

      Answer: We apologize for this lack of clarity. We agree that neurons correspond to differentiated non-cycling cells; we had used the terminology of these articles. The incorrect part of the sentence at Line 320 has been deleted from the text.

      R2C20: Fig 7 - The whole figure is based on changing levels of RSR genes which were not confirmed in any way to be involved in any of these stages, only descriptively in single-cell analyses.

      Answer: As stated above, in our opinion, further characterization of the involvement of RSR genes in neurogenesis is largely beyond the scope of our manuscript. Nevertheless, we think that the role of RSR genes in neurogenesis is an important question that should be addressed in further studies.

      Overstatement of findings

      R2C21: Fig 1 - Authors did not compare all cell types in each condition but made overstatements about their relationships to each other between graphs. There should also be separate graphs showing all cell types at 4% and a separate one at 20%.

      Answer: In the revised version, Figure 1 shows a graph comparing all cell types at 4% O2 and a separate one at 20% O2, as requested by the reviewer. The graphs clearly show that 4% O2 promotes iNB proliferation compared with the 20% condition.

      R2C22: Fig 1D-b2 - Why does DCX look nuclear? One cannot say these cells are only NSCs because they are GFAP+, as astrocytes also express GFAP. The authors would need another marker to separate those populations. In the text, the authors say that expressing GFAP (line 124) means NSC, but then in line 127 expressing GFAP means astrocytes, which further shows that additional markers are needed to validate these two different cell types.

      Answer: DCX nuclear translocation has been shown to improve cellular proliferation (PMID:32050972).

      As indicated in our answer to R1C4, the text has been modified as follows: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with s-iNB eGFP+ and s-NSC/TAP eGFP+ (Fig. 1Db, 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R2C23: Fig S2 - The transcriptome signature for s-iNBs is very similar to s-TAP, basically suggesting the iNBs are further along in cell cycle.

      Answer: This is now Figure S3. Functional enrichment analysis of individual transcriptome signatures revealed that both s-TAP and s-iNB are enriched in genes related to the cell cycle, although with different GO term enrichments. Indeed, s-TAP are enriched in genes related to the G1, G1/S and S phases (but with low -log10 adjusted p-values) and s-iNB in genes related to cell cycle mitosis and M phase (with high -log10 adjusted p-values).

      We have previously shown that around 33% of s-iNB have DNA content >2N, versus around 26% of s-TAP and s-aNSC (Daynac et al. 2013), which is in accordance with the GO term enrichments. However, these data also showed that most s-iNB and s-TAP are in G1, indicating that s-iNB are not just further along in mitosis than TAP.

      Moreover, our transcriptomic data clearly show that s-iNB are distinct from s-TAP: 1) according to principal component analyses (Figure 2B and C), the whole transcriptome of s-TAP is closer to that of s-aNSCs than to that of s-iNB (10% of variation on PC2); 2) the heatmap in Figure 2D shows that they have different RSR gene expression profiles; 3) the new Figure S4 shows that GO annotations of differentially spliced isoforms further confirm that s-TAP and s-iNB have distinct features; and 4) Figure S5 shows that s-iNB express genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. Finally, scRNA-seq cell clusters related to s-iNB are distinct from the cluster related to s-TAP, as shown in Figure 3D and in Figure 4.

      R2C24: Fig 3 - The lack of information about timepoint 0 after irradiation, and when proliferation and cell cycle entry begins again following irradiation, limits our interpretation of the single-cell irradiated data.

      Answer: We have previously reported the relative abundance of each SVZ neural progenitor population in the young adult mouse brain in several papers. In particular, we based our interpretation on our SVZ irradiation model reported in Daynac et al. 2013, demonstrating the radioresistance of qNSC, which re-enter the cell cycle as early as 2 days after 4Gy irradiation and successively regenerate aNSC, TAP, then iNB and mNB.

      R2C25: Fig S3 - These results effectively show that the s-aNSCs and s-TAPs are actually less specific when compared to that same identity in other studies, and that the iNBs are most similar to mitotic TAPs. This supports what was mentioned above, which is that the transcriptional signatures are very similar between the s-TAPs and i-NBs, showing these are not a unique cell state, but just a bit further along mitosis within the TAP cell state.

      Answer: This is now Figure S5. In this figure, we show that s-iNB express genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. As indicated above in R2C23, s-iNB are not just a bit further along mitosis within the TAP cell state. Indeed, we provide several lines of evidence showing that s-iNB and s-TAP have different transcriptomic profiles.

      R2C26: Fig 4B - The focus on Ptbp1 as being associated with the iNB cluster border to mNB is expected as all previous studies of Ptbp1 have focused on its role in the progression of other cell types through the cell cycle, its control of cell cycle regulators, and a cell cycle mRNA regulon (Monzon-Casanova et al, 2018, 2019, 2020). This further supports these analyses are specifically defined by cell cycle stages.

      Answer: We totally agree that Ptbp1 expression distinguishes cycling cells from postmitotic neuroblasts, in accordance with previously published papers, and that based on this single gene we cannot find any differences between cycling cells, i.e., aNSC, TAP and iNB. However, as shown in the manuscript and stated above (R2C23 and R2C25), these cells can be distinguished by their respective expression of many other genes, including other RSR genes.

      R2C27: Line 281-282 is an overstatement - the authors suggest that this is a new type of cycling neural progenitor - when all studies point to it being the end of mitosis TAPs as they go on their way to mNBs. This clearly shows a trajectory and not a defined, binary cell type.

      Answer: We agree that the use of the word "type" was misleading, and we changed it to "stage" to better reflect that s-iNB are a distinct stage along the differentiation process according to our pseudotime cell-trajectory analysis.

      Author response image 2.

      Pseudotime analysis using Monocle 3 (excluding the cluster 13 corresponding to astrocytes and starting from s-qNSC) revealed two branches starting from s-TAP, one towards cell cycle the other towards neuronal differentiation.

      Minor comments:

      R2C28: Fig 3D - For ease, please define what you called the clusters in 3D - not just cluster numbers

      Answer: We chose not to name the clusters in 3D because their identification (group names) is based on data presented later in Figures 3E, F and G.

      R2C29: Fig 3E-F - Show astrocytes by text in 3E and F

      Answer: As discussed above, astrocytes cannot be shown in these figures because they are based on our signatures, which did not include an astrocyte signature.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the insightful feedback provided by the editors and reviewers, who have recognized the novelty of our study. We have mapped the spatial distribution of six endogenous somatic histone H1 variants within the nuclei of several human cell lines using specific antibodies, and the distinct patterns strongly suggest functional differences between variants. We are submitting a revised version of the manuscript that addresses the reviewers' comments and recommendations.

      Reviewer #1 (Recommendations For The Authors):

      Minor Comments:

      (1) In Figure 1C, since H1.4 is uniformly distributed among the four sections (A1-A4), its levels are not expected to differ significantly among the four sections, yet they are depicted as significant. Even the violin plots shown do not seem to be significantly different from each other. This requires an explanation.

      We agree with this reviewer that relevant differences in H1.4 abundance among areas A1 to A4 do not seem to exist, whether looking at the images or at the violin plots, as discussed in the manuscript. Nonetheless, statistical testing reported these small differences as significant because of the large sample size (N) of the analysis. It is clear that H1.4 does not show a relevant peripheral enrichment like the one shown for the other variants.

      (2) At the end, it would be better to include a figure panel depicting chart/table/pictorial representation, depicting the summary of the work done with respect to all the histone variants, as there are several histone H1 variants studied under different conditions and contexts.

      A table summarizing the location and characteristics of the different H1 variants has been included in the manuscript (Figure 6).

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may consider adding controls for the specificity of the antibodies used for the studies. While the antibodies used here are commercial, it does not guarantee the quality for immunofluorescence, especially considering their unreliability in the past. The authors may consider including peptide/ recombinant protein-based adsorption controls in addition to knockdown or knockout controls. Having these data will strengthen the exciting observations presented in this MS and significantly increase the impact of the presented findings.

      We totally agree with the reviewer that the use of commercially available antibodies does not guarantee their quality and specificity. As this issue was crucial for our studies, we extensively assayed the performance and specificity of the antibodies using different approaches. The validations were shown in our previous publications, where these antibodies were successfully used for ChIP-seq (Serna-Pujol et al. 2022 NAR 50:3892; Salinas-Pena et al. 2024 NAR doi 10.1093/nar/gkae014). In summary, the performance of the H1.0 (05-629l, Millipore), H1.2 (ab4086, abcam), H1.4 (702876; Invitrogen), H1.5 (711912, Invitrogen) and H1X (ab31972; abcam) antibodies was tested by Western blot, ChIP and proteomic analyses (all the results are included in Supplem. Figure 1 in Serna-Pujol et al. 2022 NAR 50:3892). Specifically, we tested specificity using inducible KDs for the depletion of each of the somatic H1 variants in T47D. We also checked that the antibodies did not recognize additional H1 variants using recombinant proteins or cell lines naturally lacking some of the variants. All the experiments confirmed that the antibodies were variant-specific. In addition, when the corresponding epitope was absent, the antibodies did not gain new cross-reactivity with other variants. More recently, validation of the specificity of the H1.3 antibody (ab203948) was performed following the same experimental approaches described for the rest of the antibodies (Supplem. Figure 1 in Salinas-Pena et al. 2024 NAR doi 10.1093/nar/gkae014).

      (2) Histone H1 is overexpressed in several cancers. While the authors do not use an overexpression strategy, the cells used in this study are all cancer cell lines. The study would benefit greatly if some of the findings- primarily regarding the spatial distribution of the H1 were to reproduce in non-tumorigenic, diploid cells.

      We have also studied and discussed the spatial distribution of H1 variants in the non-tumorigenic cell lines 293T and IMR-90, and we have added this to the revised manuscript (Figure 5D and Figure 5-figure supplement 3). The nuclear radiality of H1.4 in 293T cells is also shown (Figure 5-figure supplement 4A).

      Reviewer #3 (Recommendations For The Authors):

      This is an interesting paper that provides convincing evidence of distinct distributions to individual histone H1 variants. There are several aspects of the study that leave me unconvinced that the study accurately captures histone H1 variant distributions.

      (1) Antibody accessibility: (see PMID: 32505195). One means to address this is to express a fluorescent protein-tagged version of histone H1 and demonstrate that the antibody can detect that tagged version of histone H1 independent of its location in the nucleus. In general, these FP-tagged H1s show a much more even distribution than what is observed here. Of course, that could reflect artifacts related to the fusion or the expression of the exogenous construct. However, even if all of the above are true, this will test the ability of the antibodies to recognize their epitopes in different chromatin environments. The fluorescent protein tag enables unambiguous knowledge of the presence or absence of the H1 histone.

      We have used cells expressing an HA-tagged H1.0 variant and performed immunofluorescence with HA and H1.0 antibodies to investigate co-localization, testing whether an H1 antibody recognizes all of the tagged protein irrespective of its location in the nucleus or its chromatin environment. A very high correlation between the two antibodies was found (Figure 1-figure supplement 1B).

      (2) At high concentrations, the fluorescence signal intensity can be quenched. For example, this is common with high-affinity histone H3 serine 10 phosphorylation antibodies in late interphase/prophase nuclei. The artifact can be minimized by serial dilution of the antibody and identifying the minimum usable concentration for immunofluorescence. While I am not certain that this is taking place here, the rate and manner that the intensity drops off from the periphery in the peripheral H1 variant distribution are very similar in appearance. There are biological explanations related to constraints on diffusion that one could imagine also explaining the data so I'm not stating that this must be an artefact. However, I am concerned that it might be. An improved staining may reveal the same result but more convincingly.

      We have performed immunofluorescence with serial dilutions of the H1.3 antibody to show that the peripheral distribution was not due to quenching of the fluorescence signal intensity (Figure 1-figure supplement 1A).

      (3) Histone H1 is highly mobile and there is some concern that they could reorganize during the relatively long period of time that it takes to fully fix the cells for both ChIP and immunofluorescence. This should be acknowledged in the manuscript.

      We have added this reviewer’s concern to the Discussion section.

      (4) The paper would benefit from a more rigorous quantification of histone H1 subtypes. Mass spectrometry would be ideal, but more classical techniques such as 2D AU-SDS PAGE, HPLC, etc. would be an improvement over immunoblotting. The authors did not explain the quantification of the immunoblots and the assignment of relative contributions of H1 subtypes to the individual Coomassie bands in the ImageJ section of the methods, which is referred to as the method of quantification in the immunoblotting methods.

      We have further explained how the relative quantification of H1 variants in different cell lines was performed (Methods section). We agree that a more sophisticated mass spectrometry-based quantification is desirable, and we are collaborating to do this using internal H1 peptide controls (Parallel Reaction Monitoring), but this is out of the scope of this manuscript, as the observed patterns of distribution of H1 variants do not depend on mild differences in variant abundance. Only the absence of H1.3 and H1.5 in some cell lines alters the distribution of other variants.

      Additional author responses to the Public Review comments made by the Reviewers:

      (1) With respect to the functional significance of the results presented here, we want to stress that, as a consequence of the differential distribution and abundance of H1 variants among cell types, depletion of different variants has different consequences. For example, H1.2 depletion, but not depletion of other variants, has a great impact on chromatin compaction. In addition, cell lines lacking H1.3/H1.5 expression present a basal up-regulation of some interferon-stimulated genes (ISGs) and particular repetitive elements, as previously described upon induced depletion of H1.2/H1.4 in a breast cancer cell line or in pancreatic adenocarcinomas with lower levels of replication-dependent H1 variants (Izquierdo et al. 2017 NAR 45:11622). Thus, our results reinforce the existing link between H1 content and the immune signature. We have added these data to the revised manuscript (Figure 5-figure supplement 5).

      Moreover, we also analyzed the chromatin structural changes upon combined depletion of H1.2 and H1.4. Combined H1.2/H1.4 depletion triggers a global chromatin decompaction, which supports previous observations from ATAC-Seq and Hi-C experiments in these cells (Izquierdo et al. 2017 NAR 45:11622; Serna-Pujol et al. 2022 NAR 50:3892). Although H1 content is more compromised in these cells (30% total H1 reduction) than with single H1 KDs, the phenotype observed could not be recapitulated with other H1 KD combinations in which total H1 content was similarly reduced (Izquierdo et al. 2017 NAR 45:11622), supporting that the deleterious defects were due to the non-redundant roles of the H1.2 and H1.4 proteins. Indeed, this manuscript supports this notion, as H1.2 and H1.4 show different genome-wide and nuclear distributions.

      (2) Our immunofluorescence data, together with ChIP-seq data, do not rule out binding of H1 variants to a great variety of chromatin, but show enrichment or preferential binding to certain regions or chromatin types. Our data on interphase nuclei do not suggest any type of quenching or saturation. Obviously, detection with antibodies depends on epitope accessibility, as for all immunofluorescence data ever published, and we have acknowledged that post-translational modifications of H1 may occlude antibody accessibility, as some phospho-H1 antibodies give distribution patterns different from those of total/unmodified H1 antibodies. Thus, we cannot exclude that specific modified H1s exhibit particular distribution patterns that are not recapitulated in our data. This represents another layer of complexity in H1 diversity, and we agree that exploring the repertoire of H1 PTMs and their functional roles is an interesting question that needs to be addressed. Still, our data are highly relevant, as they demonstrate for the first time the unique distribution patterns of H1 variants among multiple cell lines and do not rely on overexpression of tagged H1 variants, which in our experience may produce mislocalization of H1s.

      (3) We have investigated co-localization of H1 variants with the HP1alpha protein and have added these data to the revised version of this manuscript (Figure 1-figure supplement 1C-D).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors analyzed the causative association between circulating immune cells and periodontitis, and reported three risk immune cells related to periodontitis. The significance of the findings is fundamental, which substantially advances our understanding of periodontitis. The strength of evidence is convincing.

      Reviewer #1 (Public Review):

      Ye et al. used the Mendelian randomization method to evaluate the causal association between circulating immune cells and periodontitis and finally identified three risk immune cells related to periodontitis. Overall, this is an important and novel piece of work that has the potential to contribute to our understanding of the causal relationship between circulating immune cells and periodontitis. However, there are still some concerns that need to be addressed.

      We sincerely appreciate the constructive feedback from the editor and reviewers, which has been instrumental in enhancing the quality of our manuscript.

      (1) The authors used 1e-9 as the threshold to select effective instrumental variables (IVs) and should give the corresponding references. Meanwhile, the authors should test and discuss the potential impact of inconsistent thresholds for the exposure (1e-9 and 5e-6 were selected by the authors, respectively) and outcome IVs (5e-8) on the robustness of the results.

      Thank you for your insightful comments. We have selected two GWAS databases as the data sources for the exposure group: the BCC Consortium with a sample size of 563,946, and the Sardinian cohort of 3,757. The considerable disparity in sample size between them may result in variations in outcomes, primarily showcased in the differences in positive SNP numbers. We, therefore, adopted an unconventional (non 5e-8) yet rigorously controlled screening strategy, an approach that is widely accepted in MR studies (Li et al., 2022; Liu et al., 2023). We believe that the present thresholds are sufficiently rigorous to guarantee the validity of the subsequent Mendelian randomization analysis.

      However, employing two distinct methods in exposure screening is not typical, and we posit that this method can be viewed as an innovative strategy, providing a reference for future research dealing with two databases with significant discrepancies (Huang et al., 2023; Kong et al., 2023). As you perceptively noted, we acknowledge that this strategy may exert a certain influence on the research outcomes, and we have factored this potential limitation into our manuscript. “Third, the considerable variation in sample size between the two exposure databases contributes to the discrepancies in the number of positive SNPs. Despite our exploration of multiple selection thresholds for IVs, the inconsistency in screening methods and the discrepancy in the included SNPs could potentially introduce bias.” (Page 14)

      As for the "outcome IVs with 5e-8" you mentioned, we didn't implement this screening threshold in the outcome IVs. Indeed, we applied the same screening criteria as specified at 5e-06 (refer to Stable 2). Is the statement that you're referring to the following: "Additionally, SNPs that displayed a direct association with the outcome would also be excluded to uphold the third MR assumption (P < 5e-8)" (Page 6)? In this context, we adopted a standard criterion in the IVs screening process to remove SNPs directly associated with the outcome.
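      To make the screening logic discussed above concrete, here is a minimal Python sketch. It is not the authors' code: the SnpStats record, the select_instruments function and the example SNPs are all hypothetical, and a real MR pipeline adds steps (LD clumping, weak-instrument checks, allele harmonization) that are deliberately omitted. It only shows how an exposure threshold (1e-9 or 5e-6, the two values discussed in this exchange) combines with the exclusion of SNPs directly associated with the outcome (P < 5e-8); which exposure threshold applies to which database is left as a parameter rather than assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary statistics (illustration only)."""
    rsid: str
    p_exposure: float  # association p-value with the exposure (immune cell trait)
    p_outcome: float   # association p-value with the outcome (periodontitis)


def select_instruments(
    snps: List[SnpStats],
    exposure_threshold: float,
    outcome_threshold: float = 5e-8,
) -> List[SnpStats]:
    """Keep SNPs passing the exposure threshold, then drop any SNP directly
    associated with the outcome (P < outcome_threshold), in line with the
    third MR assumption. Clumping and harmonization are not performed here."""
    relevant = [s for s in snps if s.p_exposure < exposure_threshold]
    return [s for s in relevant if s.p_outcome >= outcome_threshold]


if __name__ == "__main__":
    # Invented example values, not data from the study.
    example = [
        SnpStats("rsA", p_exposure=3e-10, p_outcome=0.40),
        SnpStats("rsB", p_exposure=2e-7, p_outcome=0.10),
        SnpStats("rsC", p_exposure=1e-11, p_outcome=1e-9),
    ]
    for threshold in (1e-9, 5e-6):
        kept = [s.rsid for s in select_instruments(example, threshold)]
        print(threshold, kept)
```

      With these made-up inputs, the looser 5e-6 threshold admits one more instrument than 1e-9, while rsC is excluded under both because of its direct association with the outcome. This illustrates why the choice of exposure threshold changes the number of SNPs carried forward, as acknowledged in the limitation quoted above.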

      Reference

      Huang W, Wang Z, Zou C, Liu Y, Pan Y, Lu J, Zhou K, Jiao F, Zhong S, Jiang G. 2023. Effects of metabolic factors in mediating the relationship between Type 2 diabetes and depression in East Asian populations: A two-step, two-sample Mendelian randomization study. J Affect Disorders 335:120–128. doi:10.1016/j.jad.2023.04.114

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Li P, Wang H, Guo L, Gou X, Chen G, Lin D, Fan D, Guo X, Liu Z. 2022. Association between gut microbiota and preeclampsia-eclampsia: a two-sample Mendelian randomization study. Bmc Med 20:443. doi:10.1186/s12916-022-02657-x

      Liu B, Lyu L, Zhou W, Song J, Ye D, Mao Y, Chen G-B, Sun X. 2023. Associations of the circulating levels of cytokines with risk of amyotrophic lateral sclerosis: a Mendelian randomization study. Bmc Med 21:39. doi:10.1186/s12916-023-02736-7

      (2) What is the reference for selecting Smoking, Fasting plasma glucose, and BMI as covariates? They do not seem to be directly related to immune cells as confounding factors.

      The variables of Smoking, Fasting Plasma Glucose (FPG), and Body Mass Index (BMI) are commonly used as covariates in multivariable Mendelian randomization studies (Kong et al., 2023; Liu et al., 2023). The association between Smoking, FPG, and BMI with immune cells may not be immediately apparent. However, these factors have been identified as potential confounders that could impact overall health, which in turn may indirectly modulate systemic immune responses, susceptibility, and inflammation.

      (1) Smoking: Smoking is well documented to cause inflammation and impair immune function, thereby increasing an individual's susceptibility to infections and diseases (Shiels et al., 2014). Smoking is therefore treated as a covariate that could influence the outcomes of an investigation into immune cells.

      (2) FPG: Elevated FPG levels indicate poor glycemic control, which can lead to conditions such as diabetes (Choi et al., 2018). Moreover, studies have shown that elevated FPG levels can compromise the immune system's ability to combat infections.

      (3) BMI: BMI is a measure of body fat based on a person's weight and height. Both obesity (high BMI) and underweight (low BMI) have been associated with a range of health issues, including a compromised immune system (Piñeiro-Salvador et al., 2022). BMI is therefore included as a covariate in this study.

      We have therefore included these factors as covariates to mitigate their potential confounding effects. The selection of these covariates was guided primarily by previous research and established knowledge of potential influences on immune function. We appreciate your query and will clarify this point in the revised manuscript: “We have incorporated covariates, including the number of cigarettes smoked, fasting plasma glucose (FPG) levels, and body mass index (BMI) into the MVMR analysis, given that these factors could indirectly affect systemic immune responses and inflammation (Liu et al., 2023).” (Pages 6-7)

      Reference

      Choi S-C, Titov AA, Abboud G, Seay HR, Brusko TM, Roopenian DC, Salek-Ardakani S, Morel L. 2018. Inhibition of glucose metabolism selectively targets autoreactive follicular helper T cells. Nat Commun 9:4369. doi:10.1038/s41467-018-06686-0

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Piñeiro-Salvador R, Vazquez-Garza E, Cruz-Cardenas JA, Licona-Cassani C, García-Rivas G, Moreno-Vásquez J, Alcorta-García MR, Lara-Diaz VJ, Brunck MEG. 2022. A cross-sectional study evidences regulations of leukocytes in the colostrum of mothers with obesity. BMC Med 20:388. doi:10.1186/s12916-022-02575-y

      Shiels MS, Katki HA, Freedman ND, Purdue MP, Wentzensen N, Trabert B, Kitahara CM, Furr M, Li Y, Kemp TJ, Goedert JJ, Chang CM, Engels EA, Caporaso NE, Pinto LA, Hildesheim A, Chaturvedi AK. 2014. Cigarette smoking and variations in systemic immune and inflammation markers. J Natl Cancer Inst 106:dju294. doi:10.1093/jnci/dju294

      (3) It is not entirely clear how the P-values were corrected for the total number of independent statistical tests.

      In our study, we used the Bonferroni correction to adjust the P-values for multiple comparisons. Under this correction, the adjusted P-value is the original P-value multiplied by the total number of independent statistical tests; equivalently, the significance threshold is 0.05 divided by that number. Specifically, we applied the correction at two levels. First, we corrected the results of the FUSION algorithm in the TWAS, using a threshold of P < 6.27 × 10⁻⁶ (0.05/7,890 genes) (Page 8). Second, we corrected the initial MR results (P < 0.05/17 traits ≈ 0.003). However, none of the MR results met this criterion after correction, which is one of the limitations detailed in the Discussion section of our study (Page 14).
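      The two Bonferroni thresholds can be reproduced with a short Python sketch. It divides the family-wise α of 0.05 by the number of tests stated above and, for completeness, shows the equivalent adjusted-P form; the function names are illustrative, not taken from any package. With the stated count of 7,890 genes, the gene-level threshold evaluates to about 6.34 × 10⁻⁶.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Adjusted P-values: each raw P multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


twas_threshold = bonferroni_threshold(0.05, 7890)  # gene-level TWAS (FUSION)
mr_threshold = bonferroni_threshold(0.05, 17)      # 17 traits tested in MR
print(f"{twas_threshold:.3g} {mr_threshold:.3g}")  # 6.34e-06 0.00294
```

      Comparing a raw P-value against `alpha / n_tests` and comparing the adjusted P-value against `alpha` give the same decision.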

      (4) The authors used whole-blood data to apply the FUSION algorithm. Although whole blood is a representative tissue, the authors should add FUSION testing of periodontally relevant tissues, such as the oral mucosa.

      We appreciate your insightful comments and suggestions. We agree that FUSION testing in periodontally relevant tissues, such as the oral mucosa, might yield more precise and pertinent results. However, the Genotype-Tissue Expression (GTEx) project database does not contain transcriptome data for oral tissues such as gums, oral mucosa, or alveolar bone (Author response table 1). Because of this limitation of the database, we relied primarily on whole-blood data, given its availability and the extensive precedent for its use in the literature (Xu et al., 2023; Yuan et al., 2022).

      We acknowledge that this is a limitation of our study and will consider incorporating periodontally relevant tissues in future research. In the revised manuscript, we have explicitly stated this limitation and underscored the need for additional studies to corroborate our findings in periodontally relevant tissues: “Fifth, we relied on whole-blood data for the FUSION algorithm due to the lack of transcriptome data associated with oral tissues (such as gums, oral mucosa, and alveolar bone) in the GTEx database. This has led to an excessive focus on systemic immunological changes, thereby overlooking the significance of alterations in local periodontal tissue immunity. Such an oversight could potentially compromise the precision and pertinence of our research findings.” (Page 15)

      Author response table 1.

      Tissues and sample sizes in the GTEx database

      Reference

      Xu J, Si H, Zeng Y, Wu Y, Zhang S, Shen B. 2023. Transcriptome-wide association study reveals candidate causal genes for lumbar spinal stenosis. Bone Joint Res 12:387–396. doi:10.1302/2046-3758.126.BJR-2022-0160.R1

      Yuan J, Wang T, Wang L, Li P, Shen H, Mo Y, Zhang Q, Ni C. 2022. Transcriptome-wide association study identifies PSMB9 as a susceptibility gene for coal workers’ pneumoconiosis. Environ Toxicol 37:2103–2114. doi:10.1002/tox.23554

      (5) The authors chose gingival hyperplasia as a secondary validation phenotype of periodontitis in this study. However, gingival recession, as another important phenotype associated with periodontitis, should also be tested and discussed.

      We appreciate your feedback highlighting the importance of gingival recession as a phenotype in periodontitis studies. Our emphasis on gingival hyperplasia was dictated primarily by the initial study design and the data available from FinnGen R9K11. Because gingival recession data were not available in the databases we could access, we used chronic gingivitis data from an earlier FinnGen release (FinnGen R5K11) as an alternative. We performed a Mendelian randomization analysis on this dataset and added the results to Supplementary Table 10. Table 1, Supplementary Table 1, Figure 4, and the corresponding descriptions in the manuscript were updated accordingly. We hope this addition addresses the limitation you identified and gives a more complete picture of periodontal disease.

      (6) This study used GLIDE data as a replicated validation, but the results were inconsistent with FinnGen's dataset.

      Thank you for bringing this issue to our attention. Ensuring the validity and reliability of our findings across datasets is indeed important. The inconsistency between the GLIDE and FinnGen datasets could be attributed to several factors.

      First, the discrepancy might originate from differences in population composition. GLIDE is based on a meta-analysis of cohorts focused on periodontitis, whereas FinnGen is a cohort covering the full range of phenotypes. In GLIDE, the ratio of periodontitis cases to controls is approximately 1:2, whereas in FinnGen the proportion of cases appears to be very small. Despite its larger total sample size, FinnGen may therefore include too few cases to detect the effects observed in GLIDE, because the power to detect genuine associations depends strongly on the number of cases.

      Moreover, the heterogeneity of periodontitis could lead to variable outcomes, because the two databases define the phenotype differently. GLIDE cohorts diagnose periodontitis from clinical signs using the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) criteria and the Community Periodontal Index (CPI), whereas FinnGen uses the International Classification of Diseases, 10th revision (ICD-10). The GLIDE definition is more practical but broader, and might therefore include cases of pseudo-periodontitis.

      Finally, the observed differences could reflect variation in immune responses at different stages of periodontitis. In the early stages, the immune response is mediated mainly by neutrophils and macrophages; as the disease progresses, T cells and B cells become increasingly involved, producing a more complex immune response (Darveau, 2010). In addition, the immune response to oral health conditions is not uniform and can be influenced by multiple factors, including an individual's overall health, genetics, and lifestyle, which may also affect the results (Hung et al., 2023).
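      The point about case proportion can be illustrated with the effective sample size commonly used for case-control GWAS, N_eff = 4 / (1/N_cases + 1/N_controls). The Python sketch below is illustrative only: the case and control counts are hypothetical round numbers, not the actual GLIDE or FinnGen figures.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size for a case-control study: 4 / (1/cases + 1/controls).

    Equals the total N when cases and controls are balanced, and approaches
    4 * n_cases as the number of controls grows very large.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical cohorts, for illustration only.
balanced_ish = effective_sample_size(n_cases=10_000, n_controls=20_000)  # ~1:2 ratio, N = 30,000
case_sparse = effective_sample_size(n_cases=4_000, n_controls=300_000)   # low case fraction, N = 304,000
print(round(balanced_ish), round(case_sparse))  # 26667 15789
```

      Under these assumed counts, the cohort that is ten times larger in total has the smaller effective sample size, because its power is capped by the number of cases.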

      Reference

      Darveau RP. 2010. Periodontitis: a polymicrobial disruption of host homeostasis. Nat Rev Microbiol 8:481–490. doi:10.1038/nrmicro2337

      Hung M, Kelly R, Mohajeri A, Reese L, Badawi S, Frost C, Sevathas T, Lipsky MS. 2023. Factors Associated with Periodontitis in Younger Individuals: A Scoping Review. J Clin Med 12:6442. doi:10.3390/jcm12206442

      Reviewer #2 (Public Review):

      This manuscript presents a well-designed study that combines multiple Mendelian randomization analyses to investigate the causal relationship between circulating immune cells and periodontitis. The main conclusions of the manuscript are appropriately supported by the statistics, and the methodologies used are comprehensive and rigorous.

      These findings have significant implications for periodontal care and highlight the potential for systemic immunomodulation management on periodontitis, which is of interest to readers in the fields of periodontology, immunology, and epidemiology.

      We greatly appreciate the positive feedback and valuable insights provided by the reviewer, which have significantly contributed to the improvement of our manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Abstract

      Line 30-32: "Two-sample bidirectional univariable MR followed by sensitivity testing, multivariable MR, subgroup analysis, and the Bayesian model averaging (MR-BMA) were performed to explore the causal association between them. " What does the term "them" refer to here, please clarify it. The research method here is unclear, please reorganize it.

      Line 39: "S100A9 and S100A12" here should be italic.

      We appreciate your meticulous suggestions and have revised the methods description accordingly. In addition, the two gene symbols are now italicized.

      "Univariable MR, multivariable MR, subgroup analysis, reverse MR, and Bayesian model averaging (MR-BMA) were utilized to investigate the causal relationships. Furthermore, transcriptome-wide association study (TWAS) and colocalization analysis were deployed to pinpoint the underlying genes." (Page 1)

      Introduction

      Line 78-80: "As reported, the number of immune cells in periodontal tissue changes as periodontitis progresses, featuring an increase in monocytes, and B cells and a decrease in T cells." Does the author mean that both monocytes and B cells increase as periodontitis progresses?

      We are grateful for your careful reading and perceptive questions. Your understanding is correct: in lines 78-80, we intended to convey that as periodontitis progresses, both monocytes and B cells increase in the periodontal tissue. This represents a typical immune response to infection, in which these cells play a pivotal role in counteracting periodontal pathogens. To improve clarity, we have revised these lines as follows:

      "With the progression of periodontitis, there is a significant alteration in the quantity of immune cells present within the periodontal tissue. Specifically, an increase in the count of both monocytes and B cells is observed, whereas a decrease is noted in the count of T cells." (Page 3)

      Method

      Line 164-165: "As the main test, the MVMR-IVW method, offered by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO), and the MVMR-Egger method were chosen." The author's expression here is ambiguous.

      In response to your comment on the ambiguity in lines 164-165, we have revised the sentence for clarity. We hope this addresses your concern and clarifies our point more effectively.

      "The MVMR-IVW method was utilized as the primary test, supplemented by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO) and the MVMR-Egger method." (Page 7)
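      For readers less familiar with MVMR-IVW, the primary estimator amounts to a weighted least-squares regression of the SNP-outcome effects on the SNP-exposure effects of all exposures jointly, with no intercept and weights 1/SE². The NumPy sketch below assumes harmonized summary statistics and uses synthetic, noise-free numbers; it is not the software used in the manuscript, and MVMR-LASSO and MVMR-Egger are not shown.

```python
import numpy as np


def mvmr_ivw(beta_exposures, beta_outcome, se_outcome):
    """Multivariable IVW: weighted least squares of outcome effects on exposure effects, no intercept.

    beta_exposures: (n_snps, n_exposures) SNP-exposure associations
    beta_outcome:   (n_snps,) SNP-outcome associations
    se_outcome:     (n_snps,) standard errors of the SNP-outcome associations
    Returns one direct-effect estimate per exposure.
    """
    w_sqrt = 1.0 / np.asarray(se_outcome, dtype=float)  # square root of the weights 1/SE^2
    X = np.asarray(beta_exposures, dtype=float) * w_sqrt[:, None]
    y = np.asarray(beta_outcome, dtype=float) * w_sqrt
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
bx = rng.normal(0.0, 0.1, size=(50, 3))   # e.g. one immune-cell trait plus two covariates (synthetic)
true_effects = np.array([0.3, -0.2, 0.1])
by = bx @ true_effects                    # noise-free outcome effects, for illustration
se = np.full(50, 0.02)
print(np.round(mvmr_ivw(bx, by, se), 3))  # recovers the three direct effects
```

      Each coefficient is the effect of one exposure conditional on the others, which is how adjusting for smoking, FPG, and BMI enters the MVMR analysis.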

      Table 1: FinnGen has a greater sample size and more SNPs than GLIDE; why do authors choose the latter as the primary analysis?

      Our choice of GLIDE rather than FinnGen as the primary dataset was guided mainly by our specific research question. Although FinnGen offers a larger sample size and more SNPs, GLIDE provides a more specialized, targeted dataset suited to the requirements of our study. Many MR studies adopt a similar strategy, using a large disease-specific GWAS meta-analysis for discovery, followed by validation in a full-phenotype cohort such as UK Biobank or FinnGen (Liu et al., 2023; Yuan et al., 2023). The main reasons are as follows:

      First, GLIDE offers a focused collection of genetic data on periodontitis. It is based on a meta-analysis of periodontitis-focused cohorts, in which the ratio of cases to controls is approximately 1:2. In contrast, the proportion of cases in FinnGen, which is drawn from a cohort covering the full range of phenotypes, appears to be much smaller. Using GLIDE as the primary dataset therefore yields more genetic information on periodontitis.

      Second, the methodological features of GLIDE align more closely with the analytical framework of our study. For instance, the diagnostic criteria differ between the two databases. GLIDE is based on the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) criteria and the Community Periodontal Index (CPI) for clinical signs, whereas FinnGen uses the International Classification of Diseases, 10th revision (ICD-10). The former adopts a more pragmatic, although broader, definition of periodontitis. The latter still uses disease concepts such as "chronic periodontitis", which were replaced by "periodontitis" in the classification from the 2017 World Workshop on the Classification of Periodontal and Peri-Implant Diseases and Conditions (Caton et al., 2018).

      Reference

      Caton JG, Armitage G, Berglundh T, Chapple ILC, Jepsen S, Kornman KS, Mealey BL, Papapanou PN, Sanz M, Tonetti MS. 2018. A new classification scheme for periodontal and peri-implant diseases and conditions - Introduction and key changes from the 1999 classification. J Clin Periodontol 45 Suppl 20:S1–S8. doi:10.1111/jcpe.12935

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Yuan S, Xu F, Li X, Chen J, Zheng J, Mantzoros CS, Larsson SC. 2023. Plasma proteins and onset of type 2 diabetes and diabetic complications: Proteome-wide Mendelian randomization and colocalization analyses. Cell Rep Med 4:101174. doi:10.1016/j.xcrm.2023.101174

      Result

      Line 224: "The observed significant results remained robust after removing pleiotropic SNPs." It is not clear what the authors mean by "remain robust".

      Line 229-231: "The causal relationship between neutrophils and periodontitis remained stable with no evidence of heterogeneity or pleiotropy." It is also not clear what the authors mean by "remain stable". How does the author get to the conclusion that there is no evidence of heterogeneity or pleiotropy?

      Figure S5: Please offer a brief explanation on how to investigate outlier or influential changes using scatter plots and Cochran's Q test and Cook's distance.

      Line 224: We apologize for the confusion caused by the term "remain robust". In the revised manuscript, we have clarified it as follows: "The observed significant results are considered 'robust' if the sensitivity analyses yielded effects in the same direction as the inverse variance weighted (IVW) method, with P < 0.05." (Page 6)

      Lines 229-231: We used the terms "remain stable" and "remain robust" interchangeably to express the same idea, and we have now unified the wording in the revised manuscript. The conclusion of "no evidence of heterogeneity or pleiotropy" is derived from Cochran's Q test and the MR-Egger intercept test (P > 0.05 for both). We have added this explanation to the revised manuscript for clarity.
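      As a rough illustration of how these two diagnostics are computed from summary statistics, the NumPy sketch below derives the IVW estimate with Cochran's Q (heterogeneity) and the MR-Egger intercept (directional pleiotropy). It assumes harmonized per-SNP effects, first-order weights, and synthetic input. The P-values, obtained by comparing Q with a χ² distribution on n − 1 degrees of freedom and by testing the intercept against its standard error, are omitted for brevity; established tools such as the R package TwoSampleMR implement both tests in full.

```python
import numpy as np


def ivw_with_cochran_q(bx, by, se_y):
    """Fixed-effect IVW estimate and Cochran's Q from per-SNP Wald ratios."""
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    ratio = by / bx                    # per-SNP Wald ratio estimates
    w = (bx / se_y) ** 2               # first-order inverse-variance weights
    beta_ivw = np.sum(w * ratio) / np.sum(w)
    q = np.sum(w * (ratio - beta_ivw) ** 2)
    return float(beta_ivw), float(q), len(bx) - 1  # Q ~ chi2(df) under homogeneity


def egger_intercept(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an intercept.

    SNPs are first oriented so that every SNP-exposure effect is positive.
    Returns (intercept, slope); an intercept far from 0 suggests directional pleiotropy.
    """
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    w_sqrt = 1.0 / se_y
    X = np.column_stack([np.ones_like(bx), bx]) * w_sqrt[:, None]
    coef, *_ = np.linalg.lstsq(X, by * w_sqrt, rcond=None)
    return float(coef[0]), float(coef[1])


bx = np.array([0.10, 0.20, -0.15, 0.30])
by = 0.5 * bx                           # homogeneous effects, no pleiotropy (synthetic)
se = np.array([0.02, 0.03, 0.02, 0.04])
print(ivw_with_cochran_q(bx, by, se))   # Q is ~0
print(egger_intercept(bx, by, se))      # intercept ~0, slope ~0.5
```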

      Figure S5: In the revised manuscript and Table, we have provided a succinct explanation regarding the investigation of outliers or influential changes as follows: "A genetic variant was defined as either an outlier or an influential variant if it possessed a q-value exceeding 10 or if its Cook's distance surpassed the median of the corresponding F-distribution." (Page 7)
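      The outlier rule quoted above can be sketched as follows, assuming the IVW fit is expressed as a weighted regression through the origin with one parameter. Each SNP's contribution to Cochran's Q (its squared standardized residual) is compared against 10, and Cook's distance is computed from the same fit. The Cook's distance cut-off, the median of the F distribution with (p, n − p) degrees of freedom, is passed in as an argument; with SciPy it could be obtained as `scipy.stats.f.median(p, n - p)`. The function name and input values are hypothetical.

```python
import numpy as np


def ivw_outlier_diagnostics(bx, by, se_y, q_cutoff=10.0, cook_cutoff=None):
    """Per-SNP Q contributions and Cook's distances for the IVW fit.

    IVW is treated as a regression of by/se_y on bx/se_y through the origin (p = 1).
    Returns (q_contrib, cooks_d, flagged_indices).
    """
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    x, y = bx / se_y, by / se_y               # standardized regression variables
    beta = np.sum(x * y) / np.sum(x * x)      # equals the IVW estimate
    resid = y - beta * x
    q_contrib = resid ** 2                    # per-SNP contribution to Cochran's Q
    n, p = len(x), 1
    leverage = x * x / np.sum(x * x)          # hat-matrix diagonal
    s2 = np.sum(resid ** 2) / (n - p)
    cooks_d = q_contrib * leverage / (p * s2 * (1.0 - leverage) ** 2)
    flagged = q_contrib > q_cutoff
    if cook_cutoff is not None:
        flagged |= cooks_d > cook_cutoff
    return q_contrib, cooks_d, np.flatnonzero(flagged).tolist()


bx = np.full(5, 0.10)
se = np.full(5, 0.01)
by = np.array([0.05, 0.05, 0.05, 0.05, 0.20])  # last SNP has a discordant ratio (synthetic)
q, d, flagged = ivw_outlier_diagnostics(bx, by, se)
print(np.round(q, 2), np.round(d, 4), flagged)  # only index 4 is flagged
```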

      We have made all the necessary changes in the revised manuscript based on your comments. We hope our responses and revisions adequately address your concerns.

      Discussion

      I have consulted several pieces of literature to ensure a thorough explanation, which may be helpful for your writing.

      (1) Hajishengallis G, Li X, Divaris K, Chavakis T. Maladaptive trained immunity and clonal hematopoiesis as potential mechanistic links between periodontitis and inflammatory comorbidities. Periodontol 2000. 2022;89(1):215-230. doi:10.1111/prd.12421

      (2) Hajishengallis G, Chavakis T. Mechanisms and Therapeutic Modulation of Neutrophil-Mediated Inflammation. J Dent Res. 2022;101(13):1563-1571. doi:10.1177/00220345221107602

      We appreciate your valuable feedback and the additional references. We have reviewed the suggested literature and incorporated it into the revised manuscript (Page 12) to strengthen the explanation of the mechanisms of neutrophil-mediated inflammation and the potential association between periodontitis and inflammatory comorbidities.

      "The quantity and functionality of neutrophils both act as critical indicators of inflammation severity. The reduction in neutrophil count and inflammatory mediators, observed after successful periodontitis treatment, suggests a reduction in systemic inflammation (Hajishengallis et al., 2022)." (Page 12)

      "Trained myeloid cells have the potential to amplify the functionality of neutrophils, thereby fortifying the body's defense against subsequent infections. Nevertheless, within the framework of chronic inflammation, these cells could potentially intensify tissue damage (Hajishengallis and Chavakis, 2022)." (Page 12)

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewers for their constructive feedback. We have revised our manuscript to address some important concerns. The main changes are summarized as follows:

      (1) A major concern, as reflected in the eLife assessment and reviewer comments, was that the “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” We have now provided an expanded analysis of gait phase-locking to different limbs in Figure 2 – figure supplement 1. The analysis reveals three key new insights: 1) most striatal neurons are significantly entrained to only one or two limbs; 2) for neurons entrained to two limbs, most limb pairs are diagonal pairs, whose phases are closely aligned; 3) the strength of phase-locking, as measured by the mean vector length, is biased toward a single limb. From these results we conclude that striatal neurons are indeed better correlated with single-limb (as opposed to multiple limbs’) gait. However, we speculate that because of the inherently correlated motion across limbs, some neurons also display significant phase-locking to multiple limbs, particularly to diagonal pairs.

      (2) Reviewer 2 noted the lack of a manipulation experiment that would help establish the striatum’s relationship to gait control. We have therefore included new experimental data in Figure 6 – figure supplement 2, in which we show that optogenetically activating D2 MSNs alters some measures of both whole-body motion and single-limb gait. We recognize that these experiments are not ideal; for example, the optical stimulation was not entrained to limb phase. Nevertheless, they hopefully allay any concern that the striatum is incapable of influencing gait performance.

      (3) We have further characterized the relationship between vector length and firing rate, and firing rate between D1 and D2 MSNs. We now show that: 1) vector length is negatively correlated with session-wide firing rate (Figure 2 – figure supplement 1E); 2) session-wide firing rates are similar between D1 and D2 MSNs in both healthy and dopamine lesioned animals (Figure 4D and Figure 6H). Thus, the imbalance in the vector length between D1 and D2 MSNs following dopamine lesions is unlikely to be explained by changes in the overall firing rates of these cells.

      (4) We have added new data similar to Figure 1 with distributions of stride frequency, duration, and length to illustrate the difference between sham and 6OHDA mice (Figure 5 – figure supplement 1B,C).

      (5) We have expanded the Discussion section to discuss a number of important points raised by the reviewers. These include: 1) speculating on the origins of gait coding in the striatum; 2) discussion of some literature which reported similar levels of D1/D2 MSN start coding in contrast to our results in healthy mice; 3) discussion of the finding that almost all phase-locked cells also have a firing rate related to speed or start/stop signals; 4) discussion of one of the limitations of the unilateral 6OHDA model, namely, the strong turning bias, and its potential implications for our results.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang et al. combine high-speed video tracking of the limbs of freely moving mice with in vivo electrophysiology to demonstrate how striatal neurons encode single-limb gait. They also examine the encoding of other well-known aspects of locomotion, such as movement velocity and the initiation/termination of movement. The authors show that striatal neurons exhibit rhythmic firing phase-locked with mouse gait while mice engage in spontaneous locomotion in an open field arena. Moreover, they describe gait deficits induced by severe unilateral dopamine neuron degeneration and associate these deficits with a relative strengthening of gait modulation in the firing of D2-expressing MSNs. Although the source and function of this gait modulation remain unclear, this manuscript uncovers an important physiological correlate of striatal activity with gait, which may have implications for gait deficits in Parkinson's Disease.

      Strengths:

      While some previous work has looked at the encoding of gait variables in the striatum and other basal ganglia nuclei, this paper uses more careful quantification of gait with video tracking. In addition, few if any papers do this in combination with optically-labeled recordings as were performed here.

      Weaknesses:

      The data collected has a great richness at the physiological and behavioral levels, and this is not fully described or explored in the manuscript. Additional analysis and display of data would greatly expand the interest and interpretability of the findings.

      There are also some caveats to the interpretation of the analyses presented here, including how to compare the encoding of gait variables when animals have markedly different behaviors (e.g., comparing sham and unilaterally 6-OHDA-treated mice), and how to interpret the loss of gait modulation when single-unit activity is overall very low.

      (1) The authors use circular analysis to quantify the degree to which striatal neurons are phase-locked to individual limbs during gait. The result of this analysis is shown as the proportion of units phase-locked to each limb, vector length, and vector angle (Fig 2H-K; Fig 4E-F; Fig 6E-F). Given that gait is a cyclic oscillation of the trajectories of all four limbs, one could expect that if one unit is phase-locked to one limb, it will also be phase-locked to the other three limbs but at a different phase. Therefore, it is not clear in the manuscript how the authors determine to which limb each unit is locked, and how some units are locked to more than one limb (Fig 2H). More methodological/analytical detail would be especially helpful.

      We thank the reviewer for raising this important issue, which was not sufficiently explored in our original manuscript. This relates to a major concern that “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” We have now prepared a new figure supplement to address whether neurons are preferentially entrained to only one or multiple limbs (Figure 2 – figure supplement 1, panels A-C).

      Author response image 1.

      Panels A-C. Phase-locking to different limbs.

      Panel A shows the percentage of striatal neurons (all neurons, including untagged cells) with significant phase-locking to only 1, 2, 3, or all 4 limbs. The results indicate that most phase-locked cells are entrained to either only 1 or only 2 limbs, as opposed to 3 or all 4 limbs. We next looked more closely at the cells that were entrained to only 2 limbs: Panel B shows that a significant majority of those cells were coupled to diagonal limb pairs. This finding is informative because diagonal limb pairs move at nearly the same phase during walking, so some overlap in phase-locking to these limbs is to be expected. Finally, Panel C shows the mean vector length per neuron ranked from the highest to the lowest value. The results reveal that the vector length is significantly biased toward the highest-ranked limb. This bias would be absent if neurons were entrained to all 4 limbs with similar strength. Together, these results support the conclusion that striatal neuron spiking is preferentially coupled to single limbs as opposed to multiple limbs. However, we speculate that because of the inherently correlated motion across limbs, some neurons also display significant phase-locking to multiple limbs, particularly to diagonal pairs.
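      A minimal Python sketch of this kind of per-limb analysis follows, under stated assumptions: each spike has already been assigned a gait phase in radians for each limb's stride cycle, phase-locking is tested with the Rayleigh test using Zar's approximation for the P-value, and α = 0.05 without correction. The function names, thresholds, and deterministic synthetic phases are illustrative and are not the authors' pipeline. The sketch counts the limbs a neuron is significantly locked to (as in Panel A) and ranks limbs by mean vector length (as in Panel C).

```python
import numpy as np


def mean_vector(phases):
    """Mean resultant vector of spike phases in radians: returns (length r, angle)."""
    z = np.mean(np.exp(1j * np.asarray(phases, dtype=float)))
    return float(np.abs(z)), float(np.angle(z))


def rayleigh_p(phases):
    """Rayleigh test P-value via Zar's approximation (the form used in CircStat's circ_rtest)."""
    n = len(phases)
    r, _ = mean_vector(phases)
    big_r = n * r
    return float(np.exp(np.sqrt(1 + 4 * n + 4 * (n**2 - big_r**2)) - (1 + 2 * n)))


def limb_locking_summary(phases_by_limb, alpha=0.05):
    """Per-neuron summary: limbs with significant locking, and limbs ranked by vector length."""
    stats = {limb: (mean_vector(ph)[0], rayleigh_p(ph)) for limb, ph in phases_by_limb.items()}
    locked = sorted(limb for limb, (_, p) in stats.items() if p < alpha)
    ranked = sorted(stats, key=lambda limb: stats[limb][0], reverse=True)
    return locked, ranked, stats


# Synthetic, deterministic spike phases for one hypothetical neuron (200 spikes).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
phases = {
    "LF": np.pi / 2 + 0.8 * np.cos(t),  # tightly clustered around pi/2
    "RR": np.pi / 2 + 1.2 * np.cos(t),  # diagonal partner: same preferred phase, more spread
    "RF": t,                            # spread evenly around the cycle: no locking
    "LR": 2 * t,                        # also evenly spread
}
locked, ranked, stats = limb_locking_summary(phases)
print(locked, len(locked), ranked[:2])  # ['LF', 'RR'] 2 ['LF', 'RR']
```

      In this toy example the neuron counts as locked to a diagonal pair, yet its vector length is clearly largest for one limb, which mirrors the pattern described for Panels B and C.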

      (2) In Figures 2 and 3, the authors describe the modulation of striatal neurons by gait, velocity, and movement transitions (start/end), with most of their examples showing firing rates compatible with rates typical of striatal interneurons, not MSNs. In order to have a complete picture of the relationship between striatal activity and gait, a cell type-specific analysis should be performed. This could be achieved by classifying units into putative MSN, FS interneurons, and TANs using a spike waveform-based unit classification, as has been done in other papers using striatal single-unit electrophysiology. An example of each cell type's modulation with gait, as well as summary data on the % modulation, would be especially helpful.

      We appreciate the reviewer’s suggestion to analyze our data after classifying units into different putative cell types (MSN, FSI, TAN). Indeed, we have frequently adopted this practice in our other publications (e.g., Bakhurin & Masmanidis 2016, 2017; Lee & Masmanidis 2019). However, this study already relies on a more rigorous method – optogenetic tagging – to identify D1 and D2 MSNs. We felt that adding a second, more subjective and therefore less rigorous identification method based on spike waveforms would add unnecessary confusion in how the results are presented and interpreted. For example, we were unsure how to address the situation where an opto-tagged D1 or D2 MSN may be classified as a putative FSI or TAN according to spike waveform criteria. For this reason, we decided not to perform an analysis by putative MSN, FSI, and TAN. Finally, we have made all our electrophysiological data available should someone want to perform this analysis themselves.

      (3) By normalizing limb trajectories to the nose-tail axis, the analysis ignores whether the mouse is walking straight, or making left/right turns. Is the gait-modulation of striatal activity shaped by ipsi- and contralateral turning? This would be especially important to understand changes in the unilateral disease model, given the imbalance in turning of 6-OHDA mice.

      This is an important question, which our data are unfortunately underpowered to address. Lesioned mice turn sharply for nearly the entire duration of walking, while healthy mice walk in a nearly straight line, with occasional brief turning bouts. Thus, we do not have sufficient stride numbers during healthy turning to enable a rigorous analysis of gait phase locking during left/right turns. This raises some questions about the interpretation of the higher D2 MSN vector length in dopamine lesioned mice – does the higher vector length relate to the impaired gait, or the higher incidence of turning in this PD model? We have acknowledged this issue in the Discussion section as a limitation of the unilateral 6OHDA model. And, in future work we hope to investigate turning effects in more detail using behavioral arenas which force animals to turn left or right at specific locations.

      (4) It looks like the data presented in Figure 4 D-F come from all opto-identified D1- and D2-MSNs. How many of these are gait-modulated? This information is missing (line 110). Pooling all units may dilute differences specific to gait-modulated units, so a similar analysis should be performed on gait-modulated units only.

      The reviewer is correct that the data presented in Figure 4 come from all optogenetically tagged cells. We have now included a new panel, Figure 4H, which shows the proportion of D1 and D2 MSNs that encode limb phase, body speed, or start/stop. The reviewer suggested that a similar analysis be performed on gait-modulated units only. We prefer to keep our current approach (using all cells, regardless of whether they show significant gait modulation) because it is less biased. For example, even cells that do not pass our threshold for statistical significance may display weak but visible gait modulation.

      (5) Since the 6-OHDA lesions are in the right hemisphere, we would expect left limbs to be more affected than right limbs (although right limbs may also compensate). It is therefore surprising that RF and RR strides seem slightly shorter than LF and LR strides (Fig 5G), and that other stride parameters show no differences (Fig 5H-J). Could the authors comment on that? It may be that this is due to rotational behavior. One interesting analysis would be to compare activity during similar movements in healthy and 6-OHDA mice, e.g., epochs in which mice are turning right (which should be present in both groups) or walking a few steps straight ahead (which are probably also present in both groups).

      Unilateral 6OHDA lesions are associated with ipsiversive turning (in this case, toward the right). The reviewer noted that the stride length is shorter for the two right compared to the two left limbs (Figure 5G), which is consistent with a right turning bias. In line with this observation, the stride speed for the right limbs also seemed slower than for the left limbs (Figure 5I), though we agree this is a bit difficult to see in the plot due to the choice of y-axis range. We appreciate the reviewer’s suggestion to analyze activity during similar movements in healthy and lesioned mice. As discussed in reply to their third comment above, our data did not contain sufficient bouts of straight walking in lesioned mice, or turning in healthy mice, to make such analysis possible. We have acknowledged this issue in the Discussion section as a limitation of the unilateral 6OHDA model. And, in future work we hope to investigate turning effects in more detail using behavioral arenas which force animals to turn left or right at specific locations.

      (6) Multiple publications have shown that firing rates of D1-MSN and D2-MSN are dramatically changed after dopamine neuron loss. Is it possible that changes observed in gait-modulation might be biased by changes in firing rates? For example, dMSNs have exceptionally low overall activity levels after dopamine depletion (e.g., Parker...Schnitzer, 2018; Ryan...Nelson, 2018; Maltese...Tritsch, 2021); this might reduce the ability to detect modulation in the firing of dMSNs as compared to iMSNs, which have similar or increased levels of activity in dopamine-depleted mice. Does vector length correlate with firing rate? In addition, the normalization method used (dividing firing rate by minimum) may amplify very small changes in absolute rates, given that the firing rates of MSNs are very low. The authors could show absolute values or Z-score firing rates (Figure 6 A, D).

      The reviewer asked a number of important questions here. First, is it possible that changes in gait modulation are biased by changes in firing rates? We have included a new analysis comparing the average session-wide firing rate of D1 and D2 MSNs (Figure 6D & 6H). This showed that firing rates were statistically similar between D1 and D2 MSNs for both sham and dopamine lesioned mice. Thus, it seems unlikely that the imbalance in vector length is purely due to changes in firing rate. The reviewer referenced some literature (e.g. Parker & Schnitzer; Ryan & Nelson; Maltese & Tritsch) which does appear to show significant changes in the relative firing levels of D1/D2 MSNs after dopamine lesions. While we can only speculate about the reason for the discrepancy (e.g., differences in measurement method, behavioral task, or analysis method), we note that not all prior literature has reported such changes (e.g., Ketzef & Silberberg 2017).

      Author response image 2.

      Panels D & H. No difference in firing between D1 and D2 MSNs.

      Second, does vector length correlate with firing rate? Interestingly, we found that it does. We now show that vector length is negatively correlated with firing rate (Figure 2 – figure supplement 1E), implying that cells with higher overall firing rates tend to have weaker phase-locking to the gait cycle. Though not shown in the manuscript, we found a similar negative correlation for D1 and D2 MSNs in both healthy and dopamine lesioned mice.

      Author response image 3.

      Panel E. Vector length is negatively correlated to firing rate.

      Third, the reviewer asked about our normalization method in Figure 6A etc, in which we divide by the minimum rate. We would like to clarify that this normalization method was only used for visualizing our data, but not for calculating the vector length. Therefore, we chose to leave the plots as they are.
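
      The difference between the two normalizations can be sketched with two hypothetical phase-binned rate profiles that share the same absolute modulation (0.2 spikes/s from trough to peak) but sit on very different baselines. The numbers and function names below are illustrative assumptions, not the code behind Figure 6. Dividing by the minimum makes the low-rate cell look strongly modulated, while z-scoring makes the two profiles indistinguishable; each choice emphasizes something different, which is why such plots are best read as visual aids only.

```python
import numpy as np

def normalize_by_min(rate_profile):
    """Divide a phase-binned firing-rate profile by its smallest bin (display only)."""
    r = np.asarray(rate_profile, dtype=float)
    # A bin with zero rate would make this normalization undefined.
    if r.min() <= 0:
        raise ValueError("minimum rate must be positive to divide by it")
    return r / r.min()

def zscore(rate_profile):
    """Subtract the mean across bins and divide by the standard deviation across bins."""
    r = np.asarray(rate_profile, dtype=float)
    return (r - r.mean()) / r.std()

# Hypothetical profiles (spikes/s) across three gait-phase bins.
low_rate_cell = [0.1, 0.2, 0.3]
high_rate_cell = [10.1, 10.2, 10.3]

print(normalize_by_min(low_rate_cell))   # peak is about 3x the trough
print(normalize_by_min(high_rate_cell))  # peak is about 1.02x the trough
print(np.allclose(zscore(low_rate_cell), zscore(high_rate_cell)))  # True
```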

      (7) The analysis shown in Fig 3C should also be done for opto-identified D1- and D2-MSNs (and for waveform-based classified units as noted above).

      We have now performed the same analysis for optogenetically tagged D1 and D2 MSNs from healthy mice (Figure 4H). As with our original analysis, both populations showed a similar proportion of neurons which encoded limb phase, start of movement, body speed, and the combination of these. We did not perform this analysis for waveform-based classified units as per our reason outlined in reply to the reviewer’s second comment above.

      Author response image 4.

      Panel H. Venn diagrams showing the percentage of D1 and D2 MSNs with significant responses to limb phase of at least one limb, body speed, and start and/or stop of motion.

      (8) Discussion: the origin of the gait-modulation as well as the possible mechanisms driving the alterations observed in 6-OHDA mice should be discussed in more detail.

      Our Discussion section includes the following paragraph speculating on the origin of gait modulation: “Movement-related neural activity is widespread in many brain areas, and it is plausible that the striatum receives both motor and sensory signals involved in gait generation. For example, the primary motor cortex, which projects to dorsal striatum, has been shown to exhibit rhythmic spiking activity consistent with gait phase coding (Armstrong & Drew 1984), suggesting a shared mechanism underlying the production of this code.” We appreciate the request to also discuss the possible mechanisms driving the alterations in 6OHDA mice. But this is a very complex topic which our study is not aimed at addressing. The range of possible mechanisms uncovered in the literature is vast – from synaptic changes in striatal microcircuits, to altered intrinsic excitability of D1/D2 MSNs, and network-level alterations. Therefore, we preferred to keep the discussion focused on gait and movement coding.

      Reviewer #2 (Public Review):

      Summary:

      Yang et al. recorded the activity of D1- and D2-MSNs in the dorsal striatum and analyzed their firing activity in relation to single-limb gait in normal and 6-OHDA lesioned mice. Although some of the observations of striatal encoding are interesting, the novelty and implications of this firing activity in relation to gait behavior remain unclear. More specifically, the authors made two major claims. First, the striatal D1- and D2-MSNs were phase-locked to the walking gait cycles of individual limbs. Second, dopamine lesions led to enhanced phase-locking between D2-MSN activity and walking gait cycles. The second claim was supported by the increase of vector length in D2-MSNs after unilateral 6-OHDA administration to the medial forebrain bundle. However, for the first claim, the authors failed to convincingly demonstrate that striatal MSNs were more phase-locked to gait with single-limb and step resolution than to the global gait cycles.

      We thank the reviewer for their feedback and for their comment that “the authors failed to convincingly demonstrate that striatal MSNs were more phase-locked to gait with single-limb and step resolution than to the global gait cycles.” We now present new analysis demonstrating that neurons are more phase-locked to single-limb gait than to multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1.

      Strengths:

      It is a technically advanced study.

      Weaknesses:

      (1) The authors focused on striatal encoding of gait information in current studies. However, it remains unclear whether the part of the striatum for which the authors performed neuronal recording is really responsible for or contributing to gait control. A lesion or manipulation experiment disrupting the part of the striatum recorded seems a necessary step to test or establish its relationship to gait control.

      We agree that our study – like many others which employ recordings – is largely correlative, and that a direct causal relationship was lacking. We have therefore decided to present some data which, despite some caveats, shows that the striatum is in principle capable of altering gait performance (Figure 6 – figure supplement 2).

      Author response image 5.

      Optogenetic activation of D2 MSNs alters whole-body movement and single-limb gait.

      These new results are from healthy mice (n=4) receiving optogenetic stimulation of D2 MSNs over a 5 minute period. Panels A-E show changes in a variety of whole-body measures of motion, mostly replicating the results of Kravitz & Kreitzer 2010. Panels F-I show changes (statistically significant or trending) in a variety of gait parameters, with the greatest effects found on the single-limb stride duration and stride speed. Interestingly, Kravitz & Kreitzer 2010 actually examined effects of this stimulation on gait; quoting from their paper: “we examined gait parameters in D1-ChR2 and D2-ChR2 mice in response to illumination, using a treadmill equipped with a high-speed camera. We quantified multiple gait parameters with the laser on and off, and found no significant differences in the average or variance of stride length, stance width, stride frequency, stance duration, swing duration, paw angle and paw area on belt for either line….This indicates that activation of direct and indirect pathways in the dorsomedial striatum regulates the pattern of motor activity, without changing the coordination of ambulation itself.” We wonder therefore if the reviewer’s comment about causality may have stemmed from the negative result in Kravitz & Kreitzer 2010. In any event, we now present results which firmly show a link between striatal D2 MSNs and gait. To be clear, we are not claiming that Kravitz & Kreitzer’s study was fundamentally flawed, but that perhaps their ability to resolve gait changes using a commercial treadmill system, or their choice of dorsomedial as opposed to more lateral regions of the striatum may have contributed to the negative result.

      It is also important to acknowledge a limitation of our optogenetic stimulation experiment. Our optical stimulation was not phase-locked to the gait cycle; thus, technically, we did not address whether the phase code per se is involved in producing gait. We mention this caveat in the manuscript. Despite this, we believe the new data address the reviewer’s concern about lack of causality.

      (2) The authors attributed one of the major novelties to phase-locking of striatal neural activities with single-limb gait cycles. The claim was not clearly supported, as the authors did not demonstrate that phase-locking to single-limb gaits was more significant than phase-locking to global walking gait cycles. In rhythmic walking, the LR and RF limbs were roughly anti-phase with the LF and RR limbs (Fig. 1D, E). In line with this relationship, striatal neurons were mainly in-phase with LR and RF limbs and anti-phase with LF and RR limbs (Fig. 2J, K). One could instead interpret this as the striatal neurons spanned all the phases of the global walking gait cycles (Fig. 3D). To demonstrate phase-locking with individual limb movements, the authors need to show that neural activities were better correlated with a specific limb than to the global gait cycles.

      We sincerely appreciate the reviewer’s comment. As described above, we now present new analysis demonstrating that neurons are more phase-locked to single-limb gait than to multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1.

      (3) The observation of the enhancement of coupling between D2 MSN firing and the gait cycles was interesting, but the physiological interpretation was not clear (as the authors also noted in the Discussion), which hampers the significance of the observation.

      In the Discussion we comment on the potential behavioral significance of our findings, keeping in mind the reviewer’s earlier concern about the correlative nature of recordings. For example, we speculate that the increase in D2 MSN limb phase-locking strength contributes to bradykinetic symptoms, specifically the production and maintenance of a normal gait cycle and rhythm. We respectfully disagree with the reviewer about the limited significance of the observations, as this is the first study to describe striatal gait phase coding in detail, noting that gait impairments are a major motor symptom in PD. We believe that progress in better understanding and eventually treating PD will be made through a combination of correlative observations (i.e., neural recordings) and causal manipulations. There are both advantages and disadvantages to correlative as well as causal experiments.

      (4) Due to the lack of causality experiments as mentioned in the first comment above, the observations of coupling between striatal neuronal activity and gait control might well result from a third brain region/factor serving as the common source to both, whether in normal or dopamine lesioned brain. If this is the case, the significance and implications of current findings will be greatly limited.

      As mentioned above, we have included new data to address this concern (Figure 6 – figure supplement 2). Please refer to our response to Reviewer #2, comment #1 for a detailed discussion of these results and their caveats.

      Reviewer #3 (Public Review):

      In this study, Yang et al. address a fundamental question of the role of dorsal striatum in neural coding of gait. The authors study the respective roles of D1 and D2 MSNs by linking their balanced activity to detailed gait parameters. In addition, they put in parallel the striatal activity related to whole-body measures such as initiation/cessation of movement or body speed. They are using an elegant combination of high-resolution single-limb motion tracking, identification of bouts of movements, and electrophysiological recordings of striatal neurons to correlate those different parameters. Subpopulations of striatal output neurons (D1 and D2 expressing neurons) are identified in neural recordings with optogenetic tagging. Those complementary approaches show that a subset of striatal neurons have phase-locked activity to individual limbs. In addition, more than a third of MSNs appear to encode all three aspects of motor behavior addressed here, initiation/cessation of movement, body speed, and gait. This activity is balanced between D1 and D2 neurons, with a higher activity of D1 neurons only for movement initiation. Finally, alterations of gait, and the associated striatal activity, are studied in a mouse model of Parkinson's Disease, using 6-OHDA lesions in the medial forebrain bundle (MFB). In the 6OHDA mice, there is an imbalance toward D2 activity.

      Strengths:

      There is a long-standing debate on the respective role of D1 and D2 MSNs on the control of movement. This study goes beyond prior work by providing detailed quantification of individual limb kinematics, in parallel with whole-body motion, and showing a high proportion of MSNs to be phase-locked to precise gait cycle and also encoding whole-body motion. The temporal resolution used here highlights the preferential activity of D1 MSN at the movement starts, whereas previous studies described a more balanced involvement. Finally, they reveal neural mechanisms of dopamine depletion-induced gait alterations, with a preponderant phase-locked activity of D2 neurons. The results are convincing, and the methodology supports the conclusions presented here.

      Weaknesses:

      Some more detailed explanations would improve the clarity of the results in the corresponding section. Analysis of the 6OHDA experiments could be expanded to extract more relevant information.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Panels I and J from Figure 6 are referred to in the text (line 158) but they don't exist.

      Thank you, we have corrected this in the text.

      (2) For the classification of striatal units into putative MSN, FS interneurons, and TANs, see Gage et al. DOI: 10.1016/j.neuron.2010.06.034 or Thorn et al. DOI: 10.1523/JNEUROSCI.178213.2014.

      As explained in our reply to the Public Reviews (Reviewer #1, comment #2), we opted not to perform an analysis by putative MSN, FSI, and TAN. We have performed analysis of different putative cell types in several of our other publications (e.g., Bakhurin & Masmanidis 2016, 2017; Lee & Masmanidis 2019). However, this study already relies on a more rigorous method – optogenetic tagging – to identify D1 and D2 MSNs. We felt that adding a second, more subjective and therefore less rigorous identification method based on spike waveforms would add unnecessary confusion in how the results are presented and interpreted. For example, we were unsure how to address the situation where an opto-tagged D1 or D2 MSN may be classified as a putative FSI or TAN according to spike waveform criteria. For this reason, we decided not to perform an analysis by putative MSN, FSI, and TAN. Finally, we have made all our electrophysiological data available for anyone who wishes to perform this analysis.

      (3) The discussion section could be improved by elaborating on the origin and function of these gait signals in the striatum, as well as the mechanisms underlying changes in the 6-OHDA model. In addition, it would be important to discuss the limitations of this model, since unilateral 6-OHDA lesions may not accurately recapitulate parkinsonian gait deficits, as it results in a very asymmetric gait.

      Our Discussion section includes a paragraph speculating on the origin of gait modulation in the striatum, and another paragraph addressing the limitation that unilateral 6OHDA lesions induce gait asymmetry. We appreciate the request to also discuss the possible mechanisms driving the alterations in 6OHDA mice. But this is a very complex topic which our study is not aimed at addressing. The range of possible mechanisms uncovered in the literature is vast – from synaptic changes in striatal microcircuits, to altered intrinsic excitability of D1/D2 MSNs, and network-level alterations. Therefore, we preferred to keep the discussion focused on gait and movement coding.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors denoted the limb movement sequences as LR-LF-RR-RF, with limbs on the same left/right side moving first. However, considering multiple gait cycles, the sequence could also be described as RF-LR-LF-RR, with movements of the diagonal limbs temporally closer to each other, which was more intuitive from the visual inspection of Fig. 1D. The LR-LF-RR-RF denotation would make more sense if the authors could demonstrate that a walking bout almost always started from LR, as seen in the two examples in Fig. 1D.

      We designated the sequence as LR-LF-RR-RF to illustrate the lateral sequence pattern. But the reviewer is correct that a shifted version of this sequence, such as RF-LR-LF-RR, is also valid. We are not claiming that the LR limb is always the first to move in a walking bout, but rather that limbs on the same side of the body move one after the other, followed by the limbs on the opposite side. We have edited the text to clarify this point: “Mice walked with a lateral sequence gait pattern (e.g., LR-LF-RR-RF), with the limbs on the same side of the body moving one after the other, followed by movement of limbs on the opposite side (Figure 1E).”

      (2) The study identified a biased D1-MSN activation at movement initiation, which was not reported in previous studies that relied on measuring calcium dynamics. The authors attributed the difference to the temporal resolution of electrophysiological versus optic methods. The authors would probably notice that in some previous studies that relied also on optic-tagging and electrophysiological recordings, start/stop activity was not found to be different between direct and indirect pathway MSNs. The authors should discuss these studies and offer some possible explanations.

      This is an oversight on our part, and we thank the reviewer for noting this. We are aware of one such study (Jin & Costa 2014); we apologize if other studies were missed. The Discussion has been updated as follows to discuss this paper: “We also note that another study employing optogenetic tagging did not find significant D1/D2 MSN differences in start/stop activity (Jin & Costa 2014). However, the movement being measured was an instrumental action (reward-guided lever pressing), as opposed to the self-initiated motion examined in our work. This suggests either that imbalances between D1 and D2 MSN start activity may be more pronounced under specific behavioral conditions, or that results vary depending on how movement initiation and cessation events are identified.”

      (3) The authors could add some denotations to the peak firing rates in Fig. 3D to aid visualization, so that readers could get a sense of the distribution of neurons preferring each phase of the movements.

      We appreciate this suggestion. We tried adding various colored lines to denote the peak firing rates, but ultimately we felt the lines were not helpful and potentially distracting for some readers. We thus decided not to add any lines to the plot.

      (4) Although the relative strength of D1/D2-MSN coding of body speed and movement cessation was found after dopamine lesion, it seemed that D1-MSNs cessation coding, as well as D1- and D2-MSN speed coding, were all altered after dopamine lesion (Fig. S3). The authors could mention these to avoid misunderstandings.

      We thank the reviewer for their observation. In the Results, we now mention that “while speed coding remained balanced between D1 and D2 MSNs, there was a substantial reduction in the speed coding score of both cell types after dopamine lesions.” The stop modulation index did not change appreciably.

      Reviewer #3 (Recommendations For The Authors):

      (1) A suggestion would be to put more emphasis in the title on the first parts of the study, i.e. detailed correlation between striatal activity and quantified motion, and not only focus on the dopamine depletion model.

      We considered other titles, but felt that our current choice is appropriate given that the study’s climax is with the dopamine lesion results in Figures 5 & 6.

      (2) The calculation and the significance of the vector length should be more detailed in the results as it is used all along as a measure of "the strength of neural entrainment to the gait cycle".

      We have added the following statement in the Results section to clarify the significance of vector length: “The vector length is a unitless parameter which can theoretically vary from 0 to 1, with 0 representing a neuron whose spikes occur at random limb phases, and 1 representing a neuron which always spikes at the same phase. Thus, higher vector length indicates a stronger entrainment of spiking activity to a specific limb phase.” For details on how vector length is calculated we refer readers to our Methods, specifically the section entitled “Gait phase coding analysis.”
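
      The vector length described above matches the standard mean resultant length from circular statistics. A minimal sketch follows, assuming each spike has already been assigned the phase (in radians) of one limb's gait cycle at the time it occurred; the function names and simulated neurons are hypothetical, and the procedure actually used is the one in the Methods section “Gait phase coding analysis.”

```python
import numpy as np

def vector_length(spike_phases):
    """Mean resultant length of spike phases in radians (0 = random phases, 1 = identical phases)."""
    phases = np.asarray(spike_phases, dtype=float)
    if phases.size == 0:
        return float("nan")
    # Represent each spike as a unit vector pointing at its limb phase and average them;
    # the length of the averaged vector is the vector length.
    return float(np.abs(np.mean(np.exp(1j * phases))))

def preferred_phase(spike_phases):
    """Direction of the averaged vector, i.e. the limb phase the neuron prefers, in [0, 2*pi)."""
    phases = np.asarray(spike_phases, dtype=float)
    return float(np.angle(np.mean(np.exp(1j * phases))) % (2 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical neurons: one whose spikes cluster near phase pi/2, one firing at random phases.
locked = rng.vonmises(mu=np.pi / 2, kappa=4.0, size=2000) % (2 * np.pi)
random_phase = rng.uniform(0.0, 2 * np.pi, size=2000)

print(round(vector_length(locked), 2))        # close to 0.86 for kappa = 4
print(round(vector_length(random_phase), 2))  # close to 0
print(round(preferred_phase(locked), 2))      # close to pi/2 (about 1.57)
```

Because every spike contributes a unit vector, spikes spread evenly around the cycle cancel out, while spikes bunched at one phase add up, which is what makes the measure bounded between 0 and 1.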

      (3) There is no difference in the ipsi- or contralateral limbs while recordings are made only in the right hemisphere. Given that MSNs receive inputs from IT and PT neurons from the motor cortex, would it not be expected to have differences in the phase-locked activity to right versus left limbs? This is a question also with the dopamine depletion model which is performed with unilateral 6OHDA injections.

      This is something we also wondered and were somewhat surprised by the lack of a contralateral bias in the phase locking vector length, as shown in Figure 2 – figure supplement 1D. We have two hypotheses as to why there is no ipsi/contra-lateral bias. First, it is possible that striatal neurons receive similar levels of synaptic input signaling ipsi/contra-lateral limb movements. Second, the strongly correlated motion of diagonally opposed limbs may give the appearance that neurons that are phase-locked to one limb (e.g., LF) are also locked to the diagonally opposite limb (i.e., RR). We see evidence of this diagonal limb coupling in Figure 2 – figure supplement 1B.

      (4) Among the 45% of striatal neurons that display significant phase-locking to at least one limb, it would be interesting to describe the % of neurons being phase-locked to several limbs and whether they are specific subtypes. Are there animals with more phase-locked cells in several limbs?

      This is indeed a very interesting and important point which relates to the major concern that “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” As described above, we now present new analysis demonstrating that neurons are more phase-locked to single-limb gait than to multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1. With regard to whether there are specific subtypes, we performed the same analysis on optogenetically identified D1/D2 MSNs and found similar trends, but did not show these results in the manuscript to avoid redundancy.

      (5) The Venn diagram in Fig. 3C shows ~40% of striatal cells encoding body speed, single-limb and start/stop information. Nevertheless, this percentage is limited by the number of single-limb phase-locked cells as almost all have a firing rate related to body speed and start/stop signals. This could be discussed.

      This is a very interesting observation. Basically, the reviewer is noting that almost all the phase-locked cells also encode start/stop and/or speed. We have now updated the Discussion to specifically address this observation: “We found different percentages of striatal neurons which encoded limb phase, movement initiation or cessation, and speed (Figure 3). Among these three categories, limb phase coding cells represented the smallest population with ~45% of neurons, as opposed to ~90% for start/stop or speed. In addition, nearly all phase coding cells were also significantly responsive to start/stop or speed, whereas a sizable proportion of start/stop or speed coding cells were not entrained to limb phase. It is unclear, however, whether these population size differences reflect a proportionally smaller role for the striatum in regulating single-limb gait as opposed to whole-body movement initiation, cessation or speed.”

      (6) D1/D2 analysis:

      For optogenetic identification of D1 and D2 neurons, 39 D1 neurons and 40 D2 neurons were extracted from the total of 274 recorded neurons while 222 neurons were optogenetically tagged according to the mat and meth. Were there any technical difficulties that made it difficult to identify more neurons?

      The low yield of optogenetic tagging is quite common in the literature due to the rigorous criteria which must be satisfied in order to qualify as a tagged neuron (e.g., Kvitsiani & Kepecs 2013). The number 222 neurons quoted in the methods reflects the entirety of optogenetically tagged neurons in this study. Our study contained 33 mice, thus the average number of tagged units per animal was 222/33 ~ 6.7 units/animal. This is actually comparable to or slightly better than the yield reported in some other striatal literature (see for example, Figure 1 of Ryan & Nelson 2018).

      It is mentioned that "a subset" of these were phase-locked to a single limb. It would be interesting to specify the exact percentage of those neurons for D1 and D2 populations.

      Phase-locking of D2 neurons seems less sharp than D1 neurons, with a lower firing rate (Fig. 4D), please comment. Also difference in vector length for LR while none for other limbs, why? There is a balanced activity of D1 and D2 MSNs during walking (speed) and single-limb movements, but more D1 MSNs active at movement initiation. Is it also true for stop signals? Are they separated based on the speed threshold of 20 mm/s?

      As mentioned above, our new analysis specifically examines the percentage of all neurons which are phase-locked to a single limb (Figure 2 – figure supplement 1, panels A-C). We have performed the same analysis on optogenetically tagged D1/D2 MSNs and found similar trends, but did not show these results in the manuscript to avoid redundancy. With regard to whether phase-locking of D2 MSNs is less sharp than that of D1 MSNs, the “sharpness” of phase-locking is characterized by the mean vector length, and we show that on average the vector length is not significantly different between D1 and D2 MSNs in healthy mice (Figure 4F). The reviewer noted that the D2 vector length in Figure 4F appears visibly higher for LR but not for other limbs; however, this difference is not statistically significant. With regard to whether more D1 MSNs are active during movement cessation, we show that both sham and dopamine lesioned mice have similar levels of D1/D2 MSN activity during stop (Figure 6 – figure supplement 1, panels A & B). Details of how start, stop, and speed are calculated are provided in the Methods.

      The relationship between firing and body speed (Fig. 4H) displays differences between D1 and D2. If a speed inferior to 20 mm/s, corresponds to "start or stop signal" as mentioned in the mat and meth, then early difference would correspond to start, but still there is a difference between 20 and 100 mm/s and after 150 mm/s. These results should be commented on.

      The reviewer is correct that in the plot of firing rate vs body speed (Figure 4J), there visibly appears to be a difference between D1 and D2 MSNs at low speeds. However, according to our pre-determined measure of speed coding which relies on the correlation coefficient between firing rate and speed, D1 and D2 MSNs have similar speed coding indices. Since there is a precedent for using the correlation coefficient to quantify speed coding (Fobbs & Kravitz 2020; Kropff & Moser 2015), we prefer to stick with this measure despite some caveats. Furthermore, the apparent difference between D1 and D2 MSNs in Figure 4J is not seen in either sham or dopamine lesioned mice (Figure 6 – figure supplement 1, panels D & E). Taken together, we do not believe the apparent speed coding difference in Figure 4J rises to the level of a consistent result.
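
      A correlation-based speed score of the kind referred to above (in the spirit of Kropff & Moser 2015) can be sketched as follows: spike counts and body speed are put on a common time grid and the Pearson correlation between firing rate and speed is taken as the score. The bin width (`bin_s`), the optional speed cutoff (`min_speed`), the function name and the simulated cells are hypothetical choices for illustration; the parameters actually used are given in the Methods.

```python
import numpy as np

def speed_score(spike_times, speed_times, speed, bin_s=0.5, min_speed=0.0):
    """Pearson correlation between binned firing rate and body speed.

    bin_s (seconds) and min_speed (mm/s) are hypothetical parameters, not the Methods values.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    speed_times = np.asarray(speed_times, dtype=float)
    speed = np.asarray(speed, dtype=float)
    edges = np.arange(speed_times[0], speed_times[-1], bin_s)
    if edges.size < 4:
        return float("nan")                    # recording too short to bin
    counts, _ = np.histogram(spike_times, bins=edges)
    rate = counts / bin_s                      # spikes/s in each bin
    centers = edges[:-1] + bin_s / 2
    binned_speed = np.interp(centers, speed_times, speed)  # speed sampled at bin centers
    keep = binned_speed >= min_speed           # optionally drop near-stationary bins
    if keep.sum() < 3 or rate[keep].std() == 0 or binned_speed[keep].std() == 0:
        return float("nan")                    # correlation undefined
    return float(np.corrcoef(rate[keep], binned_speed[keep])[0, 1])

rng = np.random.default_rng(1)
dt = 0.01
t = np.arange(0.0, 600.0, dt)                       # 10 min of hypothetical tracking at 100 Hz
speed = 100 + 80 * np.sin(2 * np.pi * t / 20)       # mm/s, varies between 20 and 180
speed_cell = t[rng.random(t.size) < 0.1 * speed * dt]  # rate rises with speed (2 to 18 spikes/s)
flat_cell = t[rng.random(t.size) < 5.0 * dt]           # constant 5 spikes/s

print(round(speed_score(speed_cell, t, speed), 2))  # clearly positive
print(round(speed_score(flat_cell, t, speed), 2))   # near 0
```

A single correlation coefficient summarizes the overall linear trend, so it can miss a difference confined to part of the speed range, which is one of the caveats of this measure.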

      (7) The timing of normalized firing rate in relation to start/stop signals might be also quite interesting to comment on. D1 neurons have stronger activation for start signals and it seems that it is also earlier, with D2 activated after the onset of the movement (Fig. 4G).

      We appreciate the observation that D1 neurons appear to fire a little earlier than D2 neurons in Figure 4I. However, this did not rise to the level of a statistically significant result by our attempted quantitative analysis (not shown). Furthermore, the earlier timing of D1 is not apparent in sham lesioned animals in Figure 6I, thus overall we cannot make any confident statements about earlier timing of D1 start signals.

      In the dopamine lesion experiments, it seems that in sham mice both D1 and D2 have higher activity after the onset of the movement and that the peak of D2 activity is earlier (Fig. 6G). In 6OHDA mice, both peaks occur after the onset of the movement, although they are much less clearly defined.

      Both peaks become less sharp after 6OHDA lesions, but in terms of amplitude the main effect is a reduction in the D1 start signal. This is reflected in the reduced D1 start modulation index whereas the D2 index remains relatively constant.

      (8) The 6OHDA model displays far fewer walking bouts, with lower speed and initiation rate. It would be important to include in the figure a representation similar to Fig. 1, with distributions of stride frequency, duration, and length, to illustrate the difference between control and 6OHDA mice. On average, how many walking bouts were analyzed in control and 6OHDA animals?

      We have added new data similar to Figure 1 with distributions of stride frequency, duration, and length to illustrate the difference between sham and 6OHDA mice (Figure 5 – figure supplement 1, panels B & C). We also added the following information on the number of walking bouts: “The mean number of walking bouts per session was reduced from 124 ± 42 in sham to 47 ± 19 in dopamine lesioned mice (mean ± SD).”

      The initiation rate is particularly low in 6OHDA animals (3-4 per minute). Did the authors make longer behavioral recordings to extract enough initiation/stop signals for neural correlation analysis?

      All of our recordings were of the same duration (30 minutes). This duration was pre-determined at the beginning of the study to ensure consistency.

      The stride length seems smaller on the right limbs in 6OHDA mice, and so does the vector length in D2 neurons, while there is no change in D1 neurons. Is this a significant effect? If so, it would be important to comment on it.

      The ANOVA test in those figures was not designed to perform post-hoc multiple comparisons between different limbs. However, if one changes the ANOVA design then the effect for stride length is significant. This is probably related to the ipsiversive turning bias in the unilateral 6OHDA lesion model. Though we have not changed the ANOVA design, in the Discussion we do comment on the shorter stride length on the right limbs in 6OHDA mice in Figure 5G. There is no significant difference in D2 vector length between different limbs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The authors report a novel hepatic lncRNA FincoR regulating FXR with therapeutic implications in the treatment of MASH. The findings are important and use an appropriate methodology in line with the current state-of-the-art, with convincing support for the claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the article titled "Hammerhead-type FXR agonists induce an eRNA FincoR that ameliorates nonalcoholic steatohepatitis in mice," the authors explore the role of the Farnesoid X Receptor (FXR) in treating metabolic disorders like NASH. They identify a new liver-specific long non-coding RNA (lncRNA), FincoR, regulated by FXR, notably induced by agonists such as tropifexor. The study shows that FincoR plays a significant role in enhancing the efficacy of tropifexor in mitigating liver fibrosis and inflammation associated with NASH, suggesting its potential as a novel therapeutic target. The study makes a promising contribution to understanding the role of FincoR in alleviating liver fibrosis in NASH, providing initial insights into the mechanisms involved. While it offers a valuable starting point, there is potential for further exploration into the functional roles of FincoR and their specific actions in human NASH cases. Building upon the current findings to elucidate more detailed mechanistic pathways through which FincoR exerts its therapeutic effects in liver disease would elevate the research's significance and potential impact in the field.

      Strengths:

      This study stands out for its comprehensive and unbiased approach to investigating the role of FincoR, a liver-specific lncRNA, in the treatment of NASH. Key strengths include: 1) The application of advanced sequencing methods like GRO-seq and RNA-seq offered a comprehensive and unbiased view of the transcriptional changes induced by tropifexor, particularly highlighting the role of FincoR. 2) Utilizing a genetic mouse model of FXR KO and a FincoR liver-specific knockdown (FincoR-LKD) mouse model provided a controlled and relevant environment for studying NASH, allowing for precise assessment of tropifexor's therapeutic effects. 3) The inclusion of tropifexor, an FDA-approved FXR agonist, adds significant clinical relevance to the study. It bridges the gap between experimental research and potential therapeutic application, providing a direct pathway for translating these findings into real-world clinical benefits for NASH patients. 4) The study's rigorous experimental design, incorporating both negative and positive controls, ensured that the results were specifically attributable to the action of FincoR and tropifexor.

      Weaknesses:

      The study presents several notable weaknesses that could be addressed to strengthen its findings and conclusions: 1) The authors focus on FincoR, but do not extensively test other lncRNAs identified in Figure 1A. A more comprehensive approach, such as rescue experiments with these lncRNAs, would provide a better understanding of whether similar roles are played by other lncRNAs in mitigating NASH. 2) FincoR was chosen for further study primarily because it is the most upregulated lncRNA induced by GW4064. Including another GW4064-induced lncRNA as a control in functional studies would strengthen the argument for FincoR's unique role in NASH. 3) The study does not conclusively demonstrate whether FincoR is specifically expressed in hepatocytes or other liver cell types. Conducting FincoR RNA-FISH with immunofluorescent experiments or RT-PCR, using markers for different liver cell types, would clarify its expression profile. 4) Understanding the absolute copy number of FincoR is crucial. Determining whether there are sufficient copies of FincoR to function as proposed would lend more credibility to its suggested role. 5) The manuscript, although technically proficient, does not thoroughly address the relevance of these findings to human NASH. Questions like the conservation of FincoR in humans and its potential role in human NASH should be discussed.

      Reviewer #2 (Public Review):

      Summary:

      Nonalcoholic steatohepatitis (NASH), recently renamed metabolic dysfunction-associated steatohepatitis (MASH), is a leading cause of liver-related death. Farnesoid X receptor (FXR) is a promising drug target for treating NASH, and several drugs targeting FXR are under clinical investigation for their efficacy in treating NASH. The authors intended to address whether FXR mediates its hepatic protective effects through the regulation of lncRNAs, which would provide novel insights into the pharmacological targeting of FXR for NASH treatment. The authors used unbiased transcriptomic profiling to identify a novel enhancer-derived lncRNA, FincoR, enriched in the liver and showed that the knockdown of FincoR in a murine NASH model attenuated part of the effect of tropifexor, an FXR agonist, namely on inflammation and fibrosis, but not on steatosis. This study provides a framework for how one can investigate the role of noncoding genes in pharmacological interventions targeting known protein-coding genes. Given that many disease-associated genetic variants are located in non-coding regions, this study, together with others, may provide useful information for improved and individualized treatment of metabolic disorders.

      Strengths:

      The study leverages both transcriptional profiles and epigenetic signatures to identify the top candidate eRNA for further study. The subsequent biochemical characterization of FincoR using FXR-KO mice combined with GRO-seq and luciferase reporter assays convincingly demonstrates this eRNA as an FXR transcriptional target sensitive to FXR agonists. The use of cultured cells in vitro and the in vivo mouse model of NASH provides a multi-level evaluation of the context-dependent importance of FincoR downstream of FXR in the regulation of functions related to liver dysfunction.

      Weaknesses:

      As discussed, future work to dissect the mechanisms by which FincoR facilitates the action of FXR and its agonists is warranted. It would be helpful if the authors could draw on the current understanding of eRNA modes of action and the observed biochemical features of FincoR to speculate on potential molecular mechanisms explaining the observed functional phenotype. It is unclear whether this eRNA is conserved in humans in any way, which would establish relevance to human disease. Additionally, the eRNA knockdown was achieved by deletion of a region upstream of the eRNA transcript. A more direct approach to altering eRNA levels, e.g., overexpression of FincoR in the liver, would provide important data for interpreting its functional regulation.

      We thank the Editor and Reviewers for their constructive comments. We believe we have addressed all of the issues (detailed below) and the revisions have greatly strengthened the manuscript.

      Reviewer 1:

      The study presents several notable weaknesses that could be addressed to strengthen its findings and conclusions:

      (1) The authors focus on FincoR, but do not extensively test other lncRNAs identified in Figure 1A. A more comprehensive approach, such as rescue experiments with these lncRNAs, would provide a better understanding of whether similar roles are played by other lncRNAs in mitigating NASH.

      (2) FincoR was chosen for further study primarily because it is the most upregulated lncRNA induced by GW4064. Including another GW4064-induced lncRNA as a control in functional studies would strengthen the argument for FincoR's unique role in NASH.

      (3) The study does not conclusively demonstrate whether FincoR is specifically expressed in hepatocytes or other liver cell types. Conducting FincoR RNA-FISH with immunofluorescent experiments or RT-PCR, using markers for different liver cell types, would clarify its expression profile.

      (4) Understanding the absolute copy number of FincoR is crucial. Determining whether there are sufficient copies of FincoR to function as proposed would lend more credibility to its suggested role.

      Response to (1) - (4): We thank Reviewer 1 for the positive comments on the strengths of our work, including the open-ended approach, the novel eRNA FincoR and its strong relevance to liver disease. We also value the constructive feedback provided by the reviewer and agree that additional studies are important to fully understand the mechanisms of FincoR and the functional significance of other FXR-induced lncRNAs. In this manuscript we report the discovery and initial characterization of FincoR, as well as its potential function in FXR action in response to hammerhead agonists, but a number of interesting questions are raised. Future experiments, as suggested by the reviewer, will be needed to examine the role of other FXR-induced lncRNAs, the potential role of FincoR induction by other nuclear receptors with binding sites at FincoR, whether FincoR is expressed in liver cell types in addition to hepatocytes, and the expression abundance of FincoR. These are all excellent suggestions for future experimentation which we feel are beyond the scope of the present report. For example, generating a CRISPR/Cas9 genetic model for another lncRNA is not trivial, as it takes a significant amount of work with murine models. Also, we do not exclude the possibility that other lncRNAs induced by FXR also have functions. A rescue experiment is not technically feasible, because FincoR RNA can potentially be very long (~10 kb, as estimated from the RNA-seq pattern in Fig. 1C), and it is currently not feasible to express it from exogenous vectors at levels similar to endogenous ones. We therefore consider that these important questions are more suitable for future work to fully address. We believe that a comprehensive exploration of FXR-regulated lncRNAs holds the potential to unveil novel insights crucial for the development of therapies targeting NASH and other metabolic diseases. The study of FincoR is the beginning of this area of research.

      (5) The manuscript, although technically proficient, does not thoroughly address the relevance of these findings to human NASH. Questions like the conservation of FincoR in humans and its potential role in human NASH should be discussed.

      Response: These are important questions. To respond to the reviewer’s comment, new experiments are presented in our final revised manuscript in which we utilized mouse models of NAFLD/NASH and cholestatic liver injury to determine FincoR’s role in these diseases. Hepatic FincoR levels were significantly increased in mice fed with high fat diet (HFD) for 12 weeks (Supplementary Figure S1A) and in mice fed a HFD with high fructose (HFHF) in drinking water for 12 weeks (Supplementary Figure S1B). Elevated hepatic FincoR levels were also observed in mice treated with α-naphthylisothiocyanate (ANIT), a chemical inducer of liver cholestasis (Supplementary Figure S1C), and in mice with bile duct ligation (BDL), a surgical method to induce cholestatic liver injury (Supplementary Figure S1D).

      In terms of human relevance, we have provided additional information and figures showing that there is sequence similarity between mouse FincoR and a human locus. FincoR sequence is moderately conserved between mice and humans as displayed in the UCSC genome browser (Supplementary Figure S1E). Annotation of these conserved human sequences revealed that they overlap with a functionally uncharacterized human lncRNA XR_007061585.1 (Supplementary Figure S1F). Further, we performed qRT-PCR on RNA samples from human patients, which demonstrated that hepatic lncRNA XR_007061585.1 levels are elevated in patients with NAFLD and PBC, but not in severe NASH-fibrosis patients (Supplementary Figure S1G, H). These results demonstrate that hepatic levels of a potential human analog of FincoR are elevated in NAFLD and PBC patients, which is consistent with FincoR’s upregulation in mouse models of chronic liver disease with hepatic inflammation and liver injury. Whether human lncRNA XR_007061585.1 is entirely analogous to mouse FincoR in terms of functions and mechanisms, and whether the elevation of this human lncRNA has a role in liver disease progression or is an adaptive response to liver injury, remains to be determined.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction Line 96, "..., while the vast majority are transcribed into ncRNAs" may not be accurate. Please refer to Pointing and Haerty Annu Rev 2022 for a related discussion.

      Response: We would like to thank the reviewer for pointing out this inaccurate information in the introduction. We have changed the content in the text, “While a significant portion of the genome was initially thought to be "junk DNA", it has been established that many non-coding regions give rise to functional non-coding RNAs.”

      (2) Figure 5: the authors should provide a clear illustration demonstrating the sequence targeted by the sgRNA in relation to the transcriptional and epigenetic profile (i.e., RNAseq and H3K27ac ChIP-seq data).

      Response: The illustration (Figure 5-figure supplement 1A, right panel) demonstrating the sequence targeted by the sgRNA has been updated as suggested by the reviewer.

      In this model, the upstream of FincoR is deleted, leading to the inhibition of FincoR transcription. Does the deleted region include FXR binding sites? If so, would the phenotype be due to the deletion of these binding sequences, rather than the decreased FincoR transcripts? Accordingly, the limitation or alternative interpretation should be discussed.

      Response: The reviewer made a good point. The deleted region includes FXR binding sites so that we cannot rule out decreased binding of FXR or decreased transcription of the region per se, in addition to the decreased levels of FincoR, to bear a role in the phenotypic changes we observed. In the final revision, we have added discussion of this alternative (6th paragraph in the revised discussion section).

      (3) Figure 6C, the images should be accompanied by quantification. It appears the FincoR-KD shows a visible difference as compared to Tropifexor-treated control mice, which does not match entirely what is written in the results.

      Response: The quantitation of Oil Red O staining has been done as suggested by the reviewer (Figure 6C). The result is consistent with the triglyceride result, showing that tropifexor treatment markedly reduced neutral lipids, as determined by Oil Red O staining of liver sections (Figure 6C) and liver TG levels (Figure 6D), and these beneficial effects on reducing fatty liver were not altered by FincoR knockdown.

      (4) Figure 7, does AST show the same pattern as ALT? As indicated from Line 335, "tropifexor treatment reduced mRNA levels of several genes that promote fibrosis (Col1a1, Col1a2, ...)". Fig. 7D does not seem to match the description of Col1a1. Authors may need to modify the results.

      Response: AST has been measured and has the same pattern as ALT. The new data have been added to Figure 7B. Col1a1 expression has been re-measured and the results have been updated in Figure 7D.

      (5) Is FincoR level reduced in NASH conditions?

      Response: We thank the Reviewer for this question. We now added new data to examine the levels of FincoR in mouse liver disease models and also examined levels of a potential human analog of FincoR in human liver specimens from PBC, NAFLD, and NASH patients. Please see our new data and description above in the response to comment 5 by Reviewer 1 (most data now included in the new Supplementary Figure S1).

      (6) Please provide information on the conservation of FincoR (DNA and RNA) in humans. This would be important to provide the human disease relevance.

      Response: As described above in the response to comment 5 of Reviewer 1, a human locus shows sequence similarity to mouse FincoR, and this conserved region has an annotated, uncharacterized human lncRNA. We also examined the levels of this human homolog in human diseased liver samples. Our new results demonstrate that hepatic levels of a potential human analog of FincoR are elevated in NAFLD and PBC patients, which is consistent with FincoR’s upregulation in mouse models of chronic liver disease with hepatic inflammation and liver injury. Whether human lncRNA XR_007061585.1 is entirely analogous to mouse FincoR in terms of functions and mechanisms, and whether the elevation of this human lncRNA has a role in liver disease progression or is an adaptive response to liver injury, remains to be determined.

      (7) Several discussion points for the authors' consideration:

      (7.1) human-mouse conservation as alluded to in #6;

      Response: Potential human-mouse conservation is discussed with new data in the last paragraph of the Results section.

      (7.2) potential molecular mechanism involved in FincoR-regulated hepatocyte function;

      Response: We thank the Reviewer for this comment. We have added more discussion as shown below: “RNA inside cells usually associates with different RNA-binding proteins (RBPs). To predict potential binding proteins of FincoR, we performed additional bioinformatic analysis, which identified proteins that potentially bind FincoR, including KHDRBS1, RBM38, YBX2 and YBX3 (Supplemental Table S5). These findings and potential functions of the binding proteins are discussed in the 5th paragraph of the discussion section in the final revised manuscript. Whether these predicted RBPs interact with FincoR, and the underlying mechanisms, will need to be investigated in future experimentation to understand the mechanisms involved in FincoR-regulated hepatocyte function.”

      (7.3) any disease-associated SNPs in the FincoR locus.

      Response: No SNPs were noted in the annotation of the human locus with sequence similarity to mouse FincoR in the NCBI Genome Data Viewer.

      (7.4) the in vitro induction of FincoR is transient but in vivo this occurs after 12 days of drug treatment. How do the authors reconcile the differential induction patterns?

      Response: To clarify, the induction of FincoR after a single dose of GW4064 in vivo was transient, peaked within 1 h and then declined gradually (Figure 1-figure Supplement 1C). In the tropifexor treatment protocol (also in vivo), the mice were treated daily with tropifexor for 12 days so that the multiple doses maintained FincoR induction. The beneficial effect of tropifexor by inducing FincoR, therefore, accumulated over the 12 days.

      It is worth noting that we failed to see induction of FincoR in isolated primary mouse hepatocytes treated with GW4064 in vitro. We could detect FincoR only in primary hepatocytes isolated from the livers of GW4064-treated mice. This may be due to the loss of key factors mediating FincoR induction in cultured primary hepatocytes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript describes valuable information on how the extraocular muscles (EOM) are preserved in a mouse model of familial Amyotrophic lateral sclerosis (ALS) that carries a G93A mutation in the Sod1 gene. The authors provide convincing evidence of how the integrity of neuromuscular junction is preserved in EOM but not in limb and diaphragm muscles of G93A mice. Overall, this interesting work provides new evidence regarding the etiopathogenesis of ALS and insights for the development of therapeutic targets to slow the loss of neuromuscular function in ALS.

      Public Reviews:

      Reviewer#1 (Public Review):

      Summary:

      The study explores the mechanisms that preserve satellite cell function in extraocular muscles (EOMs) in a mouse model of familial Amyotrophic lateral sclerosis (ALS) that carries the G93A mutation in the Sod1 gene. ALS is a fatal neuromuscular disorder driven by motor neuron degeneration, leading to progressive wasting of most skeletal muscles but not EOM. The study first established that the integrity of the neuromuscular junction (NMJ) is preserved in EOM but not in limb and diaphragm muscles of G93A mice, and that sodium butyrate (NaBu) treatment partially improves NMJ integrity in limb and diaphragm muscles of G93A mice. They also found a loss of synaptic satellite cells and of renewability of cultured myoblasts in hindlimb and diaphragm muscles of G93A mice, but not in EOM, and NaBu treatment restores myoblast renewability. Using RNA-seq analysis, they identify that axon guidance molecules, particularly Cxcl12, are highly expressed in EOM myoblasts, along with more sustainable renewability. Using a neuromuscular co-culture model, they convincingly show that AAV-mediated Cxcl12 expression in G93A myotubes enhances motor axon extension and innervation. Strikingly, NaBu-mediated preservation of NMJ in limb muscles of G93A mice is associated with elevated expression of Cxcl12 in satellite cells and improved renewability of myoblasts. These results together offer molecular insights into genes critical for maintaining satellite cell function and reveal a mechanism through which NaBu ameliorates ALS.

      Strengths:

      Combination of in vivo and cell culture models. Nice imaging of NMJ and associated satellite cells. Using motoneuron-myotube coculture to establish the mechanism. Tested and illustrated a mechanism through which a clinically used drug ameliorates ALS.

      Weaknesses:

      Data presentation could be improved (see details in the Recommendation for Authors).

      It would have been nice to have included G93A motoneurons in the coculture study.

      This is indeed a plan of our future study. In the revised version, we discussed the limitation of not including G93A motor neurons in the coculture assay. (Page 11, Line 445-448)

      “However, it is possible that motor neurons carrying ALS mutations will respond differently to Cxcl12-mediated axon guidance than WT motor neurons. This is a limitation of the current study which will be investigated in future co-culture studies.”

      Reviewer #2 (Public Review):

      Summary:

      The work is potentially interesting as it outlines the role of satellite cells in supporting the functional decline of skeletal muscle due to the denervation process. In this context the authors analyze the functional and molecular characteristics of satellite cells in different muscle types differently affected by the degenerative process in the ALS model.

      Strengths:

      The work illustrates a relevant aspect of the differences in stem cell potential in different skeletal muscles in a mouse model of the disease through a considerable amount of data and experimental models.

      Weaknesses:

      However, there are some criticisms of the structuring of the results:

      It is not clear how many animals were used in each experimental group (Figs 1 and 2, Fig. 2-9). In particular, it is unclear whether the dots in the histograms represent biological or technical replicates. Furthermore, the gender used in experimental groups is never specified. This last point appears to be important considering the gender differences observed in the SOD1G93A mouse model.

      The original quantification data and mouse gender specification were actually listed in the corresponding supplementary tables. We now added the gender specification and number of the mice used in all corresponding figure legends. The number of mice used for sorting SCs from different muscles were also specified in the Methods section in the revised manuscript. (Page 12, Line 489-493).

      We also added one more supplementary figure (Figure 1-figure supplement 2) to compare the innervation status between male and female mice. The following description has been added in the updated manuscript (Page 3-4; Line 125-130):

      “The data shown in Figure 1B have also been replotted to compare the innervation status between male and female mice (Figure 1- figure supplement 2). In terms of well- or partially-innervated ratios, no significant gender differences were observed under our experimental conditions, in which the muscle samples were collected at the end stage of the disease, although there is a marginally lower “poorly innervated ratio” in the EDL muscle of G93A female mice compared to G93A male mice.”

      However, we acknowledge that the current study is limited in its ability to fully detect cross-gender differences due to low “n” numbers per gender. We hope this is understandable, as we had to split the limited resource of ALS G93A mice between different kinds of experiments, including NMJ integrity assessment, peri-nuclear SC abundance assessment, whole muscle qPCR, cell sorting for imaging, cell sorting for RNA-Seq, cell sorting for qPCR, cell sorting for neuromuscular co-culture, etc., in this pioneering study. However, we do intend to gradually build up “n” numbers for characterization of cross-gender differences in our ongoing studies.

      As to what the dots in each plot represent, we have inserted the description in each relevant figure legend as detailed below:

      For Fig 1, each dot represents quantification result from a single mouse. Please see Figure 1-figure supplement 1, Figure 1-figure supplement 2 and Figure 1-table supplement 1 for NMJs measured per muscle type per gender. Briefly, EDL, soleus and diaphragm muscles were from 4 male and 6 female mice per group; WT EOM group was from 4 male and 4 female mice; G93A EOM group was from 3 male and 4 female mice; G93A EOM with NaBu feeding group was from 6 female mice.

      For Fig 2, each dot represents quantification result from a single mouse. Please see Figure 2-table supplement 1 for NMJs measured per muscle type per gender. Briefly, WT EDL group was from 2 male and 2 female mice; G93A EDL group was from 3 male and 3 female mice; G93A EDL with NaBu feeding group was from 2 male and 4 female mice; WT soleus group was from 2 male and 3 female mice; G93A soleus group was from 3 male and 2 female mice; G93A soleus with NaBu feeding group was from 1 male and 4 female mice; WT diaphragm group was from 1 male and 4 female mice; G93A diaphragm group was from 1 male and 4 female mice; G93A diaphragm with NaBu feeding group was from 4 female mice; WT EOM group was from 1 male and 3 female mice; G93A EOM group was from 5 female mice; G93A EOM with NaBu feeding group was from 1 male and 3 female mice.

      For Fig 3, each dot in the box-and-dot plots represents result from one round of sorting. WT HL SCs were from 8 male and 6 female mice; G93A HL SCs were from 9 male and 5 female mice; WT diaphragm SCs were from 6 male and 3 female mice; G93A diaphragm SCs were from 12 male and 5 female mice. WT EOM SCs were from 6 batches of male and 1 batch of female mice (each batch contains 5-6 mice of the same gender). G93A EOM SCs were from 5 batches of male and 2 batches of female mice.

      *Please note these results were from sorting in which the FACS profiles were recorded. Not all rounds of sorting were with FACS profile recorded.

      For Fig 4A, each dot in the box-and-dot plots represents one image analyzed. For WT HL SCs, 94 images from 3 rounds of sorting; For WT Dia SCs, 107 images from 3 rounds of sorting; For WT EOM SCs, 75 images from 3 rounds of sorting; For G93A HL SCs, 96 images from 3 rounds of sorting; For G93A Dia SCs, 62 images from 3 rounds of sorting; For G93A EOM SCs, 79 images from 3 rounds of sorting. For the 3 rounds of sorting, 1 was from male and 2 were from female mice.

      *Please note that the number of mice used for sorting SCs from different muscles is specified in the Methods section of the revised manuscript. (Page 12, Line 489-493)

      For Fig 4B, each dot in the box-and-dot plots represents one image analyzed. For WT HL SCs, 52 images from 3 rounds of sorting; For WT Dia SCs, 51 images from 3 rounds of sorting; For WT EOM SCs, 51 images from 3 rounds of sorting; For G93A HL SCs, 52 images from 3 rounds of sorting; For G93A Dia SCs, 47 images from 3 rounds of sorting; For G93A EOM SCs, 56 images from 3 rounds of sorting. For the 3 rounds of sorting, 1 was from male and 2 were from female mice.

      For Fig 5A, each dot in the box-and-dot plots represents one replicate of culture. HL SCs were from male mice.

      For Fig 5B, each dot in the box-and-dot plots represents one image analyzed. For G93A HL SCs, 52 images from 3 rounds of sorting; 1-day NaBu treatment, 45 images from 3 rounds of sorting; 3-day NaBu treatment, 51 images from 3 rounds of sorting; For G93A Dia SCs, 47 images from 3 rounds of sorting; 1-day NaBu treatment, 60 images from 3 rounds of sorting; 3-day NaBu treatment, 57 images from 3 rounds of sorting. For the 3 rounds of sorting, 2 were from male and 1 was from female mice.

      For Fig 6, all samples used for bulk RNA-Seq were from female mice.

      For Fig 7C, each dot in the box-and-dot plots represents one replicate of culture. RNA samples were collected from 3-6 rounds of sorting and sorted cells were seeded into 3 dishes as replicates. WT HL SCs were from 3 male and 1 female mice. WT diaphragm SCs were from 2 male and 2 female mice; WT EOM SCs were from 3 male mice; G93A HL SCs were from 4 male and 2 female mice. G93A diaphragm SCs were from 1 male and 3 female mice; G93A EOM SCs were from 3 male mice.

      For Fig 7D, each dot in the box-and-dot plots represents one replicate of culture. RNA samples were collected from 6 rounds of sorting and sorted cells were seeded into 3 dishes as replicates. G93A HL SCs were from 4 male and 2 female mice; G93A diaphragm SCs were from 2 male and 4 female mice.

      For Fig 8D, each dot in the box-and-dot plot represents one neurite measured. HL and EOM SCs used for co-culture experiments were all from male mice.

      For Fig 9D, each dot in the box-and-dot plot represents one image analyzed. HL and EOM SCs used for co-culture experiments were all from male mice.

      For Figure 1-figure supplement 1, each dot in the box-and-dot plots represents quantification result from one mouse. Please also see Figure 1-table supplement 2. Briefly, muscles in WT and G93A groups were from 3 male and 3 female mice per group; G93A EDL with NaBu feeding group was from 3 male and 3 female mice. G93A soleus with NaBu feeding group was from 2 male and 3 female mice; G93A diaphragm with NaBu feeding group was from 2 male and 4 female mice; G93A EOM with NaBu feeding group was from 4 male and 2 female mice.

      The first paragraph of the results lacks a functional analysis of the motor decline of the animals after the administration of sodium butyrate. The authors, in fact, administered NaBu around 90 days of age while in previous work the drug had been administered at a pre-symptomatic age. It would therefore be useful, to make the message more effective, to characterize the locomotor functions of the treated animals in parallel with the histological evidence of the integrity of the NMJ.

      We are still collecting locomotor function data for G93A mice with and without NaBu treatment and plan to report them in a future manuscript; the present manuscript focuses on the molecular and histological aspects. Additionally, in the revised manuscript, we revised the rationale for starting NaBu treatment after disease onset. (Page 4, Line 131-134)

      “In the previous study, NaBu treatment initiated at a pre-symptomatic age delayed disease progression in G93A mice. As treatment of ALS patients is initiated after symptoms appear, we further tested whether NaBu treatment started after disease onset (at the age of 3 months, 2% NaBu in water for 1 month) was effective in preserving NMJ integrity.”

      Figure 5 should be completed with the administration of NaBu also to the satellite cells isolated from the WT mouse, and the same applies to Figure 9, where AAV-CMV-Cxcl12 transduction of WT myotubes is missing.

      We appreciate the reviewer’s suggestion of conducting the additional experiment with AAV delivery of CXCL12 into myotubes derived from WT mice. Extensive studies by other investigators have been performed with butyrate on satellite cells derived from WT mice. To name a few: Fiszman et al., 1980 (DOI: 10.1016/0014-4827(80)90467-X); Johnston et al., 1992 (DOI: 10.1128/mcb.12.11.5123-5130.1992); Iezzi et al., 2002 (DOI: 10.1073/pnas.112218599). To avoid redundant experiments, we focused on the effect of butyrate on the proliferation and differentiation of SCs derived from G93A mice. In response to the reviewer’s comment, we added a discussion of this point to the Results section (Page 6, line 216-217). Regarding Cxcl12, published studies have demonstrated its role in promoting axon growth. To name a few: Negro et al., 2017 (DOI: 10.15252/emmm.201607257); Lieberam et al., 2005 (DOI: 10.1016/j.neuron.2005.08.011); Whitman et al., 2018 (DOI: 10.1167/iovs.18-25190). (Page 10, line 434, 440-442).

      In the experiment illustrated in Figure 8, treating the cell cultures with NaBu would improve the outcome, and interfering with Cxcl12 expression in myotubes derived from G93A EOM SCs (Fig. 9) would strengthen the evidence for the specificity of this protein in axon guidance at this NMJ, which is typical of a muscle spared in ALS.

      This is a great suggestion. Our study demonstrated that overexpression of CXCL12 in G93A myotubes can enhance axon guidance and innervation in co-cultures of myotubes and motor neurons. We have also demonstrated that NaBu treatment can enhance the expression of CXCL12 and slow ALS progression. Combining NaBu treatment with CXCL12 overexpression may indeed have additive therapeutic benefits in slowing ALS progression. We have added this statement to the revised Discussion. (Page 11, Line 466-468)

      In the "materials and methods" section the paragraph relating to the methods used for statistical analysis is missing.

      We have added it accordingly. (Page 15, Line 631-636)

      Reviewer #3 (Public Review):

      Summary:

      In their paper, Li et al. investigate the transcriptome of satellite cells obtained from different muscle types including hindlimb, diaphragm, and extraocular muscles (EOM) from wild-type and G93A transgenic mice (end-stage ALS) in order to identify potential factors involved in the maintenance of the neuromuscular junction. The underlying hypothesis is that since EOMs are largely spared from this debilitating disease, they may secrete NMJ-protective factors. The results of their transcriptome analysis identified several axon guidance molecules including the chemokine Cxcl12, which are particularly enriched in EOM-derived satellite cells. Transduction of hindlimb-derived satellite cells with AAV encoding Cxcl12 reverted hindlimb-derived myotubes from the G93A mice into myotubes sharing phenotypic characteristics similar to those of EOM-derived satellite cells. Additionally, the authors were able to demonstrate that EOM-derived satellite cell myotube cultures are capable of enhancing axon extensions and innervation in co-culture experiments.

      Strengths:

      The strength of the paper is that the authors successfully isolated and purified different populations of satellite cells, compared their transcriptomes, identified specific factors released by EOM-derived satellite cells, and overexpressed one of these factors (the chemokine Cxcl12) by AAV-mediated transduction of hindlimb-derived satellite cells. The transduced cells were then able to support axon guidance and NMJ integrity. They also show that administration of Na butyrate to mice decreased NMJ denervation and satellite cell depletion in the hindlimbs. Furthermore, the addition of Na butyrate to hindlimb-derived satellite cell myotube cultures increased Cxcl12 expression. These are impressive results providing important insights for the development of therapeutic targets to slow the loss of neuromuscular function characterizing ALS.

      Weaknesses:

      Several important aspects have not been addressed by the authors; these include the following points, which weaken the conclusions and the interpretation of the results.

      (a) Na butyrate was shown to extend the survival of G93A mice by Zhang et al. Na butyrate has a variety of biological effects: for example, it has anti-inflammatory effects, inhibits mitochondrial oxidative stress, positively influences mitochondrial function, and is a class I/II HDAC inhibitor. What is the mechanism underlying its beneficial effects, both on muscle function in the ALS G93A mice and in the in vitro myotube assay? Cytokine levels as well as histone acetylation/methylation can be assessed experimentally, and this important point has not been appropriately investigated.

      Great suggestion by the reviewer.

      Our previous publications (DOI: 10.3390/biom12020333; DOI: 10.3390/ijms22147412) have shown the beneficial roles of NaBu in ameliorating mitochondrial function in both motor neuron-like cells and adult muscle fibers. A focus of the current study is to test whether NaBu treatment also affects SCs by regulating their gene transcription. Its potential effects on HDAC activity and acetylation have been examined in previous studies by other investigators, and we have added these references to the Discussion (Page 11, line 466-468).

      (b) In the context of satellite cell characterization, on lines 151-152 the authors state that soleus muscles were excluded from further studies since they have a higher content of slow twitch fibers and are more similar to the diaphragm. This justification is not valid in the context of ALS as well as many other muscle disorders. Indeed, soleus and diaphragm muscles contain a high proportion of slow twitch fibers (up to 80% and 50% respectively) but soleus muscles are more spared than diaphragm muscles. What makes soleus muscles (and EOMs) more resistant to ALS NMJ injury? Satellite cells from soleus muscles need to be characterized in detail as well.

      We agree with the reviewer’s comment that our original statement is misleading regarding the difference between soleus and diaphragm muscles in terms of the content of slow twitch fibers. Our histological studies revealed similar defects in denervation of diaphragm and soleus muscles derived from the G93A mice. Most importantly, the degree of NMJ degeneration and atrophy is less severe in soleus compared to other hindlimb muscles, such as EDL, during ALS progression. We have cited related studies such as Valdez et al., 2012 (DOI: 10.1371/journal.pone.0034640), Atkin et al., 2005 (DOI: 10.1016/j.nmd.2005.02.005). To avoid any confusion, we have removed the original statement and revised the paragraph (Page 4, line 159-162).

      “The three groups were chosen because they represent the most severely affected, moderately affected and least affected muscles in ALS progression, respectively. Soleus was not included in the hindlimb SC pool because it is less affected than other hindlimb muscles, based on our study and others [6,42].”

      Furthermore, EOMs are complex muscles, containing many types of fibers and expressing different myosin heavy chain isoforms and muscle proteins. The fact that in mice both the global and orbital layers of EOMs express the slow myosin heavy chain isoform as well as myosin heavy chains 2X, 2A, and 2B (Zhou et al., 2010 IOVS 51:6355-6363) also indicates that the sparing is not directly linked to the fast or slow twitch nature of the muscle fiber. This needs to be considered.

      We greatly appreciate these suggestions and have included these points in the revised Discussion: “It is known that EOMs are complex muscles. Besides the developmental myosin isoforms, EOMs also express both adult fast and slow myosin contractile elements (Zhou et al., 2010 IOVS 51:6355-6363), suggesting that the sparing may not be solely linked to the fast or slow twitch nature of the muscle fiber; rather, changes in SCs may play a pivotal role in preserving EOM function during the progression of ALS.” (Page 9, line 389-392)

      (c) In the context of myotube formation from cultured satellite cells on lines 178-179 the authors stained the myotubes for myosin heavy chain. Because of the diversity of myosin heavy chain isoforms and different muscle origins of the satellite cells investigated, the isoform of myosin heavy chain expressed by the myotubes needs to be tested and described. It is not sufficient to state anti-MYH.

      We used the pan-anti-MYH antibody (MF20 from DSHB) for immunostaining of myosin heavy chain to identify differentiated myotubes. As described on the DSHB website (https://dshb.biology.uiowa.edu/MF-20), MF20 recognizes all myosin heavy chain isoforms. We are happy to examine in future studies whether specific myosin heavy chain isoforms contribute to the observed differences.

      (d) The original RNAseq results have not been deposited, and while it is true that the authors have analyzed the results and described them in Figures 6 and 7 and the related supplements, the original data need to be shown both as an xls list and as volcano plots (q value versus log2 fold change). This will facilitate independent interpretation of the results by readers, as some transcripts may not be listed. As presented, it is rather difficult to identify which transcripts aside from Cxcl12 are commonly upregulated. Can the data be presented in a more visual way?

      We have uploaded the FASTQ files and the text files containing TPM values to the Gene Expression Omnibus (GEO) database and included the GEO accession number GSE249484 in the revised text. Per the reviewer’s recommendation, we have added supplementary tables for Figure 6 listing the top 20 differentially expressed genes (ranked by Log2FC, both upregulated and downregulated) comparing 1) EOM SCs to their hindlimb and diaphragm counterparts (Figure 6-table supplement 1); 2) G93A SCs to WT SCs of the same muscle origin (Figure 6-table supplement 2); 3) G93A hindlimb and diaphragm SCs with 3-day NaBu treatment to those without (Figure 6-table supplement 3). (Page 6, Line 237-257)
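      As an illustration of the ranking described above, the sketch below selects the top N up- and downregulated genes from a table of log2 fold changes, with an optional FDR filter of the kind Reviewer 3 asks to see reported. It is a minimal Python sketch under stated assumptions, not the pipeline used for Figure 6: the function name `top_degs`, the 0.05 FDR cutoff, and the example gene names and values are hypothetical placeholders.

```python
def top_degs(log2fc, fdr=None, n=20, alpha=0.05):
    """Return (upregulated, downregulated) gene lists ranked by log2 fold change.

    log2fc: dict mapping gene name -> log2 fold change (experimental vs. control).
    fdr:    optional dict mapping gene name -> FDR (q value); if given, only genes
            with FDR < alpha are kept (the cutoff is an assumption, not from the study).
    """
    # Optionally keep only genes that pass the FDR threshold
    genes = [g for g in log2fc if fdr is None or fdr[g] < alpha]
    # Largest positive log2FC first for upregulated genes
    up = sorted((g for g in genes if log2fc[g] > 0), key=lambda g: log2fc[g], reverse=True)
    # Most negative log2FC first for downregulated genes
    down = sorted((g for g in genes if log2fc[g] < 0), key=lambda g: log2fc[g])
    return up[:n], down[:n]


# Hypothetical placeholder values, not data from this study
example_fc = {"gene_a": 3.2, "gene_b": -1.5, "gene_c": 0.8, "gene_d": -4.0, "gene_e": 5.1}
example_fdr = {"gene_a": 0.01, "gene_b": 0.2, "gene_c": 0.03, "gene_d": 0.001, "gene_e": 0.5}
print(top_degs(example_fc, n=2))               # no FDR filter
print(top_degs(example_fc, example_fdr, n=2))  # gene_b and gene_e fail the FDR cutoff
```

      Reporting the FDR alongside the ranked log2FC, as in the filtered call, is what allows a reader to tell large but noisy changes from well-supported ones.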

      (e) There is no section describing the statistical analysis methods used. In many figures, more than 2 groups are compared so the authors need to use an ANOVA followed by a post hoc test.

      Thank you for the comments. We have added it accordingly. (Page 15, Line 631-636)
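      For reference, the sketch below computes the one-way ANOVA F statistic that the reviewer recommends when more than two groups are compared. It is a minimal pure-Python illustration, not the authors' analysis: the function name is hypothetical, the toy values are made up, and it omits both the p-value (which requires the F distribution) and the post hoc step (for example, Tukey's HSD). In practice a statistics package such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with three hypothetical groups (e.g. three muscle origins)
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

      A significant F only says that at least one group mean differs; the post hoc test is what identifies which pairs differ while controlling for multiple comparisons.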

      The authors have achieved their aim in showing that satellite cells derived from EOMs have a distinct transcriptome and that this may be the basis of their sparing in ALS. Furthermore, this work may help develop future therapeutic interventions for patients with ALS.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The prevailing hypothesis of ALS is that motoneuron degeneration subsequently induces muscle atrophy and wasting. However, evidence also suggests that ALS is a muscle disease independent of motoneuron degeneration. The results from the current study support the latter. The RNA-seq data from cultured myoblasts (without innervation) suggest a cell-autonomous effect of G93A on muscle cells. While the current analyses in this study identify axon guidance pathways in EOM satellite cells that may underlie their unique gene program that enhances motoneuron function, the power of the RNA-seq data is underutilized. I suggest that the authors explore the RNA-seq data further by comparing genes and pathways altered by G93A in various muscles to better pinpoint how G93A influences satellite cell function.

      Thanks for the comments and advice. Further analysis of the RNA-seq data is planned. As our original sequencing provider has been unavailable to us since last year, we are currently negotiating with other sequencing providers. We have deposited the raw data files in the GEO database (GSE249484) to foster further analyses by other research teams.

      To address the reviewer’s concern, we have added three more supplementary tables for Figure 6, which list the top 20 differentially expressed genes (DEG) (ranked by Log2FC, both upregulated and downregulated) comparing 1) EOM SCs to their hindlimb and diaphragm counterparts (new Figure 6-table supplement 1); 2) G93A SCs to WT SCs of the same muscle origin (new Figure 6-table supplement 2); 3) G93A hindlimb and diaphragm SCs with and without 3-day NaBu treatment (new Figure 6-table supplement 3). These three DEG lists are discussed in the Results section of the revised manuscript as follows (Page 6, Line 237-257).

      Figure 4 presentation could be improved by adopting a similar comparison (WT vs G93A) as used in Figure 1-3. The current comparison is not straightforward. In addition, a magnified image of panel A would demonstrate the loss of myoblast homeostasis more clearly. (AKA Figure 2B)

      The WT vs G93A comparison was presented in the supplementary figure of Figure 4 (Figure 4-figure supplement 1 in the previous version, and now in Figure 4-figure supplement 2 in the revised version).

      As requested, we have added magnified single channel representative images of cultured SCs in the new Figure 4-figure supplement 1 in the revised manuscript.

      Co-culture results in Figure 8 are very impressive. It would be nice if the data were quantified. The figure legend states that panel D is the quantification, but I don't see panel D. As the study used rat motoneurons (presumably SOD1 wildtype), it is unknown if G93A motoneurons would respond to muscle-derived CXCL12 similarly to the wildtype motoneurons. This information is crucial for understanding whether the SOD1 mutant ALS1 is a motoneuron disease or muscle disease or both. Some discussion should be provided to reflect the limitation (of not including G93A motoneurons in the coculture).

      Panel D (the quantification data) was presented in the original figure setting (but may not be obvious). We have now revised Figure 8 to enlarge panel D to clearly present the quantification data.

      We acknowledge the limitation of not including mutant G93A motor neurons in the coculture assay, and have added this important point (and our future plans to do so) in the discussion section of the revised manuscript: (Page 11, Line 445-448)

      “However, it is possible that motor neurons carrying ALS mutations may respond differently to Cxcl12 mediated axon guidance than WT motor neurons. This is a limitation of the current study, which will be investigated in future co-culture studies.”

      Reviewer #2 (Recommendations For The Authors):

      Line 108. The sentence "Z-stack scans of glycerol-cleared whole muscles were obtained using a high working distance lens in a confocal microscope. The z-stacks were compacted into 2D images by maximal intensity projection" should be moved to the Materials and Methods section.

      Removed from the Result section and added to the Method section as recommended (Page 13, Line 564-568).
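      The maximal intensity projection mentioned in this sentence keeps, for every (x, y) pixel, the brightest value found across all z-slices of the stack. Below is a minimal pure-Python sketch, assuming the stack is a list of equally sized 2D slices; the function name and toy values are hypothetical, and this is not the imaging software used in the study. With NumPy, the same operation on a `(z, y, x)` array is `stack.max(axis=0)`.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D slices, each a list of rows) into one 2D image."""
    if not stack:
        raise ValueError("empty z-stack")
    rows, cols = len(stack[0]), len(stack[0][0])
    # For each pixel position, take the maximum intensity over all z-slices
    return [[max(z_slice[r][c] for z_slice in stack) for c in range(cols)]
            for r in range(rows)]


# Two hypothetical 2x2 slices
toy_stack = [
    [[0, 5], [2, 1]],  # slice z = 0
    [[3, 4], [0, 7]],  # slice z = 1
]
print(max_intensity_projection(toy_stack))
```

      Because the projection discards depth, structures lying at different z-positions can overlap in the 2D image, which is worth keeping in mind when scoring overlap between stains.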

      Line 113. The sentence "In order to quantify the extent of denervation in a categorical manner, NMJs were arbitrarily defined as "well innervated" if SYP staining was present in >60% of the BTX positive area, "partially innervated" if between 60% and 30%, and "poorly innervated" if SYP staining corresponded to less than 30% of the BTX positive area" has already been written in the figure legend.

      Thanks for the advice. We have rephrased the sentence to remove the redundant part.
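      The categorical scoring rule quoted above can be written as a small function. This is a sketch under assumptions, not the authors' quantification code: the name `classify_nmj` is hypothetical, the input is taken to be the SYP-positive area overlapping the BTX-positive area of one NMJ, and because the text says ">60%" and "less than 30%", values of exactly 60% or 30% are assigned here to "partially innervated".

```python
def classify_nmj(syp_overlap_area, btx_area):
    """Categorize one NMJ by the percentage of its BTX-positive area covered by SYP."""
    if btx_area <= 0:
        raise ValueError("BTX-positive area must be positive")
    pct = 100.0 * syp_overlap_area / btx_area
    if pct > 60:
        return "well innervated"
    if pct >= 30:  # boundary values 30% and 60% fall here (assumption)
        return "partially innervated"
    return "poorly innervated"


# Hypothetical areas in arbitrary units
for syp, btx in [(70, 100), (45, 100), (10, 100)]:
    print(syp, btx, classify_nmj(syp, btx))
```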

      In lines 445-7, it would be better to indicate the enzymatic units instead of the concentrations.

      We included enzymatic units for the four enzymes in the Methods Section of revised manuscript (Page 12, Line 497-499).

      Reviewer #3 (Recommendations for The Authors):

      There are several points that need to be addressed by the authors including:

      (a) The authors need to provide experimental evidence as to the mode of action of Na Butyrate and more specifically whether its beneficial effect is mediated by its anti-inflammatory action, inhibition of HDACs, or the combination of several mechanisms. Additionally, it should be clearer why Na Butyrate was administered. The sentence referring to reference 36 is not sufficient and some mechanistic insight needs to be provided in the results section.

      Thanks for the great suggestion. We have revised the Results section accordingly to clarify the rationale for NaBu usage (please also see our detailed response to your suggestion above). (Page 4, line 131-134)

      (b) Their reason for excluding soleus-derived satellite cells from the analysis is not valid. Soleus muscles are "more" spared than diaphragm muscles, and analyzing them may help shed light on this observation.

      Please see our response to your question (b) in the above public review section.

      (c) DATA AVAILABILITY: The RNAseq raw untransformed data has not been provided and Volcano plots are also not shown. I find it quite difficult to follow the results of the RNAseq experiments and this is central to the interpretation of the paper's results. Ideally, one should be able to look at the data and draw his/her own conclusions but as it stands this is difficult to do.

      We have uploaded the raw FASTQ files and the Excel files containing TPM values to the GEO database under accession number GSE249484.

      (d) A detailed description of all statistical tests that were used needs to be provided.

      Yes, this has been added to the revised manuscript.

      (e) Many figure legends are incomplete and some panels are not described appropriately, indicating that the authors need to thoroughly revise all aspects of the manuscript.

      We have extensively edited the figure legends to address the issues raised by reviewers.

      (f) Line 96-98: it is unlikely that muscles from ALS patients will be biopsied frequently. Furthermore, what biomarkers exactly could be followed in patients in response to therapy? This is unclear.

      While it is true that it is not generally part of the diagnostic workup for ALS, muscle biopsy is increasingly being used pre- and post-treatment in ALS clinical trials to examine responses to potential new therapies. Muscle biopsy is also being explored in several ongoing studies as a potential ALS-relevant peripheral tissue amenable to biopsy (as opposed to brain or spinal cord) for predictive, pharmacodynamic, and prognostic biomarkers. This includes studies attempting to recapitulate pathophysiological patient clusters observed in CNS autopsy tissues and studies to detect aberrant TDP-43 aggregates in intramuscular nerve twigs, among others. Indeed, Dr. Ostrow’s clinical duties include performing muscle biopsies and interpreting muscle pathology, and he is involved in several ongoing studies attempting to correlate postmortem CNS and muscle analyses for these purposes.

      To avoid potential controversy about the feasibility of multiple biopsies, we rephrased the sentence as follows (Page 3, Line 96-98):

      “Characterizing the distinct EOM SC transcriptomic pattern could provide clues for identifying potential biomarkers in therapeutic trials in both ALS patients and animal models, in addition to identifying therapeutic targets.”

      (g) Line 388-389. What do the authors mean by this sentence? It is not clear.

      Thanks for the comment. We have expanded the discussion to make it clearer in the revision. (Page 10, Line 428-431)

      “It is possible that the more frequent self-renewal and spontaneous activation of EOM SCs contribute to a higher rate of mitochondrial DNA replication, leading to accelerated spreading of mitochondrial DNA defects and resulting in a higher proportion of COX-deficient myofibers than in other muscles.”

      (h) Were the experimenters blinded as to the results shown in Figures 2, 7, 8, and 9?

      We endeavored to blind experiments whenever possible. Not all experiments were blinded due to logistical complexity and the clear difference in microscopic and gross appearance between wild-type and mutant muscle. The differences observed in Figures 2, 7, 8 and 9 are qualitative (i.e., more than just quantitative), which should minimize the impact of possible human bias. Additionally, we employed multiple different experimental approaches to assess our hypotheses.

      For Fig 2, the physical appearance is notably different between G93A and WT muscles. The different innervation status (Fig 2A) is also not amenable to blinding.

      For Fig 7, the expression levels of Hmga2, Notch3 and Cxcl12 detected by the qPCR assay are substantially greater in EOM-derived SCs than in their counterparts from other muscles, and these results are consistent with the RNA-Seq and immunofluorescence assays. For Fig 8, overexpression of Cxcl12 and co-culture with EOM SC-derived myotubes not only increased the length of the longest neurites but also promoted axon branching, which can be easily observed.

      For Fig 9, only the EOM SC derived myotubes were capable of aligning the neurites along with them on a global scale. This qualitative difference is easy to appreciate, even under low magnification.

      (i) Line 64 -65 The authors refer to a very old paper by Fischer et al in 2002 for the expression profile of EOMs. There are more recent papers including that of Eckhardt et al. (eLife 2023, 12:e83618) showing the differences in proteome between EOMs and soleus and EOMs and EDL muscles. There are more than 2000 (and not 300!!) differentially expressed proteins.

      Thank you for the newly published reference. We have revised the Introduction section to include this new proteomic study. (Page 2, Line 64-69)

      (j) Figure 7C. The Y axis is mislabeled, as it should be log2 fold change and not the growth conditions.

      Thank you for catching this. We have fixed it.

      (k) In all figures, if each symbol represents the results obtained on 1 mouse, this needs to be clearly stated. What do the panels on the right of Figures 4 and 5B show?

      Thanks for the comments. For Figures 1B and 2C, as well as Figure 1-figure supplement 1B, each dot in the box-and-dot plots represents the result obtained from one mouse. For Figure 3B, each dot represents one round of sorting. Generally, one mouse was euthanized for each round of sorting of HL and diaphragm SCs, whereas sorting EOM SCs could require up to 6 mice because EOMs are much smaller. For Figures 4 and 5B, each dot represents one image analyzed; all images were collected from three rounds of sorting. For Figure 5A, each dot represents one replicate of culture. We have indicated these details in the revision.

      Please also see our response to the 1st question of Reviewer 2 in the public review section.

      (l) Figure 6 Table supplement 3 does NOT show the FDR but only the log2 fold change. Please amend.

      We have amended the supplementary table accordingly.

    1. Author Response

      We would like to thank the editors for giving us an opportunity to address the insightful comments made by the referees. In our response to the comments, we provide a guide to important information that may have been overlooked, and hope to elaborate on the context for better evaluating this study.

      As mentioned in the introduction of our manuscript, mosquito-transmitted diseases cause nearly a million deaths every year and significant worldwide morbidity. Moreover, the geographical range of mosquito vectors is rapidly expanding due to climate change and mosquito-borne disease risks are emerging in new parts of the world. DEET was discovered in the 1940s and has remained the primary insect repellent for >70 years in the developed world. The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration. Products with lower concentration need repeated applications, whereas those with higher concentrations feel oily and cost more.

      We also mentioned that DEET inhibits mammalian cation channels and human acetylcholinesterase. The latter is a target of carbamate insecticides that are commonly used in disease-endemic areas, raising additional concerns about prolonged use of DEET. DEET is also a solvent and damages several forms of plastics, synthetic fabrics, and painted surfaces. Unfortunately, DEET has been of little value in disease control in Africa and Asia. Even in developed countries, a natural, cosmetically pleasant alternative could benefit millions of people who currently avoid repellents.

      Innovation in finding new repellents has been slow due to limitations in current research approaches and the high cost of EPA registration (especially for synthetic compounds). Since DEET, only five additional active ingredients have been approved by the EPA for repellent products. In the 20+ years since insect odorant receptors were discovered from genomes, not a single novel repellent compound has been identified and registered by the EPA. Thus, there is both a strong need for new approaches to finding insect repellents and a need for new active ingredients that are safe and strategically effective. In fact, finding new mosquito repellents has been the topic of multiple Gates Foundation Grand Challenge grants and numerous NIH-funded grants to many research groups around the world.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set up a pipeline to predict insect repellents that are pleasant and safe for humans. This is done by daisy-chaining a new classification model for predicting repellents with a published model for predicting human perception. The models use a feature-engineered selection of chemical features to make their predictions. The predicted molecules are then validated against a proxy humanoid (heated brick), and their safety is tested by molecular assays in human cells. The humanistic approach to modeling these authors have taken (which considers cosmetic/aesthetic appeal and safety) is novel and a necessary step for consumer usage. However, the importance of pleasantness relative to effectiveness is still up for debate (DEET is unpleasant but still used often), and the generalizability of the safety tests is unknown and assumed. The effectiveness of the prediction models also remains to be established. They pass the authors' own behavioral tests, but their contribution to the field is unknown, as both models (new and published) have not been rigorously benchmarked against previous models. Moreover, the authors' coverage of the literature in this field is sparse, ignoring directly related studies.

      Strengths:

      Humanistic approach to modeling considers pleasantness and safety. Chaining models can help limit the candidate odorants from the vastness of odor space.

      Weaknesses:

      The current models need to be benchmarked against leading models predicting similar outcomes. Similarly, many of these papers need to be addressed and discussed in the introduction. The authors might even reconsider their data sources for model training to increase performance, and their lexical categorization for interpretability. For instance, the Dravnieks data lexicon, currently used for human perception, has been highly criticized for its overlapping and hard-to-interpret descriptive terms ("FRAGRANT", "AROMATIC").

      Human Perception:

      Khan, R. M., Luk, C. H., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., & Sobel, N. (2007). Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world. Journal of Neuroscience, 27(37), 10015-10023.

      Keller, A., Gerkin, R. C., Guan, Y., Dhurandhar, A., Turu, G., Szalai, B., ... & Meyer, P. (2017). Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327), 820-826.

      Gutiérrez, E. D., Dhurandhar, A., Keller, A., Meyer, P., & Cecchi, G. A. (2018). Predicting natural language descriptions of mono-molecular odorants. Nature communications, 9(1), 4979.

      Lee, B. K., Mayhew, E. J., Sanchez-Lengeling, B., Wei, J. N., Qian, W. W., Little, K. A., ... & Wiltschko, A. B. (2023). A principal odor map unifies diverse tasks in olfactory perception. Science, 381(6661), 999-1006.

      Author Response: The human perception predictions were performed using models that we had reported in two earlier publications: Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021). Three of the four references pointed out by the referee were cited in these prior studies, which involved computational validation by predicting on a test set left out of training (as is typically done), and also predicting across different human studies with a high degree of success. A rigorous benchmarking of the odor perception models was done in Kowalewski, Huynh & Ray, Chem. Senses (2021) and in a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021). This included a favorable comparison with two of the references indicated by the referee: Keller et al., Science (2017) and Gutiérrez et al., Nat. Commun. (2018). The fourth reference, Lee et al., Science (2023), describes a neural network approach and was published well after our mosquito behavior studies were completed. Although Lee et al. used an advanced neural network model, they worked with 2-D structures of compounds, in contrast to our 3-D approach. They also did not report cross-study validations or comparisons with Keller et al. (2017), or benchmark against past studies, so it is difficult to assess what advances, if any, were made.

      The intent of the current study was to move beyond testing approaches, of which there are many, and instead work on a practical use case. As we see it, it is not necessarily the prediction of fragrance character or quality alone that matters but overlap with other predicted bioactivities. From the perspective of human use, a molecule with a pleasing scent that also repels insects is likely to be far more useful than one with an unappealing scent. Accordingly, our task in this study was to select molecules that fit into specific use categories: display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances.

      Insect Repellents:

      Wright, R. H. (1956). Physical basis of insect repellency. Nature, 178(4534), 638-638.

      Katritzky, A. R., Wang, Z., Slavov, S., Tsikolia, M., Dobchev, D., Akhmedov, N. G., ... & Linthicum, K. J. (2008). Synthesis and bioassay of improved mosquito repellents predicted from chemical structure. Proceedings of the National Academy of Sciences, 105(21), 7359-7364.

      Bernier, U. R., & Tsikolia, M. (2011). Development of Novel Repellents Using Structure-Activity Modeling of Compounds in the USDA Archival Database. In Recent Developments in Invertebrate Repellents (pp. 21-46). American Chemical Society.

      Author response: The Katritzky et al. PNAS (2008) paper is cited in our study, and we have indicated that the chemical analogs reported therein are part of the training data set in our study. We thank the reviewer for pointing us to the book chapter by Bernier & Tsikolia (2011), which reviews the QSAR approaches taken for repellent discovery and focuses in large measure on the Katritzky et al. PNAS (2008) paper. We did cite two relevant studies by Uli Bernier, but agree that citing the book chapter would be a useful addition.

      The current study assumes that insect repellents repel via their odor valence to the insect, but this is not accurate. Insect repellents also mask the body odor of humans, making them hard to locate. The authors need to consult the literature to understand the localization and landing mechanisms of insects on their hosts. There, they will see that heat alone is not the attractant, as their behavioral assay would have you believe. I suggest the authors use other behavioral assays to provide more convincing evidence of effectiveness. See the following studies:

      De Obaldia, M. E., Morita, T., Dedmon, L. C., Boehmler, D. J., Jiang, C. S., Zeledon, E. V., ... & Vosshall, L. B. (2022). Differential mosquito attraction to humans is associated with skin-derived carboxylic acid levels. Cell, 185(22), 4099-4116.

      McBride, C. S., Baier, F., Omondi, A. B., Spitzer, S. A., Lutomiah, J., Sang, R., ... & Vosshall, L. B. (2014). Evolution of mosquito preference for humans linked to an odorant receptor. Nature, 515(7526), 222-227.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      Author response: In this study we took an unbiased approach to compiling the training data set, which includes several known insect repellents of varying chemical structure and volatility; for most of these, there is no information on how they are sensed by insects. Not surprisingly, the repellents we identified vary in structure and functional groups, and are likely detected by mosquitoes in more than one way, through the olfactory and/or gustatory systems. We did not consider “masking” of skin attraction as a factor in the training data set, so we did not need to discuss the papers pointed out by the referee in detail. There is in fact a vast and rich body of literature on human skin odor, CO2 and breath emanations, including our own research and review articles, that is not discussed in the current paper.

      We did in fact conduct human arm-in-cage experiments with a few of the compounds reported in this study using female Aedes aegypti mosquitoes. A preprint describes this smaller-scale analysis, which showed strong repellency: Boyle et al., bioRxiv (2016) https://doi.org/10.1101/060178 (Figure 4). However, heat offers a practical proxy for evaluating prospective repellents in a high-throughput manner. It would certainly be desirable to evaluate additional candidates from the heat attraction assay with human subjects in the future.

      We thank the reviewer for pointing out the preprint by Wei et al., bioRxiv (2022). Our approaches differ in that Wei et al. do not consider properties such as fragrance and toxicity. We also cannot assume that their newer neural network model is superior: although it uses a large training dataset, it does not use 3-D chemical structures, which are highly relevant for biological activity. Very little information is available for the actives reported in Wei et al.; we independently evaluated their top compounds reported as similar to or better than DEET (CAS# 3731-16-6, 4282-32-0, 2040-04-2, 32940-15-1 and 3446-90-0) and could not find information about their toxicity, smell, or natural source. In contrast, the top repellents that we identify here as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes, etc.), and have pleasant smells. Dermal toxicity values in rabbits are known for six of our compounds and are at the best possible level (5000 mg/kg).

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study that seeks to identify novel mosquito repellents that smell attractive to humans.

      Strengths:

      The combination of standard machine learning methods with mosquito behavioral tests is a strength.

      Weaknesses:

      The study would be strengthened by describing how other modern ML approaches (RF, decision trees) would classify and identify other potential repellents.

      Author response: The current approach already shows a success rate >85% for a repellency coefficient >0.5 and identifies eight naturally occurring GRAS compounds with repellency as strong as or greater than that of DEET. This substantially expands the repertoire of strong natural repellents. Since the 1950s, only six active ingredients have been registered by the US EPA for use in topical repellents, of which only two are natural in origin (oil of lemon eucalyptus and catmint oil), and these typically do not protect as well as DEET does. That being said, we have since explored other predictive algorithms, for instance neural networks. Experimental evaluation of these newer pipelines will take significant resources and time and will be the focus of future grants.

      A comparison in the repellent activity between DEET and the top ten hits identified in this new study indicates little change in repellent activity (~3%), suggesting that DEET remains the gold standard. Without additional toxicity tests, the study is arguably incremental. The study's novelty should be better clarified.

      Author response: There is an urgent need to find new insect repellents that have better chances of being adopted by people who avoid DEET, such as in Africa and Asia. Having more effective natural actives expands the tools available against disease-transmitting mosquitoes. As mentioned above, the top repellents that we identified as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes), and have pleasant smells. Dermal toxicity values in rabbits are known for six of them and are at the best possible level (5000 mg/kg).

      The Methods in the repellency tests are sparse, and more information would be useful. Testing the top repellents at low doses (<<1%) and for long periods (2-12 h) would strengthen the manuscript. Without this information, the manuscript is lacking in depth.

      Author response: The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration (10%, ~2 h; 30%, ~5 h; 100%, ~8 h). These, rather than the lower concentrations suggested, would be the relevant concentrations for testing protection times on human volunteers. Such studies fall within the realm of EPA registration efforts, involving extensive GLP testing for safety and physical chemistry, as well as Human Subjects Board approvals. This is outside the scope of the current study and is typically accomplished during development.

      Testing human subjects on their olfactory perceptions of the repellents would also increase the depth and utility of the manuscript. Without additional experiments, the authors' conclusions lack support and have limited impact on the state-of-the-art.

      This manuscript is a mix of different approaches, which makes it lack cohesion. There is the ML method for classifying new repellents that smell good, but no testing of the repellents on human volunteers. The repellents are not tested at realistic concentrations and durations. And the calcium mobilization test is strange and makes little sense in the context of the other experiments and framing of the manuscript.

      Author response: The human olfaction validation that we present in this paper is consistent with most current publications in the field (for example, Keller et al. and Gutiérrez et al.). More systematic validation of the human odor character prediction pipelines used here was presented in two previous papers, Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021), and in a mini-review published in the same issue of the journal (Gerkin, Chem. Senses, 2021).

      Reviewer #3 (Public Review):

      While I am not a specialist in this field, I do have some knowledge of the subject matter and the computational aspects involved. The authors employ simple machine learning techniques (such as SVM) for the following purposes:

      (a) Prediction of aversive valence.

      (b) Predicting anti-repellent chemicals.

      (c) Predicting calcium mobilization.

      The approach is commonplace in chemoinformatics literature.

      Weaknesses:

      • All the above models are presented discretely, making it difficult to discern experiment design principles and connectedness.

      • The ML work is rudimentary, lacking adequate details. Chemoinformatics has reached great heights, and SVM does not seem contemporary.

      • There is significant existing research on finding repellents.

      Author response: In the current study, we aimed to showcase how computational research may be combined with basic science to create scalable pipelines that address real-world problems, rather than to demonstrate the methodological novelty of chemoinformatics approaches. Specifically, we wanted to use different predictive models to identify compounds that display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances. Unfortunately, there is very little existing research on insect repellents with these properties, which would make them better candidates for EPA registration. Most tested compounds are synthetic, are often analogs of known repellents like DEET, and require substantial time and resources to register. Moreover, the identities of the chemosensory receptors that are responsible for repellency to DEET and other compounds, and that are conserved across Anopheles, Aedes and Culex mosquitoes, are not known.

      It is true that the field of cheminformatics has experimented with a variety of newer approaches, based in part on neural networks (e.g., Graph Neural Networks and graph embeddings to encode chemical structure rather than a more conventional Extended Connectivity Fingerprint (ECFP)). Importantly, however, novelty does not imply usefulness. The mosquito behavior experiments that we present show a very high success rate (>85%), validating our approach and identifying several excellent candidates already.

      Strengths:

      • Authors attempt to make a case for calcium mobilization in the context of repellency. This aspect sounds interesting but is not surprising.

      • Behavioral profiling of repellents could be useful.

      Author Comment: We thank the referee for this comment. We have indeed done behavioral profiling for several repellents that evoke calcium mobilization, but we do not see any clear correlation thus far.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a valuable approach to exploring CD4+ T-cell response in mice across stimuli and tissues through the analysis of their T-cell receptor repertoires. The authors use a transgenic mouse model, in which the possible diversity of the T-cell receptor repertoire is reduced, such that each of a diverse set of immune exposures elicits more detectably consistent T-cell responses across different individuals. However, whereas the proposed experimental system could be utilized to study convergent T-cell responses, the analyses done in this manuscript are incomplete and do not support the claims due to limitations in the statistical analyses and lack of data/code access.

      We address the reviewers' concerns point by point below.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We added the Data availability statement to the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the alpha chain TCR landscape in conventional vs regulatory CD4 T cells. Overall I think it is a very well thought out and executed study with interesting conclusions. The authors have investigated CDR3 alpha repertoires coupled with a transgenic fixed CDR3beta in a mouse system.

      Strengths:

      • One-of-a-kind evidence and dataset.

      • State-of-the-art analyses using tools that are well-accepted in the literature.

      • Interesting conclusions on the breadth of immune response to challenges across different types of challenges (tumor, viral and parasitic).

      Thank you for the positive view.

      Weaknesses:

      • Some conclusions regarding the eCD4->eTreg transition are not strongly supported by the data alone.

      The overlaps between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above average, and this result is reproducible in lungs and skin, so we are confident in conclusions based on these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct, or disprove our findings.

      • Some formatting issues.

      We are working on the manuscript to correct minor errors and formatting.

      Reviewer #2 (Public Review):

      This study investigates T-cell repertoire responses in a mouse model with a transgenic beta chain, such that all T-cells in all mice share a fixed beta chain, and repertoire diversity is determined solely by alpha chain rearrangements. Each mouse is exposed to one of a few distinct immune challenges, sacrificed, and T-cells are sampled from multiple tissues. FACS is used to sort CD4 and Treg cell populations from each sample, and TCR repertoire sequencing from UMI-tagged cDNA is done.

      Various analyses using repertoire diversity, overlap, and clustering are presented to support several principal findings: 1) TCR repertoires in this fixed beta system have highly distinct clonal compositions for each immune challenge and each cell type, 2) these are highly consistent across mice, so that mice with shared challenges have shared clones, and 3) induction of CD4-to-Treg cell type transitions is challenge-specific.

      The beta chain used for this mouse model was previously isolated based on specificity for Ovalbumin. Because the beta chain is essential for determining TCR antigen specificity, and is highly diverse in wildtype mice, I found it surprising that these mice are reported to have robust and consistently focused clonal responses to very diverse immune challenges, for which a fixed OVA-specific beta chain is unlikely to be useful. The authors don't comment on this aspect of their findings, but I would think it is not expected a priori that this would work. If this does work as reported, it is a valuable model system: due to massively reduced diversity, the TCR repertoire response is much more stereotyped across individual samples, and it is much easier to detect challenge-specific TCRs via the statistics of convergent responses.

      This was to some extent expected, since these mice live almost normally and have productive adaptive immune responses and protection. In real life, there are frequent TCR-pMHC interactions in which the TCR-alpha chain dominates (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5701794/; https://pubmed.ncbi.nlm.nih.gov/37047500/). On a fixed TCR-beta background, this mechanism comes fully into play, essentially substituting for TCR-beta diversity, at the expense of a relatively simplified TCRαβ repertoire and probably higher cross-reactivity.

      We agree that this is certainly a valuable model, and we indicated this in the last sentence of our Discussion. We are now also adding this point to the abstract.

      While the data and analyses present interesting signals, they are flawed in several ways that undermine the reported findings. I summarize below what I think are the most substantive data and analysis issues.

      (1) There may be systematic inconsistencies in repertoire sampling depth that are not described in the manuscript. Looking at the supplementary tables (and making some plots), I found that the control samples (mice with mock challenge) have consistently much shallower sampling, in terms of both read count and UMI count, compared with the other challenge samples. There is also a strong pattern of lower counts for Treg vs CD4 cell samples within each challenge.

      As expected, the immune response in control mice is less extensive, just as the number of Tregs in tissues is lower than that of CD4 T cells. Both patterns are normal and consistent with expectations. Please note, however, that we applied appropriate data normalisation throughout, drawing on our extensive previous experience (https://pubmed.ncbi.nlm.nih.gov/29080364/).

      In particular (now adding more relevant details to Methods):

      For diversity metric calculations, we randomly sampled 1000 UMIs from each cloneset. Samples with < 700 UMIs were excluded from analysis.

      For amino acid overlap metric calculations, we selected the 1000 largest clonotypes from each cloneset. Samples with < 700 clonotypes were excluded from analysis.

      For nucleotide overlap metric calculations (eCD4-eTreg), we selected the top 100 clonotypes from each cloneset. Samples with < 100 clonotypes were excluded from analysis.

      The top N clonotypes were selected after randomly shuffling the sequences and then sorting them in descending order of count. This was done to eliminate alphabetical ordering among clonotypes with equal counts (e.g., count = 1 or 2).

      Downsampling was carried out using the VDJtools software, v1.2.1.
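      The normalisation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VDJtools code the authors actually ran: each cloneset is represented as a hypothetical dict mapping a clonotype key to its UMI count, the function names are illustrative, and samples with between 700 and 1000 UMIs are returned unchanged because the text does not say how they were handled.

```python
import random
from collections import Counter


def downsample_umis(clone_umis, target=1000, min_umis=700, seed=0):
    """Randomly draw `target` UMIs without replacement from one cloneset.

    clone_umis: hypothetical dict {clonotype_key: UMI count}.
    Returns None for samples below `min_umis` (excluded from analysis).
    Assumption: samples with min_umis <= total <= target are kept whole.
    """
    total = sum(clone_umis.values())
    if total < min_umis:
        return None
    if total <= target:
        return dict(clone_umis)
    # Expand to one entry per UMI, then sample without replacement.
    pool = [clone for clone, n in clone_umis.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, target)
    return dict(Counter(picked))


def top_n_clonotypes(clone_counts, n, min_clonotypes, seed=0):
    """Return the n most abundant clonotypes, breaking ties at random.

    Returns None if the cloneset has fewer than `min_clonotypes` clonotypes.
    """
    if len(clone_counts) < min_clonotypes:
        return None
    items = list(clone_counts.items())
    # Shuffle first so that equal counts end up in random, not alphabetical, order.
    random.Random(seed).shuffle(items)
    # Python's sort is stable (also with reverse=True), so ties keep the shuffled order.
    items.sort(key=lambda item: item[1], reverse=True)
    return dict(items[:n])
```

Under these assumptions, the amino acid overlap step would correspond to `top_n_clonotypes(counts, n=1000, min_clonotypes=700)` and the eCD4-eTreg nucleotide overlap step to `n=100, min_clonotypes=100`. Shuffling before a stable sort is what keeps tied clonotypes (e.g., count = 1) from being picked in alphabetical order.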

      (2) FACS data are not reported. Although the graphical abstract shows a schematic FACS plot, there are no such plots in the manuscript. Related to the issue above, it would be important to know the FACS cell counts for each sample.

      Yes, we agree that this is valuable information that should be provided. Unfortunately, these data were not preserved.

      (3) For diversity estimation, UMI-wise downsampling was performed to normalize samples to 1000 random UMIs, but this procedure is not validated (the optimal normalization would require downsampling cells). What is the influence of possible sampling depth discrepancies mentioned above on diversity estimation? All of the Treg control samples have fewer than 1000 total UMIs; doesn't that pose a problem for sampling 1000 random UMIs?

      Indeed, I simulated this procedure and found systematic effects on diversity estimates when taking samples of different numbers of cells (each with a simulated UMI count) from the same underlying repertoire, even after normalizing to 1000 random UMIs. I don't think UMI downsampling corrects for cell sampling depth differences in diversity estimation, so it's not clear that the trends in Fig 1A are not artifactual: they would seem to show higher diversity for control samples, but these are the very same samples with an apparent systematic sampling depth bias.
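      The reviewer's concern can be illustrated with a small simulation in the same spirit. This is a hypothetical sketch, not the reviewer's actual code: the underlying repertoire follows an assumed Zipf-like frequency distribution, and each sampled cell contributes a uniformly random number of UMIs (1 to 8 here, an arbitrary choice). Whatever the model details, one effect is certain: 1000 UMIs drawn from 400 cells can never contain more than 400 distinct clonotypes, so richness after UMI downsampling still depends on how many cells were sampled.

```python
import random


def simulate_downsampled_richness(n_cells, n_clonotypes=5000, umis_per_cell=(1, 8),
                                  target=1000, seed=0):
    """Sample n_cells from a hypothetical repertoire, give each cell a random
    number of UMIs, downsample to `target` UMIs, and count distinct clonotypes.

    Returns None when the simulated sample has fewer than `target` UMIs.
    """
    rng = random.Random(seed)
    # Assumed Zipf-like clonotype frequencies for the underlying repertoire.
    weights = [1.0 / (rank + 1) for rank in range(n_clonotypes)]
    cells = rng.choices(range(n_clonotypes), weights=weights, k=n_cells)
    # Each cell contributes between umis_per_cell[0] and umis_per_cell[1] UMIs.
    umis = [clone for clone in cells for _ in range(rng.randint(*umis_per_cell))]
    if len(umis) < target:
        return None
    return len(set(rng.sample(umis, target)))
```

Calling `simulate_downsampled_richness(400)` and `simulate_downsampled_richness(4000)` compares a shallow and a deep cell sample from the same repertoire after normalising both to 1000 UMIs.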

      We have evaluated this approach throughout our work and summarised our findings in https://pubmed.ncbi.nlm.nih.gov/29080364/. Altogether, normalising to the same number of randomly sampled UMIs seems to be the best approach (although, preferably, the initial sequencing depth for all samples should be substantially higher than the sampling threshold used). Sorting identical numbers of cells, followed by perfectly uniform library preparation and sequencing, is generally not realistic in practice, whereas UMI downsampling accomplishes the same goal much better.

      (4) The Figures may be inconsistent with the data. I downloaded the Supplementary Table corresponding to Fig 1 and made my own version of panels A-C. This looked quite different from the diversity estimations depicted in the manuscript. The data does not match the scale or trends shown in the manuscript figure.

      The Chao1 column was incorrect; we are now correcting it. Please also note that we only used samples with > 700 UMIs, and the Supplementary Table has been corrected accordingly. Finally, note that Figure 1 shows the results for lung samples only.
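      Chao1, one of the diversity metrics mentioned above, estimates total clonotype richness by adding an unseen-clonotype correction based on the numbers of singletons (F1) and doubletons (F2, not to be confused with the F2 overlap metric). A minimal Python sketch of the common bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), is shown below; whether VDJtools uses exactly this variant is an assumption not confirmed by the text.

```python
from collections import Counter


def chao1(clone_counts):
    """Bias-corrected Chao1 estimate of clonotype richness.

    clone_counts: hypothetical dict {clonotype_key: count}, e.g. after
    downsampling each sample to the same number of UMIs.
    """
    counts = [c for c in clone_counts.values() if c > 0]
    s_obs = len(counts)              # observed clonotypes
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]        # singletons and doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For example, `chao1({"a": 1, "b": 1, "c": 1, "d": 2, "e": 5})` gives 5 + 3 * 2 / (2 * 2) = 6.5. Because the correction is driven by rare clonotypes, the estimate is sensitive to sampling depth, which is why it is computed only after normalisation.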

      (5) For the overlap analysis, a different kind of normalization was performed, but also not validated. Instead of sampling 1000 UMIs, the repertoires were reduced to their top 1000 most frequent clones. It is not made clear why a different normalization would be needed here. There are several samples (including all Treg control samples) with only a couple hundred clones. It's also likely that the noted systematic sampling depth differences may drive the separation seen in MDS1 between Treg and CD4 cell types. I also simulated this alternative downsampling procedure and found strong effects on MDS clustering due to sampling effects alone.

      That is right. For the overlap analysis, whose values are mathematically proportional to the clonotype counts in both compared repertoires (so that differences in counts cause major biases), the right approach is to select the same number of clonotypes from each repertoire. See https://pubmed.ncbi.nlm.nih.gov/29080364/.

      We kept only samples with > 700 clonotypes for the overlap analyses. Some relatively poor samples are present in all challenge groups, yet localization along MDS1 follows a clear, reproducible logic, so we are confident in these results.
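      Why the number of clonotypes must be equalised before computing overlaps can be shown with a toy example. In this hypothetical sketch, two repertoires are sampled independently from the same assumed Zipf-like distribution, and the raw number of shared clonotypes is counted among their top N. Because the top-N sets are nested, the shared count can only grow with N, so comparing a repertoire truncated at 200 clonotypes with one truncated at 1000 would confound depth with similarity. This simple count is not one of the VDJtools overlap metrics.

```python
import random
from collections import Counter


def sample_ranked_repertoire(n_cells, n_clonotypes=20000, seed=0):
    """Sample cells from a hypothetical Zipf-like repertoire and return
    clonotype IDs ranked by observed count (largest first)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_clonotypes)]
    cells = rng.choices(range(n_clonotypes), weights=weights, k=n_cells)
    return [clone for clone, _ in Counter(cells).most_common()]


def shared_in_top(ranked_a, ranked_b, n):
    """Number of clonotypes shared between the top-n of two ranked lists."""
    return len(set(ranked_a[:n]) & set(ranked_b[:n]))


# Two independent samples of the same underlying repertoire.
rep_a = sample_ranked_repertoire(5000, seed=1)
rep_b = sample_ranked_repertoire(5000, seed=2)
shared = {n: shared_in_top(rep_a, rep_b, n) for n in (100, 300, 1000)}
print(shared)
```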

      It is not made clear how the overlap scores were converted to distances for MDS. It's hard to interpret this without seeing the overlap matrix.

      This is a built-in feature of the VDJtools software (https://pubmed.ncbi.nlm.nih.gov/26606115/); see also https://vdjtools-doc.readthedocs.io/en/master/overlap.html.

      (6) The cluster analysis is superficial, and appears to have been cherry-picked. The clusters reported in the main text have illegibly small logo plots, and no information about V/J gene enrichments. More importantly, as the caption states they were chosen from the columns of a large (and messier-looking) cluster matrix in the supplementary figure based on association with each specific challenge. There's no detail about how this association was calculated, or how it controlled for multiple tests. I don't think it is legitimate to simply display a set of clusters that visually correlate; in a sufficiently wide random matrix you will find columns that seem to correlate with any given pattern across rows.

      The particular CDR3 sequences and V/J segments are not central to the results of this manuscript. The logos are shown only to illustrate what the consensus motifs of the clusters look like.

      We have now added two Supplementary Tables and a Supplementary Figure with full information about the clusters.

      We disagree that Supplementary Figure 1 (representing all the clusters) looks “messy”. On the contrary, it is surprisingly “digital”, showing clear patterns of responses and homing. This becomes clear after studying it visually for a while. But yes, it is too big to let the reader focus on any particular aspect. That is why we selected specific TCR clusters to illustrate the aspects discussed in the work; these clusters were chosen from an overall picture that is already clearly structured.

      (7) The findings on differential plasticity and CD4 to Treg conversion are not supported. If CD4 cells are converting to Tregs, we expect more nucleotide-level overlap of clones. This intuition makes sense. But it seems that this section affirms the consequent: variation in nucleotide-level clone overlap is a readout of variation in CD4 to Treg conversion. It is claimed, based on elevated nucleotide-level overlap, that the LLC and PYMT challenges induce conversion more readily than the other challenges. It is not noted in the textual interpretations, but Fig 4 also shows that the control samples had a substantially elevated nucleotide-level overlap. There is no mention of a null hypothesis for what we'd expect if there was no induced conversion going on at all. This is a reduced-diversity mouse model, so convergent recombination is more likely than usual, and the challenges could be expected to differ in the parts of TCR sequence space they induce focus on. They use the top 100 clones for normalization in this case, but don't say why (this is the 3rd distinct normalization procedure).

      Your point is absolutely correct: “This is a reduced-diversity mouse model, so convergent recombination is more likely than usual”. A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of identical (presumably cross-reactive) TCRs present in all repertoires of these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background signal of nucleotide overlap, and only this strict downsampling to the top 100 clonotypes allowed us to visualise the difference between the challenges. This explanation is rather too complicated and would overload the manuscript, but your comments and our answers will be available to readers who want to go into all the details.

      The observed (at this strict downsampling) overlaps between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above average, and this result is reproducible in lungs and skin, so we are confident in the interpretations based on these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct, or disprove our findings.

      Although interpretations of the reported findings are limited due to the issues above, this is an interesting model system in which to explore convergent responses. Follow-up experimental work could validate some of the reported signals, and the data set may also be useful for other specific questions.

      Yes, thank you for your really thorough analysis. We fully agree with your conclusion.

      Reviewer #3 (Public Review):

      Nakonechnaya et al present a valuable and comprehensive exploration of CD4+ T cell response in mice across stimuli and tissues through the analysis of their TCR-alpha repertoires.

      The authors compare repertoires by looking at the relative overlap of shared clonotypes and observe that they sometimes cluster by tissue and sometimes by stimulus. They also compare different CD4+ subsets (conventional and Tregs) and find distinct yet convergent responses with occasional plasticity across subsets for some stimuli.

      The observed lack of a general behaviour highlights the need for careful comparison of immune repertoires across cell subsets and tissues in order to better understand their role in the adaptive immune response.

      In conclusion, this is an important paper to the community as it suggests several future directions of exploration.

      Unfortunately, the lack of code and data availability does not allow the reproducibility of the results.

      Thank you for your positive view.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We added the Data availability statement to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • In the manuscript at "yielding 13,369 ± 1,255 UMI-labeled TCRα cDNA molecules and 3233 ± 310 TCRα CDR3 clonotypes per sample" I'm not sure how there can be fewer unique DNA molecules than clonotypes in each sample.

      This was indeed our mistake, and it has now been corrected.

      • In the manuscript at "This indicates that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      I'm not sure it's possible to conclude that a drop in diversity in all conditions necessarily signals a focused nature. Since the nature of the clonotypes was not compared between conditions at this stage, it is not possible to claim a focused nature of the response.

      We have softened the wording:

      "This could indicate that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      • What are your thoughts on why there is such a large overlap between Treg and Teff in the Lung in control? For some replicates it is almost as much as a post-LLC challenge!

      There is some natural dispersion in the data, which is generally expected. The overlaps between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above average, and this result is reproducible in lungs and skin, so we are confident in conclusions based on these data. Further experiments with different methods, including tracking clonal fates, should clarify and confirm, correct, or disprove our findings.

      • In the manuscript at "These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches" I'm not sure we can conclude this from the Figure. Wouldn't you expect the samples to be grouped by color (the different challenges)? Maybe I'm not understanding the sentence!

      This is a separate point, concerning resident Tregs irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • In the manuscript at " Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells ":

      Figures 5b and 5d seem to have the same pattern: Spleen and MLN group together, AxLN and IgLN together and thymus is separate. Do you mean to say that the groups are more diffuse? I feel like the pattern really is the same and it's likely due to some noise in the data…

      Yes, we simply mean that the eTreg groups are less diffuse, i.e. more convergent.

      • I'm not sold on the eCD4 to eTreg conversion evidence. Why only limit to the top 100 clones? The top 1000 clones were used in previous analyses! Moreover, the authors claim that calculating relative overlap (via F2) of matching CDR3+V+J genes is evidence of a conversion between eCD4 and eTreg. To convince myself of a real conversion, I would track the cells between groups; unfortunately, I'm not sure how to do this. Maybe by looking at the thymus population? For example, what is the overlap in the thymus vs. after the challenge? I don't have an answer on how to verify this, but I feel that this conclusion is a bit on the weaker end.

      A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of (presumably cross-reactive) identical TCRs present in all repertoires of these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background signal of nucleotide overlap, and only this strict downsampling to the top-100 clonotypes allowed us to visualise the difference between the challenges. This explanation is rather too complicated and would overload the manuscript, but your comments and our answers will be available to any reader who wants to go into all the details.

      The overlaps observed (at this strict downsampling) between the top-nucleotide clones in both LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in the interpretations based on these data. Further experiments with different methods, including tracking the clonal fates, should clarify and confirm/correct/disprove our findings.
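
      To make the overlap measure discussed above more concrete, here is a minimal Python sketch of an F2-style overlap restricted to the top-N clonotypes. It assumes that F2 is the sum, over clonotypes shared by two repertoires, of the geometric mean of their frequencies (as described in the VDJtools documentation), that clonotypes are matched on CDR3 nucleotide sequence plus V and J genes, and that frequencies are renormalised within the retained top-N set. The helper names and toy data are hypothetical; this is not the VDJtools implementation itself.

```python
from math import sqrt
from typing import Dict, Tuple

# Hypothetical clonotype key: (CDR3 nucleotide sequence, V gene, J gene) -> count.
Clonotype = Tuple[str, str, str]


def top_n_frequencies(repertoire: Dict[Clonotype, int], n: int) -> Dict[Clonotype, float]:
    """Keep the n most abundant clonotypes and renormalise their counts to frequencies summing to 1."""
    top = sorted(repertoire.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(count for _, count in top)
    return {clone: count / total for clone, count in top}


def f2_overlap(rep_a: Dict[Clonotype, int], rep_b: Dict[Clonotype, int], n: int = 100) -> float:
    """Sum of geometric-mean frequencies over clonotypes shared by the two top-n subsets."""
    a = top_n_frequencies(rep_a, n)
    b = top_n_frequencies(rep_b, n)
    shared = a.keys() & b.keys()
    return sum(sqrt(a[c] * b[c]) for c in shared)


# Toy example: one clonotype is shared between the effector and Treg subsets.
eff = {("TGTGCT", "TRAV1", "TRAJ1"): 50, ("TGTGCC", "TRAV2", "TRAJ2"): 30, ("TGTAGC", "TRAV3", "TRAJ3"): 20}
treg = {("TGTGCT", "TRAV1", "TRAJ1"): 10, ("TGTTTT", "TRAV4", "TRAJ4"): 90}
print(f2_overlap(eff, treg, n=2))  # sqrt(0.625 * 0.1), i.e. about 0.25
```

      Restricting to the most expanded clonotypes before computing the overlap is what suppresses the shared low-frequency background described in the reply above; the choice of N (here 2 for the toy data, 100 in the reply) controls how strict that filter is.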

      • There is a nuance in the analysis between Figure 3 and Figure 5 which I think I am not grasping. Both Figures use the same method and the same data but what is different? I think the manuscript would benefit from making this crystal clear. The conclusions will likely be more evident as well!

      As explained in the text and above, on Figure 5 “we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge.”

      The aim of this subsection of the manuscript is to reveal tissue-resident Tregs that are distinct for distinct tissues and reside there in all these mice, irrespective of the challenge we applied. And they are indeed there (!).

      • Do the authors plan to share their R scripts?

      All calculations were performed in VDJtools; R was only used to build figures. We have corrected this in the Methods.

      Minor typos and formatting issues to address:

      • Typo in Figure 2a the category should read "worm" instead of "warm"

      Corrected.

      • Figure 2a heatmap is missing a color bar indicating the value ranges

      The detailed information can be found in additional Supplementary materials.

      • Figure 2f is never mentioned in the manuscript!

      Corrected.

      • "eTreg repertoire upon lung challenge is reflected in the draining lymph node" - the word "upon" is in a smaller font size

      Corrected.

      • The authors should make the spelling of eTreg uniform across the manuscript (reg in subscript vs just lower case letters). Same goes for CDR3a vs CDR3\alpha

      Corrected.

      • Figure 4a-d p-value annotations are not shown. Is it because they are not significant?

      Corrected.

      • The spelling of FACS buffer should be uniform (FACs vs FACS, see methods)

      Corrected.

      • In the gating strategy, I would make a uniform annotation for the cluster of differentiation, for example, "CD44 high" vs "CD44^{hi}", pos vs + etc.

      Corrected.

      • Citation for MIGEC software (if available) is missing from methods

      The software is cited in the text, so this is probably sufficient.

      Reviewer #2 (Recommendations For The Authors):

      I noticed the data was made available via Figshare in the preprint, but there is no data availability statement in the current ms.

      We have provided a Data availability statement.

      The methods state that custom scripts were written to perform the various analyses. Those should be made available in a code repository, and linked in the ms.

      All calculations were performed in VDJtools; R was only used to build figures. We have corrected this in the Methods.

      The title mentioned "TCR repertoire prism", so I thought "prism" was the name of a new method or software. But then the word "prism" didn't appear anywhere in the ms.

      By "prism" we simply mean viewing or understanding something from a different perspective, or through a lens that reveals different aspects or nuances.

      Figure 1D lacks an x-axis label.

      We have revised the figures in general.

      Reviewer #3 (Recommendations For The Authors):

      • The paper is very concise, possibly a bit too much. It could use additional explanations to properly affirm its relevance, for example:

      Why the choice of fixing the CDR3β background?

      To make the repertoires more similar across mice, and to track all features of the repertoire using only one chain.

      To what is it fixed?

      As explained in Methods:

      “C57BL/6J DO11.10 TCRβ transgenic mice (kindly provided by Philippa Marrack) and crossed to C57BL/6J Foxp3eGFP TCRa-/- mice.”

      What do you expect to see and not to see in this specific system, and why is it important?

      As stated above: we expected the repertoires to be more similar across mice. This is important for finding antigen-specific TCR clusters across mice and for tracking all features of the TCR repertoire using only one chain.

      Does this system induce more convergent responses? If so, can we extrapolate the results from this system to the full alpha-beta response?

      Compared to conventional mice, such a model is much more powerful in terms of the ability to monitor convergent TCR responses. At the same time, it behaves naturally and the mice live almost normally, so we believe it reflects the natural behaviour of the full-fledged alpha-beta T cell repertoire.

      • Is the lack of similarity of other tissues to Lung/MLN due to a lack of a response?

      As indicated in the title of the corresponding subsection: “eTreg repertoire upon lung challenge is reflected in the draining lymph node”. And the conclusion of this subsection is that “these results demonstrate the selective tissue localization of the antigen-focused Treg response.”

      Can you do a dendrogram like 2a for the other tissues to better clarify what is going on there? There is space in the supplementary material.

      We built many of those, but being one-dimensional they are mostly less informative than 2D MDS plots.

      • Figure 5 seems a bit out of place as it looks more related to Figure 2. It could maybe be integrated there, sent to supplementary or become Figure 3?

      This is a separate point, concerning resident Tregs irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • Have you explored more systematically the role of individual variability? If you stratify by individual, do you observe any trend? If not this is also an interesting observation to highlight and discuss.

      This is incorporated in the calculations and figures: one dot corresponds to one mouse, so the natural individual variation is captured there.

      • Regarding the MDS plots: why are 2 dimensions the right amount? Maybe with 3, you can see both tissue specificity and stimuli contributions. Can you do a stress vs # dimensions plot to check what should be the right amount of dimensions to more accurately reproduce the distance matrix?

      Tissue specificity and stimulus contributions are hard to distinguish without focusing on the appropriate samples, as we did in Figs. 3 and 5. The work is already fairly complex, and attempting to analyse this in a higher-dimensional space is far beyond our current capacity. But this is an interesting point for future work, thank you.
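
      As a rough illustration of the stress-versus-dimensions check the reviewer suggests, the sketch below applies classical (Torgerson) MDS in NumPy to a precomputed distance matrix and reports Kruskal's stress-1 for each number of dimensions. This is an assumption for illustration only: the MDS variant used in the manuscript may differ (for example, non-metric MDS), and `stress_curve` and the other helpers are hypothetical names.

```python
import numpy as np


def classical_mds(d: np.ndarray, k: int) -> np.ndarray:
    """Embed an n x n distance matrix into k dimensions via classical (Torgerson) MDS."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # keep the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


def kruskal_stress(d: np.ndarray, x: np.ndarray) -> float:
    """Kruskal stress-1 between the input distances and the distances in the embedding x."""
    diff = x[:, None, :] - x[None, :, :]
    d_hat = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices_from(d, k=1)            # each unordered pair once
    return float(np.sqrt(((d[iu] - d_hat[iu]) ** 2).sum() / (d[iu] ** 2).sum()))


def stress_curve(d: np.ndarray, max_dim: int = 5) -> list:
    """Stress for 1..max_dim dimensions; an elbow suggests a suitable dimensionality."""
    return [kruskal_stress(d, classical_mds(d, k)) for k in range(1, max_dim + 1)]


# Toy check: distances between points that truly live in 3D.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))
dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
print([round(s, 3) for s in stress_curve(dist, max_dim=4)])  # stress falls to ~0 at 3 dimensions
```

      For a real repertoire-distance matrix (for example, 1 minus an overlap score), the curve would typically not reach zero; the point at which additional dimensions stop reducing stress appreciably is the usual heuristic for choosing the number of dimensions.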

      • Figure 2: A better resolution is needed in order to properly resolve the logo plots at the bottom.

      Yes, we have revised the figures, and we also provide a new Supplementary Figure with all the logos.

      • No code or data are made available. There is also a lack of supplementary figures that complement and expand the results presented in the main text.

      We believe that the main text, although succinct, contains a lot of information to analyse and (preliminary) conclusions to draw, so we do not consider it reasonable to overload it further.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Overall Response

      We thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. Based on the reviewers’ comments and the updated eLife assessment, we would like to choose the current version of our manuscript as the Version of Record.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model which takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter.

      The authors control for some degree of redundancy between their training and test sets, both using sequence and structural similarity criteria. This is more careful than can be said of most works in the field of PPI prediction.

      As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!
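
      The top-50 criterion mentioned in the review above can be expressed compactly. The sketch below assumes a predicted inter-protein contact-probability matrix of shape (L1, L2) and a boolean matrix of true inter-chain contacts with the same shape; how that boolean matrix is built (for example, a distance cutoff between residues) and the function names are assumptions, not the authors' code.

```python
import numpy as np


def top_k_precision(scores: np.ndarray, true_contacts: np.ndarray, k: int = 50) -> float:
    """Fraction of the k highest-scoring residue pairs that are true inter-protein contacts."""
    flat_idx = np.argsort(scores, axis=None)[::-1][:k]  # indices into the flattened L1*L2 matrix
    rows, cols = np.unravel_index(flat_idx, scores.shape)
    return float(true_contacts[rows, cols].mean())


def is_acceptable(scores: np.ndarray, true_contacts: np.ndarray, k: int = 50, threshold: float = 0.5) -> bool:
    """Heuristic from the review: at least 50% precision among the top-50 predicted contacts."""
    return top_k_precision(scores, true_contacts, k) >= threshold


scores = np.arange(100).reshape(10, 10) / 100  # toy 10 x 10 contact-probability map
truth = scores >= 0.9                           # toy ground truth: the 10 highest-scoring pairs
print(top_k_precision(scores, truth, k=20))    # 10 of the top 20 are contacts -> 0.5
```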

      Weaknesses:

      The authors check for performance drops when the test set is restricted to pairs of interacting proteins such that the chain pair is not similar as a pair (in sequence or structure) to a pair present in the training set. A more challenging test would be to restrict the test set to pairs of interacting proteins such that none of the chains are separately similar to monomers present in the training set. In the case of structural similarity (TM-scores), this would amount to replacing the two "min"s with "max"s in Eq. (4). In the case of sequence similarity, one would simply require that no monomer in the test set is in any MMSeqs2 cluster observed in the training set. This may be an important check to make, because a protein may interact with several partners, and/or may use the same sites for several distinct interactions, contributing to residual data leakage in the test set.

      We thank the reviewer for the suggestion! For protein-protein interaction prediction (“0D prediction”) or protein-protein interfacial residue prediction (“1D prediction”), we agree that requiring none of the chains in the test set to be separately similar to monomers in the training set is necessary since, as the reviewer pointed out, a protein may interact with several partners and may even use the same sites for these interactions. However, the task of this study is to predict inter-protein residue-residue contacts (“2D prediction”): even if a protein uses the same site to interact with different partners, the inter-protein contact maps will differ as long as the interacting partners differ. Therefore, we do not think this restriction on the test set is necessary for our task.
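
      To make the distinction between the two filtering strategies discussed above concrete, the following Python sketch contrasts a pair-level redundancy check (a test pair is removed only if both chains resemble the corresponding chains of some training pair) with the stricter chain-level check the reviewer proposes (a test pair is removed if either chain resembles any training monomer). This is not a reconstruction of Eq. (4); `sim` stands for a hypothetical pairwise monomer similarity, such as a precomputed TM-score lookup, and the threshold and toy data are illustrative.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Similarity = Callable[[str, str], float]  # hypothetical, e.g. a precomputed TM-score lookup


def pair_redundant(test: Pair, train: List[Pair], sim: Similarity, t: float) -> bool:
    """Redundant only if BOTH chains match the chains of one training pair (either chain order)."""
    a, b = test
    for c, d in train:
        if min(sim(a, c), sim(b, d)) >= t or min(sim(a, d), sim(b, c)) >= t:
            return True
    return False


def chain_redundant(test: Pair, train: List[Pair], sim: Similarity, t: float) -> bool:
    """Redundant if EITHER chain matches ANY monomer seen in the training set."""
    monomers = {m for pair in train for m in pair}
    return any(sim(x, m) >= t for x in test for m in monomers)


def filter_test(test_pairs: List[Pair], train_pairs: List[Pair], sim: Similarity,
                t: float, strict: bool = False) -> List[Pair]:
    """Keep the test pairs that are not redundant under the chosen criterion."""
    check = chain_redundant if strict else pair_redundant
    return [p for p in test_pairs if not check(p, train_pairs, sim, t)]


# Toy example: chains in the same (made-up) family are treated as structurally similar.
family = {"A1": "kinase", "A2": "kinase", "B1": "ubiquitin", "C1": "c", "D1": "d"}
toy_sim = lambda x, y: 1.0 if family[x] == family[y] else 0.1
train = [("A1", "B1")]
test = [("A2", "C1"), ("C1", "D1"), ("A2", "B1")]
print(filter_test(test, train, toy_sim, 0.5))               # pair-level keeps two pairs
print(filter_test(test, train, toy_sim, 0.5, strict=True))  # chain-level keeps only ("C1", "D1")
```

      The pair ("A2", "C1") illustrates the disagreement: its first chain resembles a training monomer, but its partner does not, so only the chain-level criterion removes it.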

      The training set of AFM with v2 weights has a global cutoff of 30 April 2018, while that of PLMGraph-Inter has a cutoff of March 7 2022. So there may be structures in the test set for PLMGraph-Inter that are not in the training set of AFM with v2 weights (released between May 2018 and March 2022). The "Benchmark 2" dataset from the AFM paper may have a few additional structures not in the training or test set for PLMGraph-Inter. I realize there may be only few structures that are in neither training set, but still think that showing the comparison between PLMGraph-Inter and AFM there would be important, even if no statistically significant conclusions can be drawn.

      We thank the reviewer for the suggestion! Using only the date cutoff is not sufficient to remove redundancy, since similar structures can be deposited in the PDB on different dates. Because the PDB codes of the AFM training set have not been released, it is difficult for us to remove this redundancy completely. Therefore, we think no rigorous conclusion could be drawn by including these comparisons in the manuscript. Besides, the main point of this study is to demonstrate that integrating multiple protein language models using protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which may offer important insights for the future development of protein complex structure prediction methods more powerful than AFM, rather than to provide a tool that can beat AFM at this moment. We think including too many additional comparisons with AFM may distract readers. Therefore, we choose not to include these comparisons in the manuscript.

      Finally, the inclusion of AFM confidence scores is very good. A user would likely trust AFM predictions when the confidence score is high, but look for alternative predictions when it is low. The authors' analysis (Figure 6, panels c and d) seems to suggest that, in the case of heterodimers, when AFM has low confidence, PLMGraph-Inter improves precision by (only) about 3% on average. By comparison, the reported gains in the "DockQ-failed" and "precision-failed" bins are based on knowledge of the ground truth final structure, and thus are not actionable in a real use-case.

      We agree with the reviewer that more studies are needed to provide a model that can complement or even beat AFM. The main point of this study is to demonstrate that integrating multiple protein language models using protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which may offer important insights for the future development of protein complex structure prediction methods more powerful than AFM.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially those driven by AlphaFold, prediction accuracy and computational efficiency still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for model training.

      We thank the reviewer for recognizing the strengths of our work!

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • I recommend renaming the section "Further potential redundancies removal between the training and the test" to "Further potential redundancies removal between the training and the test sets"

      Changed.

      • In lines 768-769, the sentence seems to end prematurely in "to use more stringent threshold in the redundancy removal"

      Corrected.

      • In Eq. (4), line 789, there are many instances of dashes that look like minus signs, creating some confusion.

      Corrected.

      • I think I may have mixed up figure references in my first review. When I said (Recommendations to the authors): "p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8", I think I was referring to what is now lines 423-424, referring to what is now Figure 5c. The point stands there, I think.

      Corrected.

      • A couple of new grammatical mishaps have been introduced in the revision. These could be rectified.

      We carefully rechecked our revisions, and corrected the grammatical issues we found.

      Reviewer #2 (Recommendations For The Authors):

      Most of my concerns were resolved through the revision. I have only one suggestion for the main figure.

      The current scatter plots in Figure 2 are hard to understand as too many different methods are abstracted into a single plot with multiple colors. I would suggest comparing their performances using box plot or violin plot for the figure 2.

      We thank the reviewer for the suggestion! In the revision, we tried a violin plot, but it does not look good since too many different methods are included in the plot. Besides, we chose the scatter plot because it provides much more detail. We also provided the individual head-to-head scatter plots as supplementary figures, which we think can also help readers capture the information in the figures.


      The following is the authors’ response to the original reviews.

      Overall Response

      We would like to thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. We have carefully revised the manuscript to address all the concerns and suggestions raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! In the revision, to emphasize the performance of PLMGraph-Inter using predicted monomer structures, we moved the evaluation results based on predicted monomer structures from the supplementary materials to the main text (see the new Table 1 and Figure 2 in the revised manuscript) and reorganized the two subsections “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets” and “Impact of the monomeric structure quality on contact prediction” in the main text.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! It is worth noting that AFM automatically searches for monomer templates during prediction. When we checked our AFM runs, we found that for 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) at least 20 templates were identified (AFM employed the top 20 templates in the prediction), and 87.8% of the targets employed the native templates (lines 455-462 on page 25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). Therefore, we think Figure 6, not Figure S5 (the original Figure S2), shows a fairer comparison. It is also worth noting that the targets used in this study would have a large overlap with the training set of AlphaFold-Multimer, since AFM used all protein complex structures deposited in the PDB before 2018-04-30 for model training, which would further cause the performance of AFM to be overestimated (lines 450-455 on pages 24-25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      To mimic the performance of AlphaFold2 in real practice and produce predicted monomeric structures of more diverse quality, we used only the MSA searched against the UniRef100 protein sequence database as input to AlphaFold2 and disabled templates (lines 203-210 on page 12 in the subsection “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets”). Since some of the predicted monomer structures are of poor quality, it is expected that the performance of PLMGraph-Inter drops when predicted monomeric structures are used. We provided a detailed analysis of the impact of monomeric structure quality on prediction performance in the subsection “Impact of the monomeric structure quality on contact prediction” in the main text.

      We provided the analysis of the AFM multimer confidence values (“iptm + ptm”) in the revision (Figure 6, Figure S5 and line 495-501 in page 27 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion, and we are sorry for the confusion! In the AFM runs to predict protein complex structures, we used the default setting of AFM, which automatically searches for monomer templates during prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions (AFM only used the top 20 templates), and 87.8% of the targets employed the native template. We further clarified this in the revision (lines 455-462 on page 25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). We also included the mean precisions of AFM (top-50 contact prediction) in the revision (Table S5 and lines 483-484 on page 26 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, the PDB IDs of the AFM training set and of the “Recent-PDB-Multimers” dataset have not been released. “Benchmark 2” only includes 17 heterodimers, and this number would decrease further after removing targets redundant with our training set. We think it is difficult to draw conclusions from such a small number of targets.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      Author response image 1.

      Head-to-head comparison of the quality of complexes predicted by AlphaFold-Multimer (2.2.0) and AlphaFold-Multimer (2.3.2) for each target PPI.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. During the revision, we also tested the new version of AFM on the HomoPDB and HeteroPDB datasets, but found that the performance difference between the two versions of AFM is actually very small (see Author response image 1; not shown in the main text). One reason might be that some targets in HomoPDB and HeteroPDB are redundant with the training sets of both versions of AFM. Since our test sets would overlap more with the training set of AFM V3, we kept using the AFM V2 weights in this study.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We thank the reviewer for the suggestion! In the revision, we explored the performance of PLMGraph-Inter when using different thresholds of fold similarity scores between interacting monomers to further remove potential redundancies between the training and test sets (i.e. redundancy in structure) (lines 353-386 on pages 19-21 in the subsection “Ablation study”; lines 762-797 on pages 41-43 in the subsection “Further potential redundancies removal between the training and the test”). We found that for heteromeric PPIs (targets in HeteroPDB), the further removal of potential structural redundancy has little impact on model performance (~3%, when a TM-score of 0.5 is used as the threshold). However, for homomeric PPIs (targets in HomoPDB), the further removal of potential structural redundancy significantly reduces model performance (~18%, when a TM-score of 0.5 is used as the threshold) (see Table 2). One possible reason for this phenomenon is that the binding mode of a homomeric PPI is largely determined by the fold of its monomer, so the model does not generalize well to targets whose folds were never seen during training.

      Whether a deep learning model can generalize well to targets with novel folds is a very interesting and important question. We thank the reviewer for pointing this out! However, to the best of our knowledge, this question has rarely been addressed by previous studies, including AFM. For example, the Benchmark 2 dataset was prepared with ClusPro TBM (bioRxiv 2021.09.07.459290; Proteins 2020, 88:1082-1090), which uses a sequence-based approach (HHsearch) rather than a structure-based one to identify templates. Therefore, we do not think this dataset is non-redundant in structure.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the construction of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose the “geometry-only” model as the base model.
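      For readers less familiar with the metric, top-k precision (such as the top-50 precision in Table 1) is usually computed by ranking all inter-protein residue pairs by predicted contact probability and reporting the fraction of the k highest-ranked pairs that are true contacts. The sketch below illustrates that general idea only; the function name `top_k_precision` and the nested-list inputs are hypothetical and are not taken from the PLMGraph-Inter code.

```python
def top_k_precision(probs, contacts, k):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    probs:    L_A x L_B nested list of predicted contact probabilities (hypothetical input).
    contacts: L_A x L_B nested list of booleans marking true inter-protein contacts.
    """
    scored = [
        (probs[i][j], contacts[i][j])
        for i in range(len(probs))
        for j in range(len(probs[i]))
    ]
    # Rank every inter-protein pair by predicted probability, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    # Divide by k, so fewer than k available pairs lowers the precision.
    return sum(1 for _, is_contact in top if is_contact) / k


probs = [[0.9, 0.1], [0.8, 0.2]]
contacts = [[True, False], [False, True]]
print(top_k_precision(probs, contacts, 2))  # 0.5
```

      Dividing by k rather than by the number of pairs actually available is one convention; other evaluations may normalize differently.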

      Reviewer #1 (Recommendations For The Authors):

      Some sections of the paper use technical terminology which limits accessibility to a broad audience. An obvious example is in the section "Results > Overview of PLMGraph-Inter > The residual network module": the average eLife reader is not a machine learning expert and might not be familiar with a "convolution with kernel size of 1 * 1". In general, the "Overview of PLMGraph-Inter" is a bit heavy with technical details, and I suggest moving many of these to Methods. This overview section can still be there but it should be shorter and written using less technical language.

      We thank the reviewer for the suggestion! We moved some technical details to the Methods section in the revision (line 184-185 in page 11; line 729-735 in page 39).

      List of typos and minor issues (page number according to merged PDF):

      • p. 3. line -3: remove "to"

      Corrected (line 36, page 3)

      • p. 5, line 7: "GINTER" should be "GLINTER"

      Corrected (line 64, page 5)

      • p. 6, line -4: "Given structures" -> "Given the structures"

      Corrected (line 95, page 6)

      • p. 6, line -2: "with which encoded"... ?

      We rephrased this sentence in revision. (line 97, page 6)

      • p. 9, line 1: "principal" -> "principle"

      Corrected (line 142, page 9)

      • p. 13, line 1: "has" -> "but have"

      Corrected (line 231, page 13)

      • p. 14, lines 6-7: "As can be seen from the figure that the predicted" -> "As can be seen from the figure, the predicted"

      We rephrased this paragraph, and the sentence was deleted in the revision (line 257-259 in page 15).

      • p. 18, line 1: the "five models" are presumably models a-e? If so, say "of models a-e"

      Corrected (line 310, page 17)

      • p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8

      Based on Figure 3C, we think 0.8 is a more appropriate cutoff, since the precision drops significantly when the DTM-score is between 0.7 and 0.8.

      • p. 23, lines 2-3: "worth to making" -> "worth making"

      Corrected (line 443, page 24)

      • p. 24, line -5: "predict" -> "predicted"

      Corrected (line 484, page 26)

      • p 28, line -5: Please clarify what you mean by "We doubt": are you saying that you don't think these rearrangements exist in nature? If not, then reword.

      Corrected (line 566, page 30)

      • Figure 2, panel c, "DCPred" in the legend should be "CDPred"

      Corrected

      • Figures 3 and 5: Please improve the y-axis title in panel C. "Percent" of what?

      We changed the “Percent” to “% of targets” in the revision.

      We thank the reviewer for carefully reading our manuscript!

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We have carefully revised the manuscript to address the reviewer’s concerns.

      (1) The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! A 40% sequence identity cutoff is widely used to remove redundancy when evaluating deep-learning-based protein-protein interaction and protein complex structure prediction methods, so we also chose this threshold in our study (bioRxiv 2021.10.04.463034, Cell Syst. 2021 Oct 20;12(10):969-982.e6). In the revision, we explored whether PLMGraph-Inter maintains its performance when more stringent thresholds (30%, 20%, 10%) are applied (line 353-386 in page 20-21 in the subsection of “Ablation study” and line 762-780 in page 40 in the subsection of “Further potential redundancies removal between the training and the test”). The results show that even when 10% sequence identity is used as the threshold, the mean precision of the predicted contacts decreases by only ~3% (Table 2).

      (2) Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision (Figure S1 and Figure S2 in the supplementary).

      (3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We included this comparison in the revision (Figure S7).

      (4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! In the revision, we analyzed the relationship between prediction performance and MSA depth (Figure S4 and line 253-264 in page 15 in the subsection of “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets” and line 798-806 in page 42 in the subsection of “Calculating the normalized number of the effective sequences of paired MSA”).
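      The exact definition of MSA depth used in the manuscript is given in its Methods. Purely as an illustration, one common convention weights each MSA row by the inverse of the number of rows (itself included) that share at least 80% identity with it, sums these weights to obtain Neff, and divides by the square root of the alignment length. The 80% threshold, the square-root normalization, and the function name `normalized_neff` below are all assumptions, not details taken from the manuscript.

```python
import math


def normalized_neff(msa, identity_threshold=0.8):
    """Neff / sqrt(L) for an aligned MSA given as equal-length strings.

    Assumed convention: each sequence is weighted by 1 / (number of
    sequences, itself included, whose identity to it is >= identity_threshold).
    """
    length = len(msa[0])

    def identity(a, b):
        # Fraction of aligned columns with identical characters (gaps included).
        return sum(x == y for x, y in zip(a, b)) / length

    neff = 0.0
    for seq in msa:
        neighbours = sum(1 for other in msa if identity(seq, other) >= identity_threshold)
        neff += 1.0 / neighbours
    return neff / math.sqrt(length)


print(normalized_neff(["AAAA", "AAAA", "CCCC"]))  # 1.0
```

      Down-weighting near-duplicate rows keeps a deep but highly redundant alignment from looking more informative than it is.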

      Reviewer #2 (Recommendations For The Authors):

      I have the following suggestions in addition to the public review.

      (1) Overall, the manuscript is well-written; however, I recommend a careful review for minor grammar corrections to polish the final text.

      We carefully checked the manuscript and corrected all the grammar issues and typos we found in the revision.

      (2) It would be better to indicate that single sequence embeddings, MSA embeddings, and structure embeddings are ESM-1b, ESM-MSA & PSSM, and ESM-IF when they are first mentioned in the manuscript e.g. single sequence embeddings from ESM-1b, MSA embeddings from ESM-MSA and PSSM, and structural embeddings from ESM-IF.

      We revised the manuscript according to the reviewer’s suggestion (line 86-88 in page 6; line 99-101 in page 7).

      (3) I don't think "outer concatenation" is commonly used. Please specify whether it's outer sum, outer product, or horizontal & vertical tiling followed by concatenation.

      It is horizontal & vertical tiling followed by concatenation. We clarified this in the revision (line 129-130 in page 8).
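      Unlike an outer sum or outer product, horizontal and vertical tiling followed by concatenation keeps both proteins' features intact: for every residue pair (i, j), the feature vector of residue i in protein A is concatenated with that of residue j in protein B, giving d_A + d_B channels per pair. A minimal sketch with plain Python lists (the function name `outer_concatenate` is hypothetical; a real model would use tensor broadcasting instead):

```python
def outer_concatenate(feat_a, feat_b):
    """Horizontal & vertical tiling followed by concatenation.

    feat_a: L_A per-residue feature vectors (lists) for protein A.
    feat_b: L_B per-residue feature vectors (lists) for protein B.
    Returns an L_A x L_B grid where cell [i][j] = feat_a[i] + feat_b[j].
    """
    return [[list(a) + list(b) for b in feat_b] for a in feat_a]


pair = outer_concatenate([[1.0], [2.0]], [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
print(len(pair), len(pair[0]), len(pair[0][0]))  # 2 3 3
print(pair[1][2])  # [2.0, 50.0, 60.0]
```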

      (4) 10th sentence on the page where the Results section starts, please briefly mention what are the other 2D pairwise features.

      We clarified this in the revision (line 131-132 in page 8).

      (5) In the result section, it states edges are defined based on Cα distances, but in the method section, it says edges are determined based on heavy atom distances. Please correct one of them.

      It should be Cα distances. We apologize for the oversight and have corrected this in the revision (line 646 in page 35).

      (6) For the sentence, "Where ESM-1b and ESM-MSA-1b are pretrained PLMs learned from large datasets of sequences and MSAs respectively without label supervision,", I'd suggest replacing "without label supervision" with "with masked language modeling tasks" for clarity.

      We revised the manuscript according to the reviewer’s suggestion (line 150-151 in page 9).

      (7) It would be better to briefly explain what is the dimensional hybrid residual block when it first mentioned.

      We explained the dimensional hybrid residual block where it is first mentioned in the revision (line 107 in page 7).

      (8) Please include error bars for the bar plots and standard deviations for the tables.

      We thank the reviewer for the suggestion! Our understanding is that error bars and standard deviations are very informative for data that follow Gaussian-like distributions, but our data (precisions of the predicted contacts) clearly do not. Most previous studies of protein contact prediction and inter-protein contact prediction also did not include these elements in their plots or tables. In our case, including them would require a dramatic change in the style of our figures and tables, which we would prefer to avoid in this revision.

      (9) Please indicate whether the chain break is considered to generate attention map features from ESM-MSA-1b. If it's considered, please specify how.

      The paired sequences were directly concatenated without any linker character between them, which means we did not explicitly account for the chain break when generating the attention maps from ESM-MSA-1b.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript investigates the role of membrane contact sites (MCSs) and sphingolipid metabolism in regulating vacuolar morphology in the yeast Saccharomyces cerevisiae. The authors show that tricalbin (1-3) deletion leads to vacuolar fragmentation and the accumulation of the sphingolipid phytosphingosine (PHS). They propose that PHS triggers vacuole division through MCSs and the nuclear-vacuolar junction (NVJ). The study presents some solid data and proposes potential mechanisms underlying vacuolar fragmentation driven by this pathway. Although the manuscript is clear in what the data indicates and what is more hypothetical, the story would benefit from providing more conclusive evidence to support these hypothesis. Overall, the study provides valuable insights into the connection between MCSs, lipid metabolism, and vacuole dynamics.

      We thank Reviewer #1 for the positive review. We hope that our hypotheses are supported by the "Author Response to Recommendations" and will be further supported by future research.

      Reviewer #2 (Public Review):

      This manuscript explores the mechanism underlying the accumulation of phytosphingosine (PHS) and its role in initiating vacuole fission. The study posits the involvement of membrane contact sites (MCSs) in two key stages of this process. Firstly, MCSs tethered by tricalbin between the endoplasmic reticulum (ER) and the plasma membrane (PM) or Golgi regulate the intracellular levels of PHS. Secondly, the amassed PHS triggers vacuole fission, most likely through the nuclear-vacuolar junction (NVJ). The authors propose that MCSs play a regulatory role in vacuole morphology via sphingolipid metabolism. While some results in the manuscript are intriguing, certain broad conclusions occasionally surpass the available data. Despite the authors' efforts to enhance the manuscript, certain aspects remain unclear. It is still uncertain whether subtle changes in PHS levels could induce such effects on vacuolar fission. Additionally, it is regrettable that the lipid measurements are not comparable with previous studies by the authors. Future advancements in methods for determining intracellular lipid transport and levels are anticipated to shed light on the remaining uncertainties in this study.

      We thank Reviewer #2 for the careful comments. As Reviewer #2 pointed out, how slight changes in PHS levels can induce vacuolar fission remains unresolved in this manuscript. We agree that this issue needs to be addressed in future studies.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigated the effects of deletion of the ER-plasma membrane/Golgi tethering proteins tricalbins (Tcb1-3) on vacuolar morphology to demonstrate the role of membrane contact sites (MCSs) in regulating vacuolar morphology in Saccharomyces cerevisiae. Their data show that tricalbin deletion causes vacuolar fragmentation, possibly in parallel with the TORC1 pathway. In addition, their data reveal that levels of various lipids including ceramides, long-chain base (LCB)-1P, and phytosphingosine (PHS) are increased in tricalbin-deleted cells. The authors find that exogenously added PHS can induce vacuole fragmentation, and by performing analyses of genes involved in sphingolipid metabolism, they conclude that vacuolar fragmentation in tricalbin-deleted cells is due to the accumulated PHS in these cells. Importantly, exogenous PHS- or tricalbin deletion-induced vacuole fragmentation was suppressed by loss of the nucleus vacuole junction (NVJ), suggesting the possibility that PHS transported from the ER to vacuoles via the NVJ triggers vacuole fission. Of note, the authors find that hyperosmotic shock increases intracellular PHS levels, suggesting a general role of PHS in vacuole fission in response to physiological vacuolar division-inducing stimuli. This work provides valuable insights into the relationship between MCS-mediated sphingolipid metabolism and vacuole morphology. The conclusions of this paper are mostly supported by their results, but inclusion of direct evidence indicating increased transport of PHS from the ER to vacuoles via NVJ in response to vacuolar division-inducing stimuli would have strengthened this study. There is another weakness in their claim that the transmembrane domain of Tcb3 contributes to the formation of the tricalbin complex, which is sufficient for tethering the ER to the plasma membrane and the Golgi complex. Their claim is based only on structural simulation, not on biochemical experiments such as co-immunoprecipitation and pull-down.

      We appreciate the careful feedback from Reviewer #3. We have responded in the "Recommendations to Authors" section and hope that this response partially addresses the weakness in our claim regarding the physical interaction between Tcb1, 2, and 3.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors include some of the data (e.g., Tcb interactions) that they refer to in the response to the reviewers. I think that this could enhance the message in this manuscript. Also, maybe it's a typo and you were referring to some other image panel, but in the rebuttal letter a "Fig. S3B" is mentioned, but I could not find it.

      Following the suggestions of reviewers #1 and #3, we have added the data of co-immunoprecipitation which confirmed that Tcb3 binds to both Tcb1 and Tcb2 as Supplemental Figure 2. With this change, the person (Ms. Saku Sasaki) who performed this analysis was also added as a co-author.

      Also, we appreciate the careful remark and apologize for the mistake. In the previous Authors' response, we mentioned the vacuole observation using SD medium, but these data were in Fig 5C, not Fig S3B.

      Reviewer #3 (Recommendations For The Authors):

      I would recommend that the authors include the IP data mentioned in their rebuttal letter to show the interactions among Tcb1-3. Also, the authors should quantify all lipid species in Fig 5B, as shown in Fig 3A.

      Following the suggestions of reviewers #1 and #3, we have added the co-immunoprecipitation data (Fig S2). In a further study, we would like to test if the transmembrane domain of Tcb3 is sufficient for the interaction among Tcb1-3. Also, we quantified all lipid species and replaced the data in Fig 5B.

      Minor points:

      (1) The function of vps4 is not mentioned in the manuscript.

      (2) The function of Sur2p is not mentioned in the manuscript. It should be clearly mentioned that DHS is converted to PHS by Sur2p.

      (1) We have added text stating that VPS4 is needed for normal ESCRT function and that its deletion serves as an example of inhibited GFP-Cps1p transport into the vacuole.

      (2) We have added text to the manuscript stating that Sur2p is the hydroxylase that catalyzes the conversion of DHS to PHS.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Overall, the magnitude of the effect size due to FNDC5 deficiency in both male and female mice is rather modest. Looking at the data from a qualitative perspective, it is clear that knockout females still lose bone during lactation and on the low calcium diet (LCD). It is difficult to assess the physiologic consequence of the modest quantitative 'protection' seen in FNDC5 mutants since the mutants still show clear and robust effects of lactation and LCD on all parameters measured. Similarly, the magnitude of the 'increased' cortical bone loss in FNDC5 mutant males is also modest and perhaps could be related to the fact that these mice are starting with slightly more cortical bone. Since the authors do not provide a convincing molecular explanation for why FNDC5 deficiency causes these somewhat subtle changes, I would like to offer a suggestion for the authors to consider (below, point #2) which might de-emphasize the focus of the manuscript on FNDC5. If the authors chose not to follow this suggestion, the manuscript could be strengthened by addressing the consequences of the modest changes observed in WT versus FNDC5 KO mice.

      Response: We agree that the magnitude of the effect size due to FNDC5 deficiency is modest with regard to the quantitative cortical bone parameters. However, if one examines the changes in osteocyte lacunar size and the mechanical properties of these bones, the differences are greater. As shown in Figure 3 E, the lacunar area of the WT females on a low calcium diet increases by over 30% and the KO by less than 20%, while in the males it is approximately 38% in WT compared to 46% in KO mice. According to Sims and Buenzli (PMID: 25708054), a potential total loss of ~16,000 mm³ (16 mL) of bone occurs through lactation in the human skeleton. This was based on our measurements in lactation-induced murine osteocytic osteolysis (Qing et al PMID: 22308018). They used our 2D sections of tibiae from lactating mice showing an increase in lacunar size from 38 to 46 µm². In that paper we also showed that canalicular width is increased with lactation. Therefore, this would suggest a dramatic decrease in intracortical porosity due to the osteocyte lacunocanalicular system in female KO mice on a low calcium diet compared to WT females, and a dramatic increase in KO males compared to WT males. Also, to maintain normal calcium levels, PTH was higher in the serum of female WT compared to female KO mice on a low calcium diet, with the opposite pattern in males (See Table 1). Based on these data from the FNDC5 null animals, we would speculate that the product of FNDC5, irisin, has a highly significant effect on the ultrastructure of bone in both males and females challenged with a low calcium diet.

      (2) The bone RNA-seq findings reported in Figures 4-6 are quite interesting. Although Youlten et al previously reported that the osteocyte transcriptome is sex-dependent, the work here certainly advances that notion to a considerable degree and likely will be of high interest to investigators studying skeletal biology and sexual dimorphism in general. To this end, one direction for the authors to consider might be to refocus their manuscript toward sexually-dimorphic gene expression patterns in osteocytes and the different effects of LCD on male versus female mice. This would allow the authors to better emphasize these major findings, and to then use FNDC5 deficiency as an illustrative example of how sexually-dimorphic osteocytic gene expression patterns might be affected by deletion of an osteocyte-acting endocrine factor. Ideally, the authors would confirm RNA-seq data comparing male versus female mice in osteocytes using in situ hybridization or immunostaining.

      Response: Thank you for this suggestion. We have compared the different effects of LCD on male versus female mice in our revised version and have added a figure containing this information.

      (3) Along the lines of point #2 (above), the presentation of the RNA-seq studies in Figures 4-6 is somewhat confusing in that the volcano plot titles seem to be reversed. For example, Figure 4A is titled "WT M: WT F", but the genes in the upper right quadrant appear to be up-regulated in female cortical bone RNA samples. Should this plot instead be titled "WT F: WT M"? If so, then all other volcano plots should be re-titled as well.

      Response: We have now ensured that the plots are appropriately labeled.

      (4) Have the authors compared male versus female transcriptomes of LCD mice?

      Response: We have now compared the male vs female transcriptomes of LCD mice and added an additional figure.

      (5) It would be appreciated if the authors could provide additional serum parameters (if possible) to clarify incomplete data in both lactation and low-calcium diet models: RANKL/OPG ratio, Ctx, PTHrP, and 1,25-dihydroxyvitamin D levels.

      Response: It is not possible to quantitate each of these as the serum has been exhausted. We have checked the RANKL/OPG ratio in the RNA seq and qPCR data using osteocyte enriched bone chips and found no difference.

      (6) Lastly, the data that overexpressing irisin improved bone properties in Fig 2G was somewhat confusing. Based on Kim et al.'s (2018) work, irisin injection increased sclerostin gene expression and serum levels, thus reducing bone formation. Were sclerostin levels affected by irisin overexpression in this study? Was irisin's role in modulating sclerostin levels attenuated with additional calcium deficiency?

      Response: We have not observed any differences in the osteocyte Sost mRNA expression between WT and KO normal and low-calcium-diet male and female mice in our RNAseq and qPCR data. As such, we did not check the Sost levels for the 2G experiment.

      Reviewer #2 (Public Review):

      Summary:

      The goal of this study was to examine the role of FNDC5 in the response of the murine skeleton to either lactation or a calcium-deficient diet. The authors find that female FNDC5 KO mice are somewhat protected from bone loss and osteocyte lacunar enlargement caused by either lactation or a calcium-deficient diet. In contrast, male FNDC5 KO mice lose more bone and have a greater enlargement of osteocyte lacunae than their wild-type controls. Based on these results, the authors conclude that in males irisin protects bone from calcium deficiency but that in females it promotes calcium removal from bone for lactation.

      While some of the conclusions of this study are supported by the results, it is not clear that the modest effects of FNDC5 deletion have an impact on calcium homeostasis or milk production.

      Specific comments:

      (1) The authors sometimes refer to FNDC5 and other times to irisin when describing causes for a particular outcome. Because irisin was not measured in any of the experiments, the authors should not conclude that lack of irisin is responsible. Along these lines, is there any evidence that either lactation or a calcium-deficient diet increases the production of irisin in mice?

      therefore we have extrapolated that the observed effects are due to a lack of circulating irisin. However, this does not rule out a function for Fndc5 itself, but such a function would most likely be in muscle and not in the osteocyte, as we do not detect significant levels of irisin in either primary osteoblasts or primary osteocytes compared to muscle and C2C12 cells. As such, we concluded that the phenotypic differences we saw in our experiments are due to a lack of irisin. We now address the reviewer’s point in the discussion. Circulating irisin has not been measured in normal mice during lactation or on a low calcium diet.

      (2) The results of the irisin-rescue experiment shown in figure 2G cannot be appropriately interpreted without normal diet controls. In addition, some evidence that the AAV8-irisin virus actually increased irisin levels in the mice would strengthen the conclusion.

      Response: We do not have the normal diet controls at this time. We have quantitated tagged irisin in other AAV experiments and found highly significant expression.

      (3) There is insufficient evidence to support the idea that the effect of FNDC5 on bone resorption and osteocytic osteolysis is important for the transfer of calcium from bone to milk. Previous studies by others have shown that bone resorption is not required to maintain milk or serum calcium when dietary calcium is sufficient but is critical if dietary calcium is low (Endo. 156:2762-73, 2015). To support the conclusions of the current study, it would be necessary to determine whether FNDC5 is required to maintain calcium levels when lactating mice lack sufficient dietary calcium.

      Response: We agree that it would be important to measure calcium levels in the milk to test the hypothesis that FNDC5 is important to maintain calcium levels in milk. However, as the calcium levels are normal in the serum, we are assuming they are normal in milk. This would require future experiments.

      (4) The amount of cortical bone loss due to lactation is very similar in both WT and FNDC5 KO mice. The results of the statistical analysis of the data presented in figure 1B are surprising given the very similar effect size of lactation. The key result from the 2-way ANOVA is whether there is an effect of genotype on the effect size of lactation (genotype-lactation interaction). The interaction terms were not provided. Similar concerns are noted for the results shown in figure 1G and H.

      Response: We agree, thanks. We will now add the interaction terms in the figure legends.
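      The interaction term the reviewer asks for tests whether the effect of lactation (or diet) differs between genotypes, i.e. a "difference of differences" between cell means. For a balanced 2 x 2 design it can be sketched as below. The function name and the numbers are hypothetical and are not the data in Figure 1B; a statistics package would normally be used, and the F statistic (df 1 and 4(n - 1)) would still need to be converted to a p-value.

```python
from statistics import mean


def interaction_2x2(cells):
    """Genotype x treatment interaction for a balanced 2 x 2 design.

    cells maps (genotype, treatment) -> list of observations, with the same
    number n of observations in every cell.
    Returns (interaction contrast, SS_interaction, F with df 1 and 4(n - 1)).
    """
    genotypes = sorted({g for g, _ in cells})
    treatments = sorted({t for _, t in cells})
    n = len(next(iter(cells.values())))
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    grand = mean(cell_mean.values())
    g_mean = {g: mean(cell_mean[(g, t)] for t in treatments) for g in genotypes}
    t_mean = {t: mean(cell_mean[(g, t)] for g in genotypes) for t in treatments}

    # Treatment effect in g1 minus treatment effect in g0 (levels in sorted order).
    (g0, g1), (t0, t1) = genotypes, treatments
    contrast = (cell_mean[(g1, t1)] - cell_mean[(g1, t0)]) - (
        cell_mean[(g0, t1)] - cell_mean[(g0, t0)]
    )
    # Interaction sum of squares: n * sum of squared cell-mean residuals.
    ss_inter = n * sum(
        (cell_mean[(g, t)] - g_mean[g] - t_mean[t] + grand) ** 2
        for g in genotypes
        for t in treatments
    )
    # Within-cell (error) mean square.
    ss_within = sum((x - cell_mean[key]) ** 2 for key, vals in cells.items() for x in vals)
    df_within = len(cells) * (n - 1)
    f_stat = ss_inter / (ss_within / df_within)
    return contrast, ss_inter, f_stat


# Hypothetical toy data: lactation lowers the value by 3 in WT but only by 1 in KO.
cells = {
    ("WT", "ctrl"): [1, 2, 3], ("WT", "lact"): [4, 5, 6],
    ("KO", "ctrl"): [1, 2, 3], ("KO", "lact"): [2, 3, 4],
}
print(interaction_2x2(cells))  # contrast = 2, SS_interaction = 3.0, F = 3.0
```

      A contrast near zero means the lactation effect is similar in both genotypes, even if the main effects of genotype and lactation are each significant.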

      (5) It is not clear what justifies the term 'primed' or 'activated' for resorption. Is there evidence that a certain level of TRAP expression lowers the threshold for osteocytic osteolysis in response to a stimulus?

      Response: The number of TRAP-positive osteocytes is lower in female KO mice than in female WT mice, and lower in WT males than in WT females. We propose that, in normal WT females, irisin regulates the number of TRAP-positive osteocytes by readying or preparing these cells to respond rapidly to low calcium. We will use the term ‘primed’ and will not use the term ‘activated’. We are open to any terminology or description of why this is observed and what irisin could be doing to the osteocyte.

      Reviewer #3 (Public Review):

      Summary:

      Irisin has previously been demonstrated to be a muscle-secreted factor that affects skeletal homeostasis. Through the use of different experimental approaches, such as genetic knockout models, recombinant Irisin treatment, or different cell lines, the role of Irisin on skeletal homeostasis has been revealed to be more complex than previously thought and this warrants further examination of its role. Therefore, the current study sought to rigorously examine the effects of global Irisin knockout (KO) in male and female mouse bone. Authors demonstrated that in calcium-demanding settings, such as lactation or low-calcium diet, female Irisin KO mice lose less bone compared to wild-type (WT) female mice. Interestingly male Irisin KO mice exhibited worse skeletal deterioration compared to WT male mice when fed a low-calcium diet. When examined for transcriptomic profiles of osteocyte-enriched cortical bone, authors found that Irisin KO altered the expression of osteocytic osteolysis genes as well as steroid and fatty acid metabolism genes in males but not in females. These data support the authors' conclusion that Irisin regulates skeletal homeostasis in sex-dependent manner.

      Strengths:

      The major strength of the study is the rigorous examination of the effects of Irisin deletion in the settings of skeletal maturity and increased calcium demands in female and male mice. Since many of the common musculoskeletal disorders are dependent on sex, examining both sexes in the preclinical setting is crucial. Had the investigators only examined females or males in this study, the conclusions from each sex would have contradicted each other regarding the role of Irisin on bone. Also, the approaches are thorough and comprehensive that assess the functional (mechanical testing), morphological (microCT, BSEM, and histology), and cellular (RNA-seq) properties of bone.

      Weaknesses: One of the weaknesses of this study is a lack of detailed mechanistic analysis of why Irisin has a sex-dependent role on skeletal homeostasis. This absence is particularly notable in the osteocyte transcriptomic results where such data could have been used to further probe potential candidate pathways between LC females vs. LC males.

      Response: Our future studies will focus on understanding the molecular mechanism behind the sex-dependent effects of irisin. Our RNA seq data shows a significant difference in the lipid, steroid, and fat metabolism pathways between male and female mice, as well as between WT and KO mice. Future studies will focus on these pathways.

      Another weakness is that the authors did not present data that convincingly demonstrate that Irisin secretion from skeletal muscle differs between female and male WT mice in response to calcium restriction. The supplementary skeletal muscle data only present functional and electrophysiological outcomes. Since Itgav or Itgb5 were not different in any of the experimental groups, it is assumed that changes in the level of Irisin are responsible for the phenotypes observed in WT mice. Assessing Irisin expression would further strengthen the conclusions drawn from the skeletal changes observed in Irisin KO male and female mice.

      Response: The problem is that commercial assays for irisin are not dependable, and their results can differ widely, both within and beyond the physiological range of 1-10 ng/ml. This is partly due to the nature of the polyclonal antibodies used and their resulting cross-reactivity with other proteins. Islam et al. (2021, Nature Metabolism) showed that commercial ELISAs were completely unreliable in mice and that the only reliable method of measuring circulating irisin is mass spectrometry.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Were there any differences in food intake or body weight on the low-calcium diet between littermates and FNDC5 KO mice?

      Response: We have now included the body weight and food intake data in the supplement. We did not observe any significant differences between the groups.

      (2) In Fig 1, ideally the authors would provide the osteocyte lacunar density along with the lacunar area.

      Response: We do not observe any difference in osteocyte density in any of the groups. There is not sufficient time within 2 weeks to see a change in osteocyte density because there is no new bone formation.

      (3) What are the authors' views on the involvement of irisin in TGF-β signaling, given that perilacunar remodeling was observed in FNDC5 KO mice? The authors should also address Irisin-TGF-β signaling in the Discussion, in light of the observed increase in matrix-related signals.

      Response: Perilacunar remodeling is the removal and subsequent replacement of the perilacunar and pericanalicular matrix, as occurs with lactation (Qing et al. 2012). Osteocytic osteolysis is the first half of that process, in which the matrix is removed. Alliston and colleagues generated transgenic mice with reduced expression of the TGF-β type II receptor using Dmp1-Cre (PMID: 32282961). They found clear, significant differences between the sexes in bone parameters, in the appearance of the osteocyte lacunocanalicular network, and in markers of osteocyte perilacunar remodeling; however, they did not compare the lacunar remodeling process in males with that in females. The females were subjected to lactation and were found to be resistant to osteocytic osteolysis. To compare males and females, they would have had to challenge both sexes with a condition of high calcium demand, such as the low-calcium diet used in the current study. Their study does suggest that TGF-β is involved in the osteocytic osteolysis that occurs with lactation. However, the abnormal lacunocanalicular network of the null males compared with wild-type males does not necessarily indicate a defect in perilacunar remodeling; it is more likely that the defect arose during bone formation, as osteoblasts differentiated into osteocytes. Therefore, we will cite this paper regarding the role of TGF-β in osteocytic osteolysis in lactating females, but not for the comparison of males with females. We have examined the normalized expression of TGF-β1, TGF-β2, and TGF-β3 in the present study and found no significant differences in TGF-β1 or TGF-β2 in any of the groups, but did find significantly higher expression of TGF-β3 in females compared with males for WT (fdr < 0.05), LCD WT (fdr < 0.05), and Control KO (p value < 0.01). Perhaps this isoform plays a major role in the osteocytic osteolysis that occurs with lactation.
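      For readers comparing the "fdr" values above with unadjusted p-values: an FDR value is typically a p-value adjusted for multiple testing, most commonly with the Benjamini-Hochberg procedure. Which adjustment the authors' pipeline applied is not stated in this response, so the following is only a minimal sketch of the standard Benjamini-Hochberg calculation, applied to made-up p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order.

    For the p-value with rank k (1 = smallest) among m tests, the raw adjusted
    value is p * m / k; a running minimum taken from the largest p-value
    downward keeps the adjusted values monotone, and starting it at 1.0 caps
    every adjusted value at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Four hypothetical tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```

      Because the adjustment grows with the number of tests, a gene can pass an unadjusted p < 0.01 cutoff and still have an FDR well above 0.05 in a genome-wide analysis.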

      (4) Did the authors compare the transcriptomic datasets of lactating female WT and KO groups, or were the RNA-seq studies performed only on LCD study samples?

      Response: We performed RNA-seq on the LCD study samples only, not on lactating females.

      Reviewer #2 (Recommendations For The Authors):

      Line 401 on page 14 states that the sexes respond differently to calcium deficiency. Lacunar area increases in both sexes, so the response is very similar. What appears to be different between the sexes is the role of FNDC5 in this process.

      Response: At baseline on a normal diet, female WT mice have a higher osteocyte lacunar area than WT males. With the low-calcium diet, lacunar area increases in both sexes, with a greater increase in female WT mice. We agree that what appears to differ between the sexes is the role of FNDC5 when challenged with high calcium demand.

      Reviewer #3 (Recommendations For The Authors):

      • The authors state in the abstract and discussion that 'We propose Irisin ensures the survival of offspring by targeting the osteocytes...'. However, this appears to be an overinterpretation of their findings, as they have not assessed the number of offspring surviving to weaning or compared offspring growth rates between WT and KO breeders.

      Response: This was a proposal, and we agree that it could be an overinterpretation. However, we would like to keep it as a speculation that could be tested in future studies.

      • Figures 1 and 2 should include cortical Total Area (and perhaps the Marrow Cavity data from the Supplement as well). These data will help readers assess whether the thinning of the cortex is driven by impaired periosteal expansion, accelerated endosteal resorption, or both. The marrow cavity area data seem to suggest increased endosteal resorption (Supp. Table 2), but it is unclear whether periosteal expansion is altered.

      Response: The data are included in the supplementary tables. We do not observe any difference in the periosteal area between the groups.

      • To further support the authors' statement that male KO mice exhibit different bone material properties compared with WT mice, the estimated elastic modulus should be calculated from the stiffness data (see https://doi.org/10.1002/jbmr.2539).

      Response: We examined the elastic modulus, but its estimation requires a stress-strain curve rather than the force-displacement curve we used in our calculations; therefore, we were not able to obtain the estimated elastic modulus from the raw data we have.

      • In Figure 3 there is no legend indicating females or males. Based on the data and results texts it is assumed that red is Female and blue is Male. However, please confirm in the figure legend.

      Response: This is now added in the figure legends.

      • Transcriptomic data should be deposited in the NCBI GEO data repository. Also, please indicate whether the cutoff p-value for the DEG analysis was adjusted.

      Response: We have submitted our data to the GEO data repository: GSE242445. Significant genes were defined as genes with p-value less than 0.01 and absolute log2 fold change larger than 1. The p-value is not adjusted. This information is now added.
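      The selection rule stated above (unadjusted p < 0.01 and |log2 fold change| > 1) amounts to a simple filter over per-gene test results. The sketch below is illustrative only: the record layout and gene names are hypothetical and are not taken from GSE242445 or the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str        # gene symbol (hypothetical examples below)
    log2_fc: float   # log2 fold change between the two compared groups
    p_value: float   # unadjusted p-value from the differential expression test


def significant_genes(results, p_cutoff=0.01, lfc_cutoff=1.0):
    """Return genes with p < p_cutoff and |log2FC| > lfc_cutoff.

    The p-value is used as-is, with no multiple-testing adjustment.
    """
    return [r.gene for r in results
            if r.p_value < p_cutoff and abs(r.log2_fc) > lfc_cutoff]


demo = [
    GeneResult("GeneA", 1.8, 0.002),   # passes both cutoffs
    GeneResult("GeneB", -1.3, 0.009),  # passes (down-regulated)
    GeneResult("GeneC", 0.6, 0.001),   # fails the fold-change cutoff
    GeneResult("GeneD", 2.4, 0.03),    # fails the p-value cutoff
]
print(significant_genes(demo))  # ['GeneA', 'GeneB']
```

      Both inequalities are strict, so a gene with exactly a two-fold change (|log2FC| = 1) is excluded.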

      • The statistical analysis section indicates that a two-way repeated-measure ANOVA was used. However, the data presented in the study are from independent groups, in which case repeated-measure statistical approaches should not be used. Please clarify the statistical tests that were used.

      Response: We now use standard ANOVA instead of repeated-measures ANOVA, since repeated-measures ANOVA is intended for paired (repeated) measurements on the same subjects. The results remain significant.
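      To illustrate the distinction: in an ordinary (between-subjects) two-way ANOVA every observation comes from a different animal, so the total variation splits into two main effects, their interaction, and within-group error, with no subject term. The following minimal sketch computes these sums of squares and F statistics for a balanced design. It is a textbook illustration, not the authors' analysis; the factor names and data are hypothetical, and p-values would additionally need the F distribution's survival function (for example `scipy.stats.f.sf`).

```python
from statistics import mean


def two_way_anova(cells):
    """Two-way ANOVA for independent groups (balanced, between-subjects design).

    cells maps (level_of_factor_A, level_of_factor_B) -> list of observations,
    one observation per animal, with the same number of animals in every cell.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F)}.
    """
    a_levels = sorted({ka for ka, _ in cells})
    b_levels = sorted({kb for _, kb in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))
    if (a < 2 or b < 2 or n < 2 or len(cells) != a * b
            or any(len(v) != n for v in cells.values())):
        raise ValueError("sketch needs >= 2 levels per factor and n >= 2 in every cell")

    grand = mean([x for v in cells.values() for x in v])
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {i: mean([x for (ka, _), v in cells.items() if ka == i for x in v]) for i in a_levels}
    b_mean = {j: mean([x for (_, kb), v in cells.items() if kb == j for x in v]) for j in b_levels}

    # Partition the variation: main effects, interaction, and within-cell error.
    ss_a = b * n * sum((a_mean[i] - grand) ** 2 for i in a_levels)
    ss_b = a * n * sum((b_mean[j] - grand) ** 2 for j in b_levels)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b, df_err = a - 1, b - 1, a * b * (n - 1)
    df_ab = df_a * df_b
    ms_err = ss_err / df_err
    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_err),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "AxB": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
        "error": (ss_err, df_err, None),
    }


# Hypothetical data: factor A = sex (F/M), factor B = genotype (KO/WT), n = 2 per cell.
example = {("F", "KO"): [1, 3], ("F", "WT"): [5, 7], ("M", "KO"): [2, 4], ("M", "WT"): [2, 4]}
result = two_way_anova(example)
print({effect: stats[2] for effect, stats in result.items()})
# {'A': 1.0, 'B': 4.0, 'AxB': 4.0, 'error': None}
```

      The error term here uses the within-cell variation across different animals; a repeated-measures model would instead remove between-subject variation, which is only meaningful when the same animal is measured more than once.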

      In summary, we thank the reviewers for their very useful and thoughtful suggestions for improving our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Response to reviewer 1 comments on “weaknesses”:

      “A weakness in the approach is the use of genetic models that do not offer complete deletion of the prolactin receptor from targeted neuronal populations...”

      We acknowledge that neither model used provided a complete deletion of the prolactin receptor (Prlr) from the targeted neuronal populations. We suspect that incomplete deletion of targeted genes is not uncommon in these sorts of studies, but this remains the best approach to addressing our question, and we believe we have been thorough and transparent in reporting the degree of deletion observed. We thought we had appropriately discussed the implications of the low proportion of Kiss1 cells still expressing Prlr, but we will certainly revisit this to ensure it is discussed thoroughly. This does not detract, however, from the key conclusion that prolactin action is necessary for full suppression of fertility in lactation in the mouse.

      “Results showing no impact of progesterone on LH secretion during lactation are surprising, given the effectiveness of progesterone-containing birth control in lactating women...”

      We think that this comment misrepresents what has been done in our study. We did not report a lack of impact of progesterone, as exogenous progesterone was never administered to mice. We did, however, give mifepristone as a progesterone receptor antagonist to determine whether endogenous progesterone contributed to the suppression of kisspeptin neuronal activity. We found that mifepristone, at levels sufficient to terminate pregnancy, had no effect on pulsatile LH secretion in lactating mice. This is consistent with our prior observation that progesterone levels are low in mouse lactation, suggesting that progesterone does not contribute significantly to the suppression of kisspeptin neuronal activity during lactation in the mouse. We agree with the reviewer that if we had given exogenous progesterone, it likely would result in suppression of pulsatile LH secretion (as it does in women). Indeed, in other work, we have found that progesterone administration profoundly suppresses activity of the kisspeptin neurons in mice (https://doi.org/10.1210/en.2019-00193). But this was not the point of the present experiment. We will review how we have described this experiment to ensure that this is absolutely clear.

      “While the authors assert their findings may reflect an important role for prolactin in lactational infertility in other mammalian species, that remains to be seen….”

      We acknowledge that our study cannot address whether prolactin is necessary for the suppression of fertility during lactation in other mammalian species. We hope, however, that our data may stimulate a re-examination of this question in other species, as some of the prior methodology (such as pharmacological suppression of prolactin) may have had off-target effects that confound interpretation. We thought that this point was discussed appropriately in the manuscript, but we will certainly check that it is addressed suitably.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep-learning models to analyse the dynamics of epithelia. In this way, they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after the healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strength:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is solid.

      Weakness:

      Some aspects of the deep-learning models remained unclear, and the authors might want to consider adding details. First of all, for readers not familiar with deep-learning models, I would like to see more information about ResNet and U-Net, which form the basis of the new deep-learning models developed here. What is the structure of these networks?

      We agree with the Reviewer and have included additional information on page 8 of the manuscript, outlining some background information about the architecture of ResNet and U-Net models.

      How many parameters do you use?

      We apologise for this omission and have now included the number of parameters and layers in each model in the methods section on page 25.

      What is the difference between validating and testing the model? Do the corresponding data sets differ fundamentally?

      The difference between ‘validating’ and ‘testing’ the model is that validation data are used during training to determine whether the model is overfitting. If the model performs well on the training data but not on the validation data, this is a key signal that the model is overfitting, and changes must be made to the network or training method to prevent this. The testing data are used after all training has been completed, to measure the model's performance on fresh data it has not been trained on. We have removed reference to the validation data in the main text to keep it simple and have added this explanation to the Methods. There is no fundamental (or experimental) difference between the labelled data sets; rather, they are collected from different biological samples. We have now included this information in the Methods text on page 24.
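      The distinction can be made concrete with a split done at the level of biological samples, so that no sample contributes images to more than one set. This is a minimal sketch, not the authors' code: it assumes each labelled example is tagged with a hypothetical sample identifier, and the 15% validation and 15% test fractions are placeholders.

```python
import random


def split_by_sample(examples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split labelled examples into train/validation/test sets by sample ID.

    examples: list of (sample_id, item) pairs. All items from one biological
    sample go to the same set, so validation and test items always come from
    samples the model was not trained on.
    """
    sample_ids = sorted({sid for sid, _ in examples})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(sample_ids)
    n_test = max(1, round(len(sample_ids) * test_frac))
    n_val = max(1, round(len(sample_ids) * val_frac))
    test_ids = set(sample_ids[:n_test])
    val_ids = set(sample_ids[n_test:n_test + n_val])

    splits = {"train": [], "validation": [], "test": []}
    for sid, item in examples:
        if sid in test_ids:
            splits["test"].append(item)
        elif sid in val_ids:
            splits["validation"].append(item)
        else:
            splits["train"].append(item)
    return splits
```

      During training, loss on the validation split is monitored (training loss that keeps falling while validation loss stalls signals overfitting); the test split is evaluated only once, after training has finished. With very few samples this sketch can leave the training set small, so the fractions would need adjusting.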

      How did you assess the quality of the training data classification?

      These data were generated and hand-labelled by an expert with many years of experience in identifying cell divisions in imaging data, to give the ground truth for the deep learning model.

      Reviewer #1 (Recommendations For The Authors):

      You repeatedly use 'new', 'novel' as well as 'surprising' and 'unexpected'. The latter are rather subjective and it is not clear based on what prior knowledge you make these statements. Unless indicated otherwise, it is understood that the results and methods are new, so you can delete these terms.

      We have deleted these words, as suggested, in almost all cases.

      p.4 "as expected" add a reference or explain why it is expected.

      A reference has now been included in this section, as suggested.

      p.4 "cell divisions decrease linearly with time" Only later (p.10) it turns out that you think about the density of cell divisions.

      This has been changed to "cell division density decreases linearly with time".

      p.5 "imagine is largely in one plane" while below "we generated a 3D z-stack" and above "our in vivo 3D image data" (p.4). Although these statements are not strictly contradictory, I still find them confusing. Eventually, you analyse a 2D image, so I would suggest that you refer to your in vivo data as being 2D.

      We apologise for the confusion here; the imaging data was initially generated using 3D z-stacks but this 3D data is later converted to a 2D focused image, on which the deep learning analysis is performed. We are now more careful with the language in the text.

      p.7 "We have overcome (...) the standard U-Net model" This paragraph remains rather cryptic to me. Maybe you can explain in two sentences what a U-Net is or state its main characteristics. Is it important to state which class you have used at this point? Similarly, what is the exact role of the ResNet model? What are its characteristics?

      We have included more details on both the ResNet and U-Net models and how our model incorporates properties from them on Page 8.

      p.8 Table 1 Where do I find it? Similarly, I could not find Table 2.

      These were originally located in the supplemental information document, but have been moved to the main manuscript.

      p.9 "developing tissue in normal homeostatic conditions" Aren't homeostatic and developing contradictory? In one case you maintain a state, in the other, it changes.

      We agree with the Reviewer and have removed the word ‘homeostatic’.

      p.9 "Develop additional models" I think 'models' refers to deep learning models, not to physical models of epithelial tissue development. Maybe you can clarify this?

      Yes, this is correct; we have phrased this better in the text.

      p.12 "median error" median difference to the manually acquired data?

      Yes, and we have made this clearer in the text, too.

      p.12 "we expected to observe a bias of division orientation along this axis" Can you justify the expectation? Elongated cells are not necessarily aligned with the direction of a uniaxially applied stress.

      Although this is not always the case, we have now included additional references to previous work from other groups which demonstrated that wing epithelial cells do become elongated along the P/D axis in response to tension.

      p.14 "a rather random orientation" Please, quantify.

      The division orientations are quantified in Fig. 4F,G; we have now changed our description from ‘random’ to ‘unbiased’.

      p.17 "The theories that must be developed will be statistical mechanical (stochastic) in nature" I do not understand. Statistical mechanics refers to systems at thermodynamic equilibrium, stochastic to processes that depend on, well, stochastic input.

      We have clarified that we are referring to non-equilibrium statistical mechanics (the study of macroscopic systems far from equilibrium, a rich field of research with many open problems and applications in biology).

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      In general, novelty over previous work does not seem particularly important. From a methodological point of view, the models are based on generic architectures of convolutional neural networks, with minimal changes, and on ideas already explored in general. The authors seem to have missed much (most?) of the literature on the specific topic of detecting mitotic events in 2D timelapse images, which has been published in more specialized journals or Proceedings (TPAMI, CVPR, etc.; see references below). Even though the image modality or biological structure may be different (sometimes non-fluorescent images), I don't believe it makes a big difference. How the authors' approach compares to this previously published work is not discussed, which prevents me from objectively assessing the true contribution of this article from a methodological perspective.

      On the contrary, some competing works have proposed methods based on newer - and generally more efficient - architectures specifically designed to model temporal sequences (Phan 2018, Kitrungrotsakul 2019, 2021, Mao 2019, Shi 2020). These natural candidates (recurrent networks, long short-term memory (LSTM) networks, gated recurrent units (GRU), or, more recently, transformers) coupled to CNNs are not even mentioned in the manuscript, although they have proved their generic superiority for inference tasks involving time series (Major point 2). Even though the original idea/trick of exploiting the different channels of RGB images to address the temporal aspect might seem smart in the first place - as it reduces the task of changing/testing a new architecture to a minimum - I suspect that CNNs trained this way may not generalize very well to videos where the temporal resolution is changed slightly (Major point 1). This could be quite problematic, as each new dataset acquired with a different temporal resolution or temperature may require manual relabeling and retraining of the network. In this perspective, recent alternatives (Phan 2018, Gilad 2019) have proposed unsupervised approaches, which could largely reduce the need for manual labeling of datasets.

      We thank the reviewer for their constructive comments. Our goal is to develop a cell detection method that has a very high accuracy, which is critical for practical and effective application to biological problems. The algorithms need to be robust enough to cope with the difficult experimental systems we are interested in studying, which involve densely packed epithelial cells within in vivo tissues that are continuously developing, as well as repairing. In response to the above comments of the reviewer, we apologise for not including these important papers from the division detection and deep learning literature, which are now discussed in the Introduction (on page 4).

      A key novelty of our approach is the use of multiple fluorescent channels to increase information for the model. As the referee points out, our method benefits from using and adapting existing highly effective architectures. Hence, we have been able to incorporate deeper models than some others have previously used. An additional novelty is using this same model architecture (retrained) to detect cell division orientation. For future practical use by us and other biologists, the models can easily be adapted and retrained to suit experimental conditions, including different multiple fluorescent channels or numbers of time points. Unsupervised approaches are very appealing due to the potential time saved compared to manual hand labelling of data. However, the accuracy of unsupervised models is currently much lower than that of supervised models (as shown in Phan 2018) and, most importantly, well below the levels needed for practical use analysing inherently variable (and challenging) in vivo experimental data.

      Regarding the other convolutional neural networks described in the manuscript:

      (1) The one proposed to predict the orientation of mitosis performs a regression task, predicting a probability for the division angle. The architecture, which must be different from a simple Unet, is not detailed anywhere, so the way it was designed is difficult to assess. It is unclear if it also performs mitosis detection, or if it is instead used to infer orientation once the timing and location of the division have been inferred by the previous network.

      The neural network used for U-NetOrientation has the same architecture as U-NetCellDivision10 but has been retrained to complete a different task: finding division orientation. Our workflow is as follows: firstly, U-NetCellDivision10 is used to find cell divisions; secondly, U-NetOrientation is applied locally to determine the division orientation. These points have now been clarified in the main text on Page 14.

      (2) The one proposed to improve the quality of cell boundary images before segmentation is nothing new, it has now become a classic step in segmentation, see for example Wolny et al. eLife 2020.

      We have cited similar segmentation models in our paper and thank the referee for this additional one. We made an improvement to the segmentation models using GFP-tagged E-cadherin, a protein localised in a thin layer at the apical boundary of cells. So, while this is primarily a 2D segmentation problem, some additional information is available in the z-axis, as the protein is visible in 2-3 separate z-slices. Hence, we supplied this 3-focal-plane input to take advantage of the 3D nature of this signal. This approach has been made more explicit in the text (Pages 14, 15) and Figure (Fig. 2D).
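      A sketch of how a 3-focal-plane input might be assembled from a z-stack follows. It rests on assumptions not stated in the response: the E-cadherin signal is taken to peak in the slice with the highest mean intensity, and that slice plus its immediate neighbours become the three input channels. A single global choice of planes also assumes the apical layer is roughly flat across the field of view; a tilted or curved tissue would need a per-region choice.

```python
def three_plane_input(stack):
    """Pick three adjacent focal planes from a z-stack to use as input channels.

    stack: list of 2D images (each a list of rows of pixel intensities),
    ordered by z. The brightest plane (highest mean intensity) is taken as
    the centre and its neighbours above and below are added; at the top or
    bottom of the stack the 3-slice window is shifted to stay inside it.
    Returns a list of three 2D images (channels).
    """
    if len(stack) < 3:
        raise ValueError("need at least three z-slices")

    def mean_intensity(img):
        pixels = [p for row in img for p in row]
        return sum(pixels) / len(pixels)

    centre = max(range(len(stack)), key=lambda z: mean_intensity(stack[z]))
    centre = min(max(centre, 1), len(stack) - 2)  # keep the window in range
    return [stack[centre - 1], stack[centre], stack[centre + 1]]
```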

      As a side note, I found it a bit frustrating to realise that all the analysis was done in 2D while the original images are 3D z-stacks, so a lot of the 3D information had to be compressed and has not been used. A novelty, in my opinion, could have resided in the generalisation to 3D of the deep-learning approaches previously proposed in that context, which are exclusively 2D, in particular, to predict the orientation of the division.

      Our experimental system is a relatively flat 2D tissue with the orientation of the cell divisions consistently in the xy-plane. Hence, a 2D analysis is most appropriate for this system. With the successful application of the 2D methods already achieving high accuracy, we envision that extension to 3D would only offer a slight increase in effectiveness as these measurements have little room for improvement. Therefore, we did not extend the method to 3D here. However, of course, this is the next natural step in our research as 3D models would be essential for studying 3D tissues; such 3D models will be computationally more expensive to analyse and more challenging to hand label.

      Concerning the biological application of the proposed methods, I found the results interesting, showing the potential of such a method to automatise mitosis quantification for a particular biological question of interest, here wound healing. However, the deep learning methods/applications that are put forward as the central point of the manuscript are not particularly original.

      We thank the referee for their constructive comments. Our aim was not only to show the accuracy of our models but also to show how they might be useful to biologists for automated analysis of large datasets, which is a major, if not the main, bottleneck for many imaging experiments. The ability to process large datasets will improve the robustness of results, as well as allow additional hypotheses to be tested. Our study also demonstrated that these models can cope with real in vivo experiments where additional complications such as progressive development, tissue wounding and inflammation must be accounted for.

      Major point 1: generalisation potential of the proposed method.

      The neural network model proposed for mitosis detection relies on a 2D convolutional neural network (CNN), more specifically on the Unet architecture, which has become widespread for the analysis of biology and medical images. The strategy proposed here exploits the fact that the input of such an architecture is natively composed of several channels (originally 3 to handle the 3 RGB channels, which is actually a holdover from computer vision, since most medical/biological images are gray images with a single channel), to directly feed the network with 3 successive images of a timelapse at a time. This idea is, in itself, interesting because no modification of the original architecture had to be carried out. The latest 10-channel model (U-NetCellDivision10), which includes more channels for better performance, required minimal modification to the original U-Net architecture but also simultaneous imaging of cadherin in addition to histone markers, which may not be a generic solution.

      We believe we have provided a general approach for practical use by biologists that can be applied to a range of experimental data, whether based on varying numbers of fluorescent channels and/or timepoints. We anticipate that experimental biologists will have different parameters available for measurement depending on their specific experimental conditions, e.g., different fluorescently labelled proteins (such as tubulin) and/or time frames. To accommodate this, we have made it easy and clear in the code on GitHub how these changes can be made. While the model may need some alterations and retraining, the method itself is a generic solution, as the same principles apply to very widely used fluorescence imaging techniques.
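      The idea of feeding successive frames through the input channels, as described in the reviewer's point above, can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes each timepoint holds one or more fluorescent channels (e.g. a histone and an E-cadherin channel) as 2D images, and stacks a window of consecutive timepoints time-major. With one fluorescent channel and a 3-frame window this gives a 3-channel (RGB-like) input; two fluorescent channels over five timepoints would give ten channels, although whether that matches the actual channel layout of U-NetCellDivision10 is an assumption.

```python
def frames_to_channels(timelapse, t, window):
    """Build a multi-channel input by stacking consecutive timepoints.

    timelapse: list over time; each timepoint is a list of fluorescent
    channels, and each channel is a 2D image (list of rows).
    t: index of the first timepoint in the window.
    window: number of consecutive timepoints to stack.
    Returns a flat list of 2D images of length window * channels_per_frame,
    ordered time-major (all channels of frame t, then frame t + 1, ...).
    """
    if t < 0 or t + window > len(timelapse):
        raise IndexError("window extends beyond the timelapse")
    channels = []
    for frame in timelapse[t:t + window]:
        channels.extend(frame)  # append every fluorescent channel of this frame
    return channels
```

      Because the channel count is window times fluorescent channels, changing either quantity changes the network's input size, which is the retraining constraint the reviewer raises.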

      Since CNN-based methods accept only fixed-size vectors (fixed image size and fixed channel number) as input (and output), the length or time resolution of the extracted sequences should not vary from one experiment to another. As such, the method proposed here may lack generalization capabilities, as it would have to be retrained for each experiment with a slightly different temporal resolution. The paper should have compared results at slightly different temporal resolutions to assess the robustness of its inference to fluctuations in division speed.

      If multiple temporal resolutions are required for a set of experiments, we envision that the model could be trained over a range of these different temporal resolutions. The temporal resolution that requires the largest input vector would then set the model's fixed number of input channels. Given the depth of the models used, and the potential to easily increase it by replacing ResNet34 with ResNet50 or ResNet101, the model would likely be able to cope with this, although we have not specifically tested it (page 27).
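      One way to realise the fixed-channel strategy described above is sketched below, with a hypothetical helper that pads the stack from a coarser temporal resolution with blank images up to the model's fixed channel count. Zero-padding is an assumption made for illustration; resampling all datasets to a common frame interval is another option, and, as noted above, the authors have not tested this scenario.

```python
def pad_to_fixed_channels(channels, n_channels, fill=0.0):
    """Pad a list of 2D images with blank images up to a fixed channel count.

    Hypothetical helper: a dataset imaged at a coarser temporal resolution
    yields fewer frames per division window, so its channel stack is padded
    to match the largest input the model was built for.
    """
    if not channels:
        raise ValueError("need at least one channel to infer the image size")
    if len(channels) > n_channels:
        raise ValueError("more channels than the model accepts")
    height, width = len(channels[0]), len(channels[0][0])
    # Build each blank image separately so the padded channels share no rows.
    padding = [[[fill] * width for _ in range(height)]
               for _ in range(n_channels - len(channels))]
    return channels + padding
```

      A model trained on padded inputs would see blank channels whenever the frame rate is coarse, so mixing resolutions during training (as the response proposes) would be needed for it to learn to ignore them.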

      Another approach (not discussed) consists of directly convolving several temporal frames using a 3D CNN (2D+time) instead of a 2D CNN, in order to detect a temporal event. This idea shares some similarities with the proposed approach, although in this previous work (Ji et al. TPAMI 2012 and, for mitosis detection, Nie et al. CVPR 2016) convolution is performed spatio-temporally, which may present advantages. How does the authors' method compare to such an (also very simple) approach?

      We thank the Reviewer for this insightful comment. The text now discusses this (on Pages 8 and 17). Key differences between the models include our incorporation of multiple fluorescent channels and the use of much deeper models. We suggest that our method allows an easy and natural extension to deeper models for even more demanding tasks, e.g. distinguishing between healthy and defective divisions. We also tested our method under ‘difficult conditions’, such as when a wound is present; despite the challenges imposed by the wound (including the discussed reduction in fluorescence intensity near the wound edge), we achieved higher accuracy than Nie et al. obtained using a low-density in vitro system (their accuracy of 78.5% compared with our F1 score of 0.964).
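      For reference, the F1 score quoted above is the harmonic mean of precision (the fraction of detected divisions that are real) and recall (the fraction of real divisions that are detected). Because it ignores true negatives, it is not the same quantity as the accuracy figure reported by Nie et al. The sketch below computes it from detection counts; the counts in the example are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Precision, recall and F1 for event detection (e.g. cell divisions)."""
    detected = true_positives + false_positives  # everything the model flagged
    actual = true_positives + false_negatives    # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 10 missed divisions.
print(precision_recall_f1(90, 10, 10))  # approximately (0.9, 0.9, 0.9)
```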

      Major point 2: innovatory nature of the proposed method.

      The authors' idea of exploiting existing channels in the input vector to feed successive frames is interesting, but the natural choice in deep learning for manipulating time series is to use recurrent networks or their newer and more stable variants (LSTM, GRU, attention networks, or transformers). Several papers exploiting such approaches have been proposed for the mitotic division detection task, but they are not mentioned or discussed in this manuscript: Phan et al. 2018, Mao et al. 2019, Kitrungrotsakul et al. 2019, Shi et al. 2020.

      An obvious advantage of an LSTM architecture combined with CNN is that it is able to address variable length inputs, therefore time sequences of different lengths, whereas a CNN alone can only be fed with an input of fixed size.

      LSTM architectures may produce similar accuracy to the models we employ in our study; however, given the high degree of accuracy we already achieve with our methods, it is hard to see how they would improve our understanding of the biology of wound healing that we have uncovered. Hence, they may provide an alternative way to achieve similar results from analyses of our data. It would also be interesting to see how LSTM architectures would cope with the noisy and difficult wounded data that we have analysed. We agree with the referee that these alternative models could more easily accommodate differences in division timing (see discussion on Page 20). Nevertheless, we imagine that after selecting a sufficiently large time/fluorescent channel input, biologists could likely train our model to cope with a range of division durations.

      Another advantage of some of these approaches is that they rely on unsupervised learning, which can avoid the tedious relabeling of data (Phan et al. 2018, Gilad et al. 2019).

      While these are very interesting ideas, we believe these unsupervised methods would struggle under the challenging conditions of our own and others' experimental imaging data. The epithelial tissue examined in the present study possesses a particularly high density of cells with overlapping nuclei compared to the other experimental systems on which these unsupervised methods have been tested. Another potential problem with these unsupervised methods is the difficulty in distinguishing dynamic debris and immune cells from mitotic cells. Once again, despite our experimental data being more complex and difficult, our methods perform better than other methods designed for simpler systems, as in Phan et al. 2018 and Gilad et al. 2019; for example, on lower-density in vitro and unwounded tissues, the best F1 scores for a single video were 0.768 and 0.829 for the unsupervised and supervised methods, respectively (Phan et al. 2018). We envision that an F1 score above 0.9 (and preferably above 0.95) would be crucial for practical use by biologists; hence, we believe supervision is currently still required. We expect that retraining our models for use in other experimental contexts will require smaller hand-labelled datasets, as they will be able to take advantage of transfer learning (see discussion on Page 4).
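      Because F1 scores are quoted throughout this exchange, a short sketch of the standard definition may help readers compare them. F1 is the harmonic mean of precision and recall; unlike accuracy, it does not count true negatives. The function name and the counts in the example are hypothetical and are not data from this study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN), and
    F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN).
    True negatives do not appear, whereas
    accuracy = (TP + TN) / (TP + TN + FP + FN) does include them.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.9 0.9
```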

      References:

      We have included these additional references in the revised version of our Manuscript.

      Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221-231.

      Nie, W. Z., Li, W. H., Liu, A. A., Hao, T., & Su, Y. T. (2016). 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 55-62).

      Phan, H. T. H., Kumar, A., Feng, D., Fulham, M., & Kim, J. (2018). Unsupervised two-path neural network for cell event detection and classification using spatiotemporal patterns. IEEE Transactions on Medical Imaging, 38(6), 1477-1487.

      Gilad, T., Reyes, J., Chen, J. Y., Lahav, G., & Riklin Raviv, T. (2019). Fully unsupervised symmetry-based mitosis detection in time-lapse cell microscopy. Bioinformatics, 35(15), 2644-2653.

      Mao, Y., Han, L., & Yin, Z. (2019). Cell mitosis event analysis in phase contrast microscopy images using deep learning. Medical Image Analysis, 57, 32-43.

      Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Takemoto, S., Yokota, H., Ipponjima, S., ... & Chen, Y. W. (2019). A cascade of 2.5D CNN and bidirectional CLSTM network for mitotic cell detection in 4D microscopy image. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18(2), 396-404.

      Shi, J., Xin, Y., Xu, B., Lu, M., & Cong, J. (2020, November). A Deep Framework for Cell Mitosis Detection in Microscopy Images. In 2020 16th International Conference on Computational Intelligence and Security (CIS) (pp. 100-103). IEEE.

      Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., ... & Kreshuk, A. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. eLife, 9, e57613.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This important study used Voltage Sensitive Dye Imaging (VSDI) to measure neural activity in the primary visual cortex of monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. The authors show convincingly that the initial effect of the mask ran counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. They interpret these results in terms of influences from the receptive field center, and although an alternative view that emphasizes the role of the receptive field surround also seems reasonable, this study stands as an interesting and important contribution to our understanding of mechanisms of visual perception.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings.

      The authors have done a good job responding to the main points of my previous review. One important question remains, as stated in that review:

      "My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here."

      Their rebuttal of my first review is not convincing -- I still believe that surround influences are important and perhaps predominant in determining the outcome of the experiments. This is particularly clear for the "paradoxical" dynamics that they observe, which seem exactly to reflect the behavior of the surround.

      The authors' arguments to the contrary are based on three main points. First, their stimuli cover the center and surround, unlike those of many previous experiments, so they argue that this somehow diminishes the impact of the surround. But the argument is not accompanied by data showing the effects of center stimuli alone or surround stimuli alone. Second, their model -- a normalization model -- does not need surround influences to account for the masking effect. Third, they cite human psychophysical masking results from their collaborators (Sebastian et al 2017), but do not cite an equally convincing demonstration that surround contrast creates potent orientation selective masking when presented alone (Petrov et al 2005, https://doi.org/10.1523/JNEUROSCI.2871-05.2005).

      At the end of the day, these issues will be resolved by further experiments, not argumentation. The paper stands as an excellent contribution, but it might be wise for the authors to be less doctrinaire in their interpretations.

      We thank the reviewer for their positive comments and constructive criticism. In general, we agree with the reviewer’s comments. Importantly, we do not claim that there is no effect from the surround. What we say in the discussion is:

      “Because our targets are added to the background rather than occluding it, it is likely that a significant portion of the behavioral and neural masking effects that we observe come from target-mask interactions at the target location rather than from the effect of the mask in the surround.”

      We still stand by this assessment. We also make the point that, at least within the framework of our delayed normalization model, there is no need for the normalization mechanism to extend beyond the center mechanism to account for our results, and even if the normalization mechanism is somewhat larger than the center, the overlap region at the center would still make a large contribution to the modulations. Overall, we agree that these issues will need to be resolved by future experiments.

      For the reasons discussed in our previous reply, we disagree with the reviewer’s statement “…this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature”. For similar reasons, we disagree with the statement “It is nice to see that VSDI results square well with those from prior extracellular recordings”.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target-evoked response in the presence of the mask, defined as the difference between the response evoked by the mask with target (target present) and by the mask alone (target absent). These were computed across five 50 msec bins (250 msec in total, which was the duration of the mask alone (target-absent trials, 50% of trials) and of the mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      We thank the reviewer for their positive comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      None, except perhaps for a more balanced representation of the "surround" possibility in the Discussion. The Petrov et al paper (https://doi.org/10.1523/JNEUROSCI.2871-05.2005) should be considered and cited.

      As discussed above, we believe that our discussion of possible contribution from the surround is balanced. While the paper by Petrov et al is interesting, the stimuli used to study the surround effects are quite different (e.g., gap between center and surround, and the sharp edge of the surround inner boundary) so direct comparison with our results is not possible.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed the questions/suggestions I raised in my review.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for bringing up the relation of our results to findings from previous orientation-specific center-surround interaction studies. In our final manuscript, we added a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the orientation-dependent behavioral and neural masking effects that we observe are unlikely to depend on previously described center-surround interactions in V1. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model, the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target while mask contrast is increased (Fig. 8 – figure supplement 3). Third, in our model, the results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal (Fig. 8 – figure supplement 1). These considerations suggest that center-surround interactions may not be necessary for neural and behavioral similarity masking effects with additive targets.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we now cite in the final version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the final manuscript we include supplementary figures that show how the model’s predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size and contrast of the mask (Fig. 8 – figure supplements 1-4).

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We added these points to the discussion in the final manuscript.

      • The authors point out (lines 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level, i.e., does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We now incorporate this information in the final manuscript (Fig. 3 – figure supplement 1).

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the final manuscript we are more explicit when we discuss this point and refer to the relevant panels in Figs. 4, 5 and their figure supplements. To substantiate this key claim, we also report the timing of the transition between the two phases in all temporal correlation panels and report the neural-behavioral correlation for the integration period.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target-evoked response in the presence of the mask, defined as the difference between the response evoked by the mask with target (target present) and by the mask alone (target absent). These were computed across five 50 msec bins (250 msec in total, which was the duration of the mask alone (target-absent trials, 50% of trials) and of the mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J; Fig. 5 – figure supplement 1). We now explain this more clearly in the discussion of our final manuscript.
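      As a rough illustration of how a delayed, orientation-tuned gain control can produce a biphasic target-evoked response, the toy sketch below models a single unit tuned to the target orientation. It is not the authors' Fig. 8 model: the function name, the squared excitatory drive, the exponential rise of normalization, and every parameter value (contrasts c_t and c_m, semi-saturation sigma, time constant tau, tuning widths bw_e and bw_n) are hypothetical choices made only to show the qualitative effect.

```python
import math

def toy_target_response(t, d_deg, c_t=0.2, c_m=0.5, sigma=0.1,
                        tau=0.05, bw_e=20.0, bw_n=30.0):
    """Toy target-evoked response (target + mask minus mask alone).

    Hypothetical unit tuned to the target orientation, evaluated t seconds
    after stimulus onset, with the mask rotated d_deg degrees from the target.
    Response = (excitatory drive)^2 / (sigma^2 + (delayed normalization)^2).
    """
    g = math.exp(-d_deg ** 2 / (2 * bw_e ** 2))  # excitatory tuning to the mask
    h = math.exp(-d_deg ** 2 / (2 * bw_n ** 2))  # broader normalization tuning
    ramp = 1.0 - math.exp(-t / tau)              # normalization rises with delay

    def resp(exc, norm):
        return exc ** 2 / (sigma ** 2 + (ramp * norm) ** 2)

    with_target = resp(c_t + c_m * g, c_t + c_m * h)
    mask_only = resp(c_m * g, c_m * h)
    return with_target - mask_only

# Early (t = 0) versus late (t = 0.25 s) target-evoked response.
for d in (0, 45, 90):
    early = toy_target_response(0.0, d)
    late = toy_target_response(0.25, d)
    print(d, round(early, 2), round(late, 3))
```

      With these settings, the early response is largest when the mask matches the target (d = 0), because the normalization signal has not yet risen, while the late response is smallest for the matched mask, once orientation-tuned normalization has built up. The squared excitation is a deliberate choice: with linear excitation and no normalization at t = 0, the target-evoked difference would equal c_t / sigma regardless of mask orientation.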

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflected detection, in which case it might be more likely on hit trials than miss trials. Given the SNR it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about half of all trials) could be averaged separately to see if the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case, it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The original analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 – namely that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated – hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We now include the results of this analysis as a supplementary figure in the final manuscript (Fig. 4 – figure supplement 2). While there may be some interesting differences in the response dynamics between correct and incorrect trials, the current study was not designed to address this question, and the large number of conditions and small number of repeats that it necessitated make this data set suboptimal for examining these phenomena.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to express our sincere appreciation for the invaluable comments provided by the reviewers and their constructive suggestions to enhance the quality of our manuscript. In response to their feedback, we have diligently revised and resubmitted our paper as an article, introducing five primary figures, seven supplementary figures, and two supplementary data files. Importantly, this work represents a noteworthy contribution to the field, presenting novel findings for the first time without any prior publication.

      Within the enclosed document, we have provided a comprehensive response to the reviewer comments, addressing each point in a meticulous and specific manner. We extend our sincere gratitude to the reviewers for their diligent examination of our manuscript and for offering insightful recommendations.

      In our latest revision, we have taken great care to respond to every reviewer's comment, ensuring that we clarify the manuscript and provide robust evidence where required. The primary focus of these revisions was to provide additional context regarding the cooperative role between PR-Set-7 and PARP-1 in the repression of metabolic genes, accompanied by a thorough description of the current state of the field. Substantial modifications and new analyses, presented in the supplemental figures, have been included to comprehensively address this concern.

      Another concern raised was regarding the interaction between PARP-1 and mono-methylated active histone marks, which was not adequately described in the previous version of our manuscript. In this revised version, we have updated our Fig. 1 and Supplemental Fig. S1 and introduced Supplemental Fig. S2 to properly demonstrate that PARP-1 binds to all mono-methylated active histone marks tested. Furthermore, we extensively revised the Discussion section of our manuscript to discuss the implications of this discovery and how it fits into the broader context of PARP-1 research.

      Addressing another reviewer's concern about the potential indirect regulation of transcription by PARP1 and PR-SET7, we revised the discussion section and incorporated findings from our recent study. These findings clearly demonstrate PARP1's binding to the loci of misregulated genes, suggesting a direct involvement in their regulation.

      Furthermore, we have improved the description of the reagents and Drosophila lines used in this study to provide a more comprehensive understanding for readers. Finally, we conducted a comprehensive revision of the entire manuscript to rectify the identified typos and grammatical errors.

      Enclosed, you will find a detailed, point-by-point response to each of the reviewers' comments, showcasing our commitment to addressing their concerns with precision.

      We firmly believe that our revisions successfully resolve all the concerns raised by the reviewers, and we are confident that this improved version of our manuscript contributes significantly to the scientific discourse.

      Reviewer #1:

      The study investigates the role of PARP-1 in transcriptional regulation. Biochemical and ChIP-seq analyses demonstrate specific binding of PARP-1 to active histone marks, particularly H4K20me, in polytene chromosomes of Drosophila third instar larvae. Under heat stress conditions, PARP-1's dynamic repositioning from the Hsp70 promoter to its gene body is observed, facilitating gene activation. PARP-1, in conjunction with PR-Set7, plays a crucial role in the activation of Hsp70 and a subset of heat shock genes, coinciding with an increase in H4K20me1 levels at these gene loci. This study proposes that H4K20me1 is a key facilitator of PARP-1 binding and gene regulation. However, there are several critical concerns that are yet to be addressed. The experimental validation and demonstration of results in the main manuscript are scant. Recent developments in the area are omitted, as an important publication hasn't been discussed anywhere in the work (PMID: 36434141). The proposed mechanism operates quite selectively, and any extrapolations require intensive scientific evidence.

      Major Comments:

      (1) PARP1 hypomorphic mutant validation data must be provided at the RNA level, as the authors have mentioned its global reduction in RNA levels.

      We sincerely appreciate Reviewer 1 for their meticulous review of our manuscript and for providing valuable insights. In response to the raised concern, we would like to highlight that the validation data for the PARP1 hypomorphic mutant at the RNA level have been previously documented in our study (PMID: 20371698), where we found that the PARP1 RNA level was markedly reduced in parp1C03256. To enhance clarity, we have made corresponding modifications to the Materials and Methods section to explicitly articulate this aspect: parp-1C03256 not only significantly lowers PARP-1 RNA and protein levels (14) but also significantly diminishes the level of pADPr (11).

      We hope these revisions effectively address the reviewer's suggestion and contribute to a more comprehensive understanding of our findings.

      (2) The authors should provide immunoblot data for global Poly (ADP) ribosylation levels in PARP1 hypomorphic mutant condition as compared to the control. They must also provide the complete details of the mouse anti-pADPr antibody used in their immunoblot in Figure 5B.

      We extend our gratitude to Reviewer 1 for drawing attention to aspects requiring further clarification. In response to the inquiry about global poly(ADP)-ribosylation levels in the PARP1 hypomorphic mutant condition, we want to emphasize that our study extensively reported on the diminished levels of pADPr in comparison to the wild type, as documented in our previous work (PMID: 21444826). To address this, we have incorporated pertinent details in the Materials and Methods section, providing a comprehensive account of our findings: parp-1C03256 not only significantly lowers PARP-1 RNA and protein levels (14) but also significantly diminishes the level of pADPr (11).

      Furthermore, in addressing the request for complete details of the mouse anti-pADPr antibody (10H) used in Figure 5B, we have taken steps to enhance transparency. The Materials and Methods section has been revised to incorporate more comprehensive information about the antibody, ensuring a clearer understanding of our experimental procedures. anti-pADPr (Mouse monoclonal, 1:500, 10H - sc-56198, Santa Cruz).

      We appreciate the reviewer's diligence in ensuring the robustness of our methodology, and we believe these modifications strengthen the overall quality and transparency of our study.

      (3) PR-Set7 mutant validation results should be provided in the main manuscript, as done by the authors using qRT-PCR. Also, immunoblot data for the PR-set7 null condition should be supplemented in the main manuscript as the authors have already mentioned their anti-PR-Set7 (Rabbit, 1:1000, Novus Biologicals, 44710002) antibody in the materials and methods section.

      We appreciate Reviewer 1's thorough examination of our manuscript and their constructive feedback. The pr-set7 null mutant has been rigorously characterized in a study conducted by Dr. Ruth Steward's laboratory (PMID: 15681608). Additionally, we employed our PR-SET7 antibody to validate the mutant, and the corresponding data can be found in Supplemental Figure 3. To enhance clarity, we have made necessary modifications to both the Results and Materials and Methods sections, providing explicit details on the validation process. Results section: To validate our hypothesis, we initially confirmed that the pr-set720 mutant not only eliminated PR-SET7 RNA and protein but also abrogated H4K20me1 modification (Supplemental Fig.S3).

      Materials and Methods section: The pr-set720 null mutant was validated in (15), and we confirmed that this mutant not only abolishes PR-SET7 RNA and protein levels but also leads to the absence of H4K20me1 (Supplemental Fig. S3).

      We believe these revisions address the reviewer's concerns and contribute to a more comprehensive presentation of our study.

      (4) The authors have probably missed out on a very important recent report (PMID: 36434141), suggesting the antagonistic nature of the PARP1 and PR-SET7 association. In light of these important observations, the authors must check for the levels of PR-SET7 in PARP1 hypomorphic conditions.

      We appreciate the insightful comment from Reviewer 1, drawing our attention to the recent study by Estève et al. (PMID: 36434141) highlighting the potential antagonistic relationship between PARP1 and PR-SET7. To address this important point, we have carefully examined the levels of PR-SET7 in PARP1 hypomorphic conditions.

      In response to this concern, we have added two new supplemental figures, Supplemental Fig. S4 and S5, which specifically address the impact of PARP1 deficiency on PR-Set7 expression. These figures clearly demonstrate that there were no significant changes observed in PR-SET7 RNA (Fig. S4) or protein levels (Fig. S5) in the absence of Parp1. This finding supports the conclusion that Parp1 is not directly involved in the regulation of PR-SET7 in Drosophila.

      Furthermore, we have updated the Results section to explicitly mention this observation:

      Interestingly, in the absence of PARP-1, neither PR-SET7 RNA nor protein levels were affected (Supplemental Fig. S4-5), indicating that PARP-1 is not directly implicated in the regulation of PR-SET7.

      Additionally, we have included information about the anti-H3 antibody used in Supplemental Fig. S4 in the Materials and Methods section: anti-H3 (Rabbit polyclonal, 1/1000, FL-136 sc-10809 Santa Cruz).

      We believe that these modifications effectively address the raised concern and provide a more comprehensive understanding of the relationship between PARP1 and PR-SET7 in our study. We hope these clarifications enhance the overall robustness and clarity of our findings.

      (5) Also, the results of the aforementioned study should be adequately discussed in the present study along with its implications in the same.

      We appreciate Reviewer 1's valuable suggestion to discuss the implications of the study by Estève et al. (PMID: 36434141) within the context of our own findings. Estève et al. reported a potential antagonistic relationship between PARP1 and PR-SET7, showing that a decrease in PARP1 protein leads to an increase in PR-SET7 protein levels. In our investigation, however, we did not observe significant changes in PR-SET7 RNA and protein levels in the parp1C03256 mutant, as demonstrated in the newly added Supplemental Fig. S4 and S5.

      We acknowledge the discrepancy between our results and those of Estève et al., and we propose that this difference may be due to distinct experimental approaches: Estève et al.'s study focused on mammalian cell populations and in vitro experiments, whereas our investigation employed Drosophila third-instar larvae as a whole-organism model. It is plausible that the regulatory mechanisms governing PR-SET7 differ between mammals and Drosophila. Another possibility is that PARP-1 may cooperate with PR-SET7 in the context of Drosophila development but could exhibit antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions.

      In the Discussion section, we have incorporated this information, stating: A recent study demonstrated that in human cells overexpressing PARP-1, PR-SET7/SET8 is degraded (33). This implies that the absence of PARP-1 might lead to increased levels of PR-SET7. However, in our study of the parp-1 mutant in Drosophila third-instar larvae, we observed a different scenario: we detected a minor, non-significant reduction in both PR-SET7 RNA and protein levels (Supplemental Fig.S4 and S5). This outcome contrasts with the previous study's findings. The discrepancy could be due to the distinct experimental approaches used: the previous research focused on mammalian cells and in vitro experiments, whereas our study examined the functions of PARP-1 in whole Drosophila third-instar larvae during development. Consequently, while PARP-1 may cooperate with PR-SET7 in the context of Drosophila development, it could exhibit antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions.

      We believe these modifications provide a comprehensive discussion of the observed discrepancies and enhance the overall interpretation of our findings. We hope that these clarifications satisfactorily address the concerns raised by Reviewer 1.

      (6) Gene transcriptional activation requires open chromatin and RNA polymerase II binding to the promoter. Since differentially expressed genes in both PR-Set7 null and PARP1 hypomorph mutants that are co-enriched with PARP-1 and H4K20me1 were mainly upregulated, the authors should provide RNA polymerase II occupancy data for these genes via RNA-Pol II ChIP-seq to further attest their claims.

      We appreciate the insightful comment from Reviewer 1 regarding the necessity for RNA-polymerase II (PolII) occupancy data to further support our claims on gene transcriptional activation. To address this concern, we conducted an analysis of PolII occupancy around genes co-enriched with PARP-1 and H4K20me1 that are upregulated in both pr-set720 and parp-1C03256 mutants during the third instar larval stage. The results of this analysis have been included in the newly added Supplemental Fig. S6.

      Our findings reveal that these upregulated genes exhibit higher PolII occupancy than other genes, both at their promoter regions and gene bodies, suggesting heightened activity during the third instar larval stage in wild-type animals (Supplemental Fig. S6). To further validate these results, we cross-referenced publicly available RNA-seq data at the same developmental stage, confirming that, on average, these upregulated genes display 40% higher expression than other genes (Supplemental Fig. S6B).

      Moreover, we would like to highlight the consistency of our current findings with our previous study (PMID: 38012002), where we reported the critical involvement of PARP-1 in tempering the expression of active metabolic genes at the end of the third instar larval stage. The current data, suggesting a role for PR-SET7 in this regulatory process, add another layer to our understanding of the nuanced control exerted by PARP-1 on the expression of active metabolic genes during this critical developmental transition.

      In light of these results, we have modified the Results section to emphasize these findings: Intriguingly, under wild-type conditions, these genes displayed expression levels approximately 40% higher than the average and demonstrated increased RNA-Polymerase II occupancy both at their promoter regions and gene bodies compared to other genes (Supplemental Fig.S6), indicating their high activity in a wild-type context.

      Additionally, we have incorporated this information into the Discussion section to underscore the cooperative role of PARP-1 and PR-SET7 in repressing the expression of active metabolic genes: Notably, genes that are co-enriched with PARP-1 and H4K20me1 and upregulated in both parp-1C03256 and pr-set720 mutants are predominantly metabolic genes exhibiting high expression levels under wild-type conditions and high RNA polymerase II occupancy at both their promoter regions and gene bodies (Supplemental Fig. S6). In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during the development of Drosophila by binding directly to their loci (34). PARP-1 is also required for maintaining optimum glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for the larval to pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34).

      Our data indicate that in both parp-1 and pr-set7 mutant animals, metabolic genes at sites where PARP-1 and H4K20me1 are co-bound were preferentially derepressed (Fig.3E), while these metabolic genes are highly active during the third-instar larval stage (Supplemental Fig.S6). Thus, we propose that the presence of H4K20me1 may be essential for the binding of PARP-1 at these gene bodies, contributing to their repression. Importantly, this mechanism of gene repression has broader developmental implications. As stated earlier, mutant animals lacking functional PARP-1 and PR-SET7 undergo developmental arrest during the larval to pupal transition. This arrest could be directly linked to the disruption of normal metabolic gene repression during development. Without the repressive action of PARP-1 and PR-SET7, key metabolic processes might remain unchecked, leading to metabolic imbalances that are incompatible with normal progression to the pupal stage.

      Finally, we have updated the Materials and Methods section to include information about the RNA-seq and PolII ChIP-seq datasets used: GSE15292 (RNA-polymerase II). In addition, we used the Developmental time-course RNA-seq dataset (54), SRP001065.

      We believe that these modifications comprehensively address Reviewer 1's concern and provide a more robust foundation for our claims regarding the role of PARP-1 and PR-SET7 in the transcriptional regulation of co-enriched genes during the critical developmental transition.

      (7) As discussed in Figure 4, the authors found transcriptional activation of group B genes even after a significant reduction of H4K20me1 in their gene bodies after heat shock. Given the dynamic equilibrium shift in epigenetic marks that regulate gene expression and their locus-specific transcriptional regulation, the authors should further look for the enrichment of other epigenetic marks and even H4K20me1-specific demethylases such as PHF8 (PMID: 20622854), and their cross-talk with PARP1, to further bridge the missing links of this tale. This will add more depth to this work.

      We appreciate the thoughtful input provided by Reviewer 1 and acknowledge the importance of exploring additional epigenetic marks and their potential cross-talk with PARP1 to enhance the depth of our study. Our investigation has primarily focused on the interplay between PR-SET7/H4K20me1 and PARP-1, as evidenced by the colocalization and robust binding affinity observed between PARP-1 and H4K20me1 (Fig 1C, 2B, and 3A). This interaction is particularly noteworthy in the context of regulating specific heat shock genes, as highlighted in Figure 4A. While we recognize the potential significance of examining a broader spectrum of epigenetic marks and considering the involvement of specific demethylases, such as PHF8 (PMID: 20622854), in this regulatory network, our research strategy is intentionally tailored to leverage the unique characteristics of the PR-SET7/H4K20me1 and PARP-1 interplay in Drosophila. A key consideration is the technical advantage afforded by the fact that PR-SET7 is the exclusive methylase responsible for H4K20 monomethylation in Drosophila (PMID: 15681608), allowing for specific depletion of H4K20me1 without the confounding influence of other methyltransferases.

      This specificity is pivotal, especially given the similar developmental arrest patterns observed in both PR-SET7 and PARP-1 mutants. Such parallel phenotypes provide a distinct opportunity to delve deeply into the intricacies of their interaction during organismal development and in response to heat stress. Additionally, the identity of the demethylase for H4K20me1 in Drosophila remains unknown, further underscoring the rationale for our focused approach.

      While we acknowledge the broader implications of exploring additional epigenetic marks, we believe that our deliberate focus on the PR-SET7/H4K20me1 and PARP-1 pathway provides a unique and valuable perspective on the regulation of gene expression in Drosophila. We hope that this clarification addresses the concerns raised by Reviewer 1 and conveys the rationale behind our chosen research strategy.

      Reviewer #2:

      Summary:

      This study from Bamgbose et al. identifies a new and important interaction between H4K20me and Parp1 that regulates inducible genes during development and heat stress. The authors present convincing experiments that form a mostly complete manuscript that significantly contributes to our understanding of how Parp1 associates with target genes to regulate their expression.

      Strengths:

      The authors present three compelling experiments to support the interaction between Parp1 and H4K20me, including:

      (1) PR-Set7 mutants remove all H4K20me and phenocopy Parp mutant developmental arrest and defective heat shock protein induction.

      (2) PR-Set7 mutants have dramatically reduced Parp1 association with chromatin and reduced poly-ADP ribosylation.

      (3) Parp1 directly binds H4K20me in vitro.

      Weaknesses:

      (1) The histone array experiment in Fig1 strongly suggests that PARP binds to all mono-methylated histone residues (including H3K27, which is not discussed). Phosphorylation of nearby residues sometimes blocks this binding (S10 and T11 modifications block binding to K9me1, and S28P blocks binding to K27me1). However, H3S3P did not block H3K4me1, which may be worth highlighting. The H3K9me2/3 "blocking effect" is not nearly as strong as some of these other modifications, yet the authors chose to focus on it. Rather than focusing on subtle effects and the possibility that PARP "reads" a "histone code," the authors should consider focusing on the simple but dramatic observation that PARP binds pretty much all mono-methylated histone residues. This result is interesting because nucleosome mono-methylation is normally found on nucleosomes with high turnover rates (Chory et al. Mol Cell 2019)- which mostly occurs at promoters and highly transcribed genes. The author's binding experiments could help to partially explain this correlation because PARP could both bind mono-methylated nucleosomes and then further promote their turnover and lower methylation state.

      We appreciate the comprehensive review and valuable insights provided. In response to the comments, we have made substantial revisions to address the concerns and enhance the clarity of our findings. In Figure 1B, C, D, F, and G, we have expanded our data presentation to demonstrate PARP-1's binding affinity for H3K27me1. This addition is now incorporated into the revised results section. Additionally, we have updated Supplemental Fig.S1 and introduced new supplemental data (Supplemental Fig.S2) to illustrate the inhibition of PARP-1 binding by H3S10P, H3S28P, and H3T11P. The comprehensive exploration of PARP-1's interaction with mono-methylated histones, as suggested by the reviewer, is now more robustly documented in our revised figures and supplementary materials.

      Our Discussion section has been refined to articulate more clearly how PARP-1 may be selectively recruited to active chromatin domains through its interaction with mono-methylated histone marks. We have proposed a model where PARP-1 actively participates in the turnover process, contributing to the maintenance of an active chromatin environment. This proposed mechanism involves PARP-1 selectively binding to mono-methylated active histone marks associated with highly transcribed genes. Upon activation, PARP-1 undergoes automodification, leading to its release from chromatin and facilitating the reassembly of nucleosomes carrying the mono-methylated marks. The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) subsequently cleaves pADPr, allowing for the restoration of PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research across various model organisms and aligns with the known association of PARP-1 with highly expressed genes, as well as its role in mediating nucleosome dynamics and assembly.

      Our Discussion section has been modified as follows: Finally, highly transcribed genes have been reported to present a high turnover of mono-methylated modifications, maintaining a state of low methylation (50). Our findings suggest that PARP-1 might actively participate in this turnover process to uphold an active chromatin environment. The proposed mechanism unfolds as follows: 1) PARP-1 selectively binds to mono-methylated active histone marks associated with highly transcribed genes. 2) Upon activation, PARP-1 undergoes automodification and is subsequently released from chromatin, facilitating the reassembly of nucleosomes carrying the mono-methylated marks. 3) The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, allowing for the restoration of PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis aligns cohesively with existing research conducted across various model organisms, including mice, Drosophila, and humans (7, 23, 29, 51-53). Notably, previous studies have consistently demonstrated that PARP-1 predominantly associates with highly expressed genes and plays a crucial role in mediating nucleosome dynamics and assembly. Thus, our proposed model provides a molecular framework that may contribute to understanding the relationship between PARP-1 and the epigenetic regulation of gene expression. Further experimental validation is warranted to elucidate the precise details of this proposed mechanism and its implications in the broader context of chromatin dynamics and transcriptional control.

      We hope that these revisions address the reviewer's concerns and contribute to the overall strength and clarity of our manuscript.

      (2) The RNAseq analysis of Parp1/PR-Set7 mutants is reasonable, but there is a caveat to the author's conclusion (Line 251): "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes." An alternative possibility is that many of the gene expression changes are indirect consequences of altered development induced by Parp1 or PR-Set7 mutants. For example, Parp1 could activate a transcription factor that represses the metabolic genes that they mention. The authors should consider discussing this possibility.

      We extend our gratitude to Reviewer 2 for their thoughtful consideration of our manuscript and the insightful suggestion. In response to the raised concern regarding the conclusion on Line 251, where we proposed that "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes," we acknowledge the alternative possibility suggested by the reviewer. It is plausible that many of the observed gene expression changes are indirect consequences of altered development in parp-1 or pr-set7 mutants. For example, PARP-1 could activate a transcription factor that represses the mentioned metabolic genes.

      To address this concern, we have revisited our data and incorporated relevant findings from one of our recent studies that utilized a ChIP-seq approach. The results from this study suggest direct binding of PARP-1 to the loci of metabolic genes, supporting the notion that PARP-1 may indeed directly regulate their expression (PMID: 37347109). We have updated the Discussion section to reflect this information, aiming to provide a more comprehensive perspective on the potential mechanisms underlying the observed gene expression changes: In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during the development of Drosophila by binding directly to their loci (34). PARP-1 is also required for maintaining optimum glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for the larval to pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34).

      We believe these modifications contribute to a more informed interpretation of our findings.

      (3) The section on the inducibility of heat shock genes is interesting but missing an important control that might significantly alter the authors' conclusions. Hsp23 and Hsp83 (group B genes) are transcribed without heat shock, which likely explains why they have H4K20me without heat shock. The authors made the reasonable hypothesis that this H4K20me would recruit Parp-1 upon heat shock (line 270). However, they observed a decrease of H4K20me upon heat shock, which led them to conclude that "H4K20me may not be necessary for Parp1 binding/activation" (line 275). However, their RNA expression data (Fig4A) argue that both Parp1 and H4K20me are important for activation. An alternative possibility is that group B genes indeed recruit Parp1 (through H4K20me) upon heat shock, but then Parp1 promotes H3/H4 dissociation from group B genes. If Parp1 depletes H4, it will also deplete H4K20me1. To address this possibility, the authors should also do a ChIP for total H4 and plot both the raw signal of H4K20me1 and total H4 as well as the ratio of these signals. The authors could also note that Group A genes may similarly recruit Parp1 and deplete H3/H4 but with different kinetics than Group B genes because their basal state lacks H4K20me/Parp1. To test this possibility, the authors could measure Parp association, H4K20 methylation, and H4 depletion at more time points after heat shock at both classes of genes.

      We thank Reviewer 2 for their valuable comment on our manuscript. We acknowledge the reviewer's hypothesis that PARP-1 may induce H3/H4 dissociation from group B genes, potentially leading to a reduction in H4K20me1. However, our findings support a different interpretation.

      Our data indicate that while H4K20me1 is present under normal conditions at group B genes, its reduction following heat shock does not appear to hinder PARP-1's role in transcriptional activation (Fig 4A, C and E). We propose that the observed decrease in H4K20me1 might reflect a regulatory shift in chromatin structure that is conducive to transcriptional activation during heat shock, facilitated by PARP-1 independently of sustained H4K20me1 levels at group B genes. Additionally, the literature suggests a dual role for H4K20me1 in gene regulation, from facilitating transcriptional elongation in certain contexts to acting as a repressor in others.

      Unlike group A genes, which had low enrichment of H4K20me1 before heat shock (Fig 4B and D), group B genes showed high enrichment of H4K20me1 (Fig 4C and E), which could imply a repressive role for this mark prior to heat stress. Thus, in the context of group B genes, it is conceivable that the removal of H4K20me1 might be necessary for their activation during heat stress. In addition, PR-SET7 may possess functions beyond its role as a histone methylase that are crucial for activating group B genes under heat stress conditions. These functions could include methylation of non-histone substrates and non-catalytic activities.

      Furthermore, our analysis of gene expression in pr-set720 and parp-1C03256 mutants indicates that while PARP-1 and H4K20me1 interaction may have overlapping roles in gene regulation, they also possess distinct functions in the modulation of gene expression (Fig 3E). Thus, we propose that the relationship between PR-SET7 and PARP-1 in transcriptional regulation involves a complex regulatory mechanism that extends beyond the presence of H4K20me1.

      We modified the Discussion section to address this point: Another plausible explanation could be that the recruitment of PARP-1 to group B gene loci promotes H4 dissociation and thereby leads to a reduction of H4K20me1. However, our findings suggest an alternative interpretation: the decrease in H4K20me1 at group B genes during heat shock does not seem to impede PARP-1's role in transcriptional activation (Fig.4A, C and E). Rather than disrupting PARP-1 function, we propose that this reduction in H4K20me1 may signify a regulatory shift in chromatin structure, priming these genes for transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others (13, 25, 38, 39, 41-45). The elevated enrichment of H4K20me1 in group B genes under normal conditions may indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress (46, 47). Furthermore, our exploration of pr-set720 and parp-1C03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism. Understanding the intricate relationship between these molecular players is crucial for elucidating the complexities of gene expression modulation under heat stress conditions.

      We hope that this modification adequately addresses Reviewer 2's concerns and enhances the clarity of our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      (1) Please check the entire manuscript for grammatical errors and typos. PR-set7 has been wrongly written as PR-ste7 in quite a few places in the manuscript. Poly (ADP)-ribosylation has been written as poly(ADP-ribosyl)ation in the last result heading. There are more such errors. Please rectify them.

      We express our sincere appreciation to Reviewer 1 for their meticulous review of our manuscript, and we acknowledge the importance of ensuring grammatical accuracy and clarity. We have taken your feedback seriously and conducted a comprehensive revision of the entire manuscript to rectify the identified typos and grammatical errors. We hope that these revisions contribute to an improved overall presentation of our research, and we appreciate the reviewer's diligence in ensuring the accuracy of the manuscript.

      (2) The authors can also look up publicly available mammalian ChIP-seq data for H4K20me1 and PARP1, in order to further ossify their findings and increase the breadth of their work.

      We appreciate the suggestion from Reviewer 1 and have taken steps to further validate and broaden the scope of our findings. Specifically, we compared the distribution of PARP1 and H4K20me1 in human K562 cells. This analysis revealed a correlation in their distribution, supporting the idea that the observed correlation between PARP-1 and H4K20me1 is not limited to fruit flies. We have incorporated these findings into the Results section and added a new Supplemental Fig. S6 to highlight this correlation: Finally, to extend the generalizability of our observations beyond Drosophila, we compared the distribution of PARP1 and H4K20me1 in human K562 cells. Strikingly, we observed a correlation in their distribution, suggesting that the interplay between PARP-1 and H4K20me1 is not limited to fruit flies (Supplemental Fig. S6).

      We believe that this modification addresses Reviewer 1's suggestion by providing additional evidence that supports the broader relevance of our findings beyond the Drosophila model system.

      (3) Please discuss in greater detail how the PARP1-H4K20me1 axis orchestrates the repression program (metabolic pathways in this case) with proper references.

      We appreciate Reviewer 1's continued engagement with our manuscript and have revised the Discussion section to explain in more detail how the PARP1-H4K20me1 axis orchestrates the repression program, particularly for metabolic pathways. The modified Discussion section now reads: In our previous study, we discovered that PARP-1 plays a crucial role in repressing highly active metabolic genes during Drosophila development by binding directly to their loci (34). PARP-1 is also required for maintaining optimal glucose and ATP levels at the third-instar larval stage (34). During Drosophila development, repression of metabolic genes is crucial for the larval-to-pupal transition (35, 36). This repression is linked to the reduced energy requirements as the organism prepares for its sedentary pupal stage (35, 37). Notably, we observed that PARP-1 shows a high affinity for binding to the gene bodies of these metabolic genes (34). Our data indicate that in both parp-1 and pr-set7 mutant animals, metabolic genes were preferentially repressed at sites where PARP-1 and H4K20me1 are co-bound (Fig. 3E), whereas these metabolic genes are highly active during the third-instar larval stage (Supplemental Fig. S6). Thus, we propose that the presence of H4K20me1 may be essential for the binding of PARP-1 at these gene bodies, contributing to their repression. Importantly, this mechanism of gene repression has broader developmental implications. As stated earlier, mutant animals lacking functional PARP-1 and PR-SET7 undergo developmental arrest during the larval-to-pupal transition. This arrest could be directly linked to disruption of the normal repression of metabolic genes during development. Without the repressive action of PARP-1 and PR-SET7, key metabolic processes might remain unchecked, leading to metabolic imbalances that are incompatible with normal progression to the pupal stage.

      We hope these modifications provide a more comprehensive discussion on how the PARP1-H4K20me1 axis influences the repression program, particularly within metabolic pathways, and how this mechanism contributes to the broader context of Drosophila development.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents a useful inventory of immune signatures that are correlated with cancer treatment-related pneumonitis. The data were collected and analysed using solid and validated methodology and can be used as a starting point for further functional studies.

      We sincerely thank the editor for their encouraging comments regarding our study. As rightly pointed out, this study indeed serves as a pivotal starting point for subsequent functional studies.

      Reviewer #2 (Recommendations For The Authors):

      I greatly appreciate the authors' diligence in addressing all the suggested points. The paper now presents significantly stronger evidence to support the findings.

      I do have one final question: Could you clarify how the correlation presented in Supplementary Figure 3 was calculated? Is it a Pearson correlation of CTCAE grade directly to marker expression? Additionally, could you explain how the significance was determined? The authors mention a significant correlation for CCR7, but the heatmap displays similarly high values for CD7 and CD57. Finally, I'm curious about the absence of CD16 in the heatmap.

      Thank you for your insightful query. The correlation shown in Supplementary Figure 3 was indeed calculated as a Pearson correlation coefficient, correlating the CTCAE grade directly with the mean expression level of each marker. The computations were performed in GraphPad Prism version 9, and we defined P < 0.05 as significant. The P-values for CCR7, CD7, and CD57 were 0.009, 0.035, and 0.039, respectively. Thus, as the reviewer correctly observed, CD7 and CD57 also met this significance threshold, albeit with larger P-values than CCR7. We have now mentioned CD7 and CD57 alongside CCR7 in the Discussion, but only briefly, to keep the focus on CD16.

      CD16 was initially omitted from Supplementary Figure 3 to prevent redundancy and preserve data clarity. Nonetheless, in light of your query, we have included CD16 in the correlation matrix to provide a comprehensive view of its association with other markers.

      We hope this adequately addresses your question and further clarifies our findings.

      Reviewer #3 (Recommendations For The Authors):

      General suggestions for presentation in the future:

      It is essential to concretely define the numbers presented in all figures and plots. For example, in Figure 6 (I), what does it mean by "percentage representation of FCGR3A (CD16)"? Percentage of what? How did you calculate that? It is also important to show more statistics in general, for example, in dot plots like Figure 6 (H), where are the means and p-values? Little things like that completely change the impact of the figures. For the narrative of this paper, it is OK, but in the future, fine-tuning the presentation would massively improve the impact of the work which the contents deserve.

      Thank you for your insightful feedback. To address your concerns, I have revised Figures 6H and 6I to present our data more precisely and informatively. In Figure 6H, the violin plots show the expression intensity of FCGR3A (CD16) on CD4+ and CD8+ T cells. Each dot represents an individual cell in the BALF of healthy controls (HC) or COVID-19 patients. These data were derived from the single-cell RNA-seq dataset GSE145926. To improve clarity and statistical rigor, I have now included p-values directly in Figure 6H. In addition, the means ± standard deviation (SD) are now reported in the main text of the manuscript.

      Figure 6I depicts the proportion of FCGR3A (CD16)-positive cells within the CD4+ and CD8+ T cell populations in BALF from HC and COVID-19 patients, with the threshold for FCGR3A expression set at 0.5. On further review, prompted by your feedback, I found an error in how the proportion of FCGR3A-positive cells among CD4+ and CD8+ T cells had been calculated: the proportion of FCGR3A-positive CD4+ T cells had been computed relative to the entire CD4+ T cell population, without separating the HC and COVID-19 groups. This has now been corrected, and the adjusted values are shown in Figure 6I.
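      The corrected calculation for Figure 6I can be sketched as follows, assuming each cell is available as a (group, cell type, FCGR3A expression) record and that "positive" means expression strictly above the 0.5 threshold (whether the cut-off is inclusive is not stated). The function name and toy records are hypothetical; the point is that the denominator is the CD4+ or CD8+ population within each group, not the pooled population.

```python
from collections import defaultdict


def positive_fraction_by_group(cells, threshold=0.5):
    """Fraction of cells with FCGR3A expression above `threshold`, computed
    separately for each (group, cell type) pair so that each denominator is
    that group's own T-cell population."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, cell_type, fcgr3a in cells:
        key = (group, cell_type)
        totals[key] += 1
        if fcgr3a > threshold:  # assumption: strict inequality at the cut-off
            positives[key] += 1
    return {key: positives[key] / totals[key] for key in totals}


# Toy records for illustration only: (group, cell type, FCGR3A expression).
cells = [
    ("HC", "CD4+", 0.2), ("HC", "CD4+", 0.7),
    ("COVID-19", "CD4+", 0.9), ("COVID-19", "CD4+", 0.6), ("COVID-19", "CD4+", 0.1),
    ("COVID-19", "CD8+", 0.8),
]
fractions = positive_fraction_by_group(cells)
print(fractions)
```

      With these toy records, dividing the single FCGR3A-positive HC CD4+ cell by all five CD4+ cells pooled across groups, as in the original calculation, would give 0.2 instead of 0.5.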

      I am grateful for the opportunity to refine these figures, as your suggestions have not only helped to correct the error but have also significantly enhanced the impact and clarity of our work. Your guidance has been instrumental in improving the overall quality and presentation of our research, ensuring that the findings are communicated effectively and accurately.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on diabetogenic risk from colorectal cancer (CRC) treatment. The authors claim that postoperative screening for type 2 diabetes should be prioritized in CRC survivors with overweight/obesity, irrespective of the oncological treatment received. The evidence supporting the claims is solid but requires confirmation in different populations. These results have theoretical or practical implications and will be of interest to endocrinologists, oncologists, general practitioners, gastrointestinal surgeons, and policymakers working on CRC and diabetes.

      Author response: We thank you for taking the time to provide constructive feedback on our manuscript and for the useful suggestions. We have provided a point-by-point response to each of the reviewers’ comments with clearly marked changes to the manuscript.

      Public reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors set out to determine whether colorectal cancer surgery site (right, left, rectal) and chemotherapy impact the subsequent risk of developing T2DM in the Danish national health register.

      Strengths:

      • The research question is conceptually interesting

      • The Danish national health register is a comprehensive health database

      • The data analysis was thorough and appropriate

      • The findings are interesting, and a little surprising that there was no impact of chemotherapy on the development of T2DM

      Weaknesses:

      This is not a weakness as such, but in the discussion, I would consider adding some brief comment on the international generalizability of the findings - e.g. demographic makeup of the Danish population health register and background rates of DM and obesity in this population with CRC compared to countries on other continents.

      Author response: We agree that this information would be valuable. It has now been added in the Discussion section.

      Changes in manuscript: "In Denmark, the overall T2D prevalence is 6.9%,²⁵ which is lower than the global average in 2021 (10.5%) and below the estimate for high-income countries (11.1%).²⁶ Similarly, the obesity rate of 20% is in line with other Scandinavian countries and below that of most high-income nations.²⁷" (Page 8, line 256-258)

      A little more information would be helpful regarding how T2DM was diagnosed in the registry.

      Author response: We have now added a more thorough explanation of how T2D was diagnosed in the Methods section.

      Changes in manuscript: "Diabetes is defined as the second occurrence of any event across three types of inclusion events: 1) diabetes diagnosed during hospitalisation, 2) diabetes-specific services received from a podiatrist, and 3) purchases of glucose-lowering drugs. Thus, if a patient developed transient T2D during chemotherapy treatment, this would count as an inclusion event only if they purchased glucose-lowering drugs. Individuals were classified as having T1D if they had received prescriptions for insulin combined with a diagnosis of type 1 diabetes from a medical hospital department. Otherwise, diabetes was classified as type 2.²²" (Page 5, line 154-160)
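      The case definition above can be expressed as a small rule, sketched here under simplifying assumptions: the event labels, function names and example dates are hypothetical, and the actual register algorithm may apply further criteria not described in this response.

```python
from datetime import date

# Hypothetical labels for the three inclusion-event types described in the Methods.
INCLUSION_EVENTS = {"hospital_diagnosis", "podiatry_service", "glucose_lowering_purchase"}


def diabetes_onset(events):
    """Date of the second inclusion event of any type, or None if there are fewer than two."""
    dates = sorted(day for day, kind in events if kind in INCLUSION_EVENTS)
    return dates[1] if len(dates) >= 2 else None


def classify_diabetes(events, insulin_prescribed, t1d_hospital_diagnosis):
    """Return (onset_date, 'T1D' or 'T2D'), or (None, None) if the definition is not met."""
    onset = diabetes_onset(events)
    if onset is None:
        return None, None
    kind = "T1D" if insulin_prescribed and t1d_hospital_diagnosis else "T2D"
    return onset, kind


# A single glucose-lowering purchase (e.g. during chemotherapy) does not meet the
# definition on its own; a second purchase does.
one_purchase = [(date(2015, 3, 1), "glucose_lowering_purchase")]
two_purchases = one_purchase + [(date(2015, 6, 1), "glucose_lowering_purchase")]
print(classify_diabetes(one_purchase, False, False))
print(classify_diabetes(two_purchases, False, False))
```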

      If someone did develop transient hyperglycemia requiring DM medications during chemotherapy, would the investigators have been able to identify these people?

      Author response: Yes, we have added a sentence in the Methods section.

      Changes in manuscript: "Thus, if a patient developed transient T2D during chemotherapy treatment, this would count as an inclusion event only if they purchased glucose-lowering drugs." (Page 5, line 156-158)

      Would they have been classified as T2DM based on filling a prescription for DM meds for a period of time? Also, did the authors have information regarding time to development of T2DM after surgery?

      Author response: Yes, if they filled two (or more) prescriptions for oral glucose-lowering drugs. Yes, we have information on the time to development of T2DM after surgery and found no difference between the groups.

      Changes in manuscript: Information on mean time to develop T2D post-surgery has now been added to Table 2.

      In the adjusted Models, the authors did not adjust for cancer stage, even though cancer stage appears to be very different between the chemo and no chemo groups. It would be interesting to know if it affects the results if the model adjusted for cancer stage

      Author response: We agree that adjusting for cancer stage would provide valuable information; we have performed this analysis and added a sentence to the Results section.

      Changes in manuscript: An analysis adjusted for cancer stage now appears in Supplementary Table 1.

      “Moreover, adjusting for cancer stage did not affect the results (Supplementary table 1).” (Page 7, line 219-220)

      It would be worthwhile to report if mortality rates were different between the groups during follow up, and if the authors investigated whether perhaps differences in mortality rates led to specific groups living longer, and therefore having more time to develop DM

      Author response: This situation is accounted for in the analysis by using Cox-regression analysis. This method accounts for the potential competing effect of mortality.

      Changes in manuscript: None.

      Overall, the authors achieved their aims, and the conclusions are supported by their results as reported.

      The results are unlikely to significantly change patient treatment or T2DM screening in this population. With some additional information, as described above, the results would be of interest to the community.

      Reviewer #2 (Public Review):

      Summary:

      The study showed the impact of cancer treatment on new onset of diabetes among patients with colorectal cancer using the national database. Findings reported that individuals with rectal cancer without chemotherapy were less likely to develop diabetes but among other groups, treatment didn't show any impact on the development of diabetes. BMI still played a significant role in developing diabetes regardless of treatment types.

      Strengths:

      One of the strengths of this study is innovative findings about the prognosis of colorectal cancer treatment stratified by treatment types. Especially, as it examined the impact of treatment on the risk of new chronic disease after diagnosis, it became significant evidence that suggests practical insights in developing a proper monitoring system for patients with colorectal cancer and their outcomes after treatment and diagnosis. It is imperative for providers to guide patients and caregivers to prevent adverse outcomes like new onset of chronic disease based on BMI and types of treatment. The next strength is the national database. As the study used the national database, the generalizability is validated.

      Weaknesses:

      Even though the study attempted to examine the impact of each treatment option, the dosage of chemotherapy and the types of chemotherapy were not able to be examined due to the data source.

      Author response: No, unfortunately not. We agree that this would have been valuable information. This is stated in the original manuscript as a limitation; please refer to page 10, lines 305-306.

      Changes in manuscript: None.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor things:

      There are minor inconsistencies in the methods and results regarding BMI. In the methods, the authors state that BMI <18.5 and >/=40 were excluded, but these groups are included in Table 2.

      Author response: This has been corrected.

      Changes in manuscript: BMI groups <18.5 and >/=40 are now excluded in Table 2. (Page 18)

      Line 204, I believe should be BMI 18.5-24.9, not 20-24.9.

      Author response: This has been corrected.

      Changes in manuscript: “For each group (type of surgery ± chemotherapy), the HR for developing T2D depending on BMI subgroups was calculated by using Cox regression analysis adjusted for age, sex, year of surgery, and ASA score using normal weight (BMI:18.5-24.9) as the reference group.” (Page 6, line 184-186)

      Rather than showing the BMI mean in Table 1, it would be interesting to see the BMI breakdown by category.

      Author response: Yes, we agree. This breakdown has now been added to Table 1.

      Changes in manuscript: Please refer to Table 1.

      Re line 215, I would consider rewriting to remove the multiple negatives (e.g., Radiation therapy in rectal resected patients did not impact the incidence rate of T2D in the Rectal-No-Chemo group or Rectal-Chemo group).

      Author response: This has been corrected. Please refer to the Result section.

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      Consider changing some of the "didn't"s in the discussion to "did not"

      Author response: This has been corrected.

      Changes in manuscript: Revised and corrected throughout the discussion.

      Reviewer #2 (Recommendations For The Authors):

      Some points need to be clarified and improved.

      In the method, patients with Type 1 Diabetes were excluded at baseline, but some patients were diagnosed with Type 1 diabetes after treatment and they were included in your analysis. It is interesting to identify Type 1 Diabetes after the treatment as an outcome; do you think that this diagnosis is caused by the treatment? And the incidence rate or other HRs did not seem to include Type 1 Diabetes as stated in the methods. Did you exclude every Type 1 diabetes? If not, further explanation of this outcome is needed, since the mechanisms of Type 1 Diabetes and Type 2 Diabetes are different.

      Author response: This matter has now been clarified in the Methods section.

      Changes in manuscript: “Additionally, individuals diagnosed with Type 1 diabetes (T1D) either before or after surgery were excluded, along with those diagnosed with T2D preoperatively or within the first 2 weeks postoperatively, as the last group probably represents patients with preoperatively unknown pre-existing prediabetes or diabetes.22” (Page 4, line: 125-128)

      Despite limited existing findings, some studies actually reported the incidence rates of Type 2 Diabetes among patients with CRC (Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6):djv402. Published 2016 Feb 2. doi:10.1093/jnci/djv402; Khan NF, Mant D, Carpenter L, Forman D, Rose PW. Long-term health outcomes in a British cohort of breast, colorectal and prostate cancer survivors: a database study. Br J Cancer. 2011;105 Suppl 1(Suppl 1):S29-S37. doi:10.1038/bjc.2011.420; Jo A, Scarton L, O'Neal LJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666) whereas your study examined the impact of the different types of treatments.

      Author response: Our findings of T2D rate among CRC patients are now commented on in discussion section, and the abovementioned studies are included as references.

      Changes in manuscript: “This national cohort study demonstrated an IR of developing T2D after CRC surgery similar to previous studies.5,11” (Page 8, line 237-238)

      To strengthen the presentation, some places should be revised.

      • Line 216: it says that Table 1 showed no impact of radiation therapy on the incidence rate of T2D. However, either the interpretation or the table number seems wrong. Table 1 does not have this information. Correct this statement.

      • Line 239: There are typo and incomplete sentence. Check the sentence and correct the sentence.

      • Line 257-261: It may be a systematic issue to separate these two paragraphs. But two paragraphs seem related so make them one paragraph.

      Author response: These suggested changes have been made. Regarding line 216 the paragraph has been adjusted to the following:

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      Reference

      (1) Araghi M, Soerjomataram I, Jenkins M, et al. Global trends in colorectal cancer mortality: projections to the year 2035. Int J Cancer. 2019;144(12):2992-3000. doi:10.1002/ijc.32055

      (2) Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66(4):683-691. doi:10.1136/gutjnl-2015-310912

      (3) González N, Prieto I, del Puerto-Nevado L, et al. 2017 Update on the Relationship between Diabetes and Colorectal Cancer: Epidemiology, Potential Molecular Mechanisms and Therapeutic Implications. Oncotarget. 2017;8. www.impactjournals.com/oncotarget

      (4) Mills KT, Bellows CF, Hoffman AE, Kelly TN, Gagliardi G. Diabetes mellitus and colorectal cancer prognosis: A meta-analysis. Dis Colon Rectum. 2013;56(11):1304-1319. doi:10.1097/DCR.0b013e3182a479f9

      (5) Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6). doi:10.1093/jnci/djv402

      (6) Xiao Y, Wang H, Tang Y, et al. Increased risk of diabetes in cancer survivors: a pooled analysis of 13 population-based cohort studies. ESMO Open. 2021;6(4). doi:10.1016/j.esmoop.2021.100218

      (7) Nordcan 2019. Colorectal: 5-Year Age-Standardised Relative Survival (%), Males and Females. Accessed September 12, 2022. https://nordcan.iarc.fr/en/dataviz/survival?cancers=520&set_scale=0&sexes=1_2&populations=208

      (8) Nano J, Dhana K, Asllanaj E, et al. Trajectories of BMI Before Diagnosis of Type 2 Diabetes: The Rotterdam Study. Obesity. 2020;28(6):1149-1156. doi:10.1002/oby.22802

      (9) Maddatu J, Anderson-Baucum E, Evans-Molina C. Smoking and the risk of type 2 diabetes. Translational Research. 2017;184:101-107. doi:10.1016/j.trsl.2017.02.004

      (10) Lega IC, Lipscombe LL. Review: Diabetes, Obesity, and Cancer-Pathophysiology and Clinical Implications. Endocr Rev. 2020;41(1). doi:10.1210/endrev/bnz014

      (11) Jo A, Scarton L, O’Neal LTJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666

      (12) Feng JP, Yuan XL, Li M, et al. Secondary diabetes associated with 5-fluorouracil-based chemotherapy regimens in non-diabetic patients with colorectal cancer: Results from a single-centre cohort study. Colorectal Disease. 2013;15(1):27-33. doi:10.1111/j.1463-1318.2012.03097.x

      (13) Lee EK, Koo B, Hwangbo Y, et al. Incidence and disease course of new-onset diabetes mellitus in breast and colorectal cancer patients undergoing chemotherapy: A prospective multicenter cohort study. Diabetes Res Clin Pract. 2021;174. doi:10.1016/j.diabres.2021.108751

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Summary:

      In the present study, the authors identified the ternary complex formed by NCAN, TNC, and HA as an important factor facilitating the multipolar-to-bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via its N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminus. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:

      The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

      We appreciate the reviewer's positive assessment of our research.

      Weaknesses:

      However, regarding the in vivo experiments, the authors should consider some points for the interpretation of the results:

      • The authors did not use the proper controls in their experiments. For embryonic analysis, such as cortical migration, neuronal morphology, and protein distribution (Fig. 6, 7, and 9), mutant mice should be compared with control littermates, since differences in the results could be due to differences in embryonic stages. For example, in Fig. 6 the dKO is more developed than the WT embryo.

      It was challenging to compare double knockout mice with control littermates. When crossing Ncan and Tnc double heterozygous mice, the probability of obtaining a double knockout pup is 1/16. Given an average litter size of around 8, acquiring a substantial number of double knockout mice would have required an impractical number of breeding pairs. Consequently, we were constrained to use non-littermate control mice. To address potential differences in developmental stage, we analyzed 19-20 embryos obtained from five individuals in each group, demonstrating that the observed differences between the two groups are larger than the inherent variability within each group.
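      The breeding arithmetic behind the 1/16 figure can be made explicit. This sketch assumes that Ncan and Tnc assort independently and that genotypes follow Mendelian ratios; the function names are illustrative only.

```python
from fractions import Fraction


def dko_probability():
    """Chance that one pup from a double-heterozygous intercross is a double knockout,
    assuming independent assortment: (1/4 null at Ncan) * (1/4 null at Tnc)."""
    return Fraction(1, 4) ** 2


def expected_dko_per_litter(litter_size):
    """Expected number of double knockouts in a litter of the given size."""
    return litter_size * dko_probability()


def prob_litter_has_dko(litter_size):
    """Probability that a litter contains at least one double knockout."""
    return 1 - (1 - dko_probability()) ** litter_size


def litters_needed(target_dko, litter_size):
    """Expected number of litters required to obtain `target_dko` double knockouts."""
    return Fraction(target_dko) / expected_dko_per_litter(litter_size)


print(dko_probability(), expected_dko_per_litter(8))
print(float(prob_litter_has_dko(8)), litters_needed(20, 8))
```

      With an average litter of eight, the expected yield is half a double-knockout pup per litter, only about 40% of litters would contain any double knockout at all, and obtaining 20 double-knockout embryos would take roughly 40 litters on average.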

      • The authors claim that NCAN and TNC are involved in neuronal migration based on experiments using single KO embryos. This is a strong statement considering the mild results, with no significant difference in the case of TNC KO embryos, and once again using embryos from different litters.

      We agree with the reviewer's comment that a single deletion of TNC has a minimal impact on neuronal migration. We have revised the Results section to reflect the mild nature of the TNC KO phenotype more accurately.

      Page 8, line 225: "In NCAN KO mice, a significantly lower percentage of labeled cells resided in the upper layer (Bin2), and more cells remained in the lower layer (Bin5) than in WT mice (Figure 7a). In contrast, the impact of a single deletion of TNC on neuronal cell migration was minimal. Although TNC KO mice exhibited a tendency to have a higher proportion of labeled cells in the lower layer (Bin4) than in WT mice, this did not reach statistical significance (Figure 7a). The delay in neuronal migration observed in the single KO mice was milder when compared to that observed in DKO mice (Figure 6a-c), suggesting that simultaneous deletion of both NCAN and TNC is necessary for a more pronounced impairment in neuronal cell migration."

      • The measurement of immunofluorescence intensity is not the right method to compare the relative amount of protein between control and mutant embryos unless there is a right normalization.

      We agree that measuring immunofluorescence intensity alone is insufficient for comparing relative protein amounts. In Figure 8, we used Western blotting to compare protein levels, revealing an approximately 50% reduction in NCAN and TNC following hyaluronidase digestion. In Figures 7b and 7c, we demonstrated alterations in the localization patterns of TNC and NCAN in Ncan KO and Tnc KO mice; however, we did not draw conclusions about their amounts.

      • Page 7, line 206. "No significant abnormalities were observed in the laminar structure in 4-week-old DKO mice". The authors should be more careful with this statement since they did not check the lamination of the adult cortex. I would recommend staining control and mutant mice with markers of different cortical populations, such as Cux1, Ctip2, and Tbr1, to assess this point.

      In response to the suggestion, we have conducted additional experiments to provide a more detailed examination of the laminar structure in the cerebral cortex. The results have been incorporated into the revised manuscript as follows:

      Page 7, line 209: "To investigate the laminar organization of the postnatal cerebral cortex, we analyzed the distribution of NeuN-positive postmitotic neurons in DKO mice at 2 weeks of age. No notable abnormalities were observed in the laminar structure of DKO mice (Figure 6-figure supplement 3a, b). Additionally, the laminar distribution of Ctip2-positive deep layer neurons showed no significant differences between WT and DKO mice (Figure 6-figure supplement 3a, c)."

      • The authors do not explain how they measured the intensity of TNC around the transfected Turbo-RFP-positive neurons.

      We added the following description to the Materials and Methods:

      Page 18, line 608: "Images were captured in the IZ region containing Turbo-RFP-positive neurons using a 100X magnification objective lens with 3.0X optical zoom on an AX R confocal microscope (Nikon). A total of 10 optical sections were acquired with a step size of 190 nm. Z-projection views were generated, and the staining intensity of TNC around Turbo-RFP-positive neurons was measured in a 59 × 59 µm area using ImageJ FIJI."
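      The intensity measurement described above can be approximated outside FIJI as sketched below. Several details are assumptions not stated in the Methods: the projection type (maximum intensity is used here), the pixel size in micrometres, and centring the 59 × 59 µm square on the neuron. The function names are hypothetical.

```python
def max_projection(stack):
    """Maximum-intensity Z-projection of a stack given as a list of 2D planes (z, y, x)."""
    height, width = len(stack[0]), len(stack[0][0])
    return [[max(plane[y][x] for plane in stack) for x in range(width)]
            for y in range(height)]


def mean_intensity_in_square(image, center_yx, side_um, um_per_pixel):
    """Mean pixel intensity in a square ROI of roughly `side_um` micrometres,
    centred on `center_yx` (row, column) and clipped to the image borders."""
    half = int(round(side_um / um_per_pixel / 2))
    cy, cx = center_yx
    y0, y1 = max(0, cy - half), min(len(image), cy + half + 1)
    x0, x1 = max(0, cx - half), min(len(image[0]), cx + half + 1)
    values = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


# Toy two-section stack for illustration only.
stack = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
projection = max_projection(stack)
print(projection, mean_intensity_in_square(projection, (0, 0), side_um=10, um_per_pixel=1.0))
```

      For the 59 µm box, `side_um=59` would be passed together with the pixel size taken from the image metadata.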

      • The loading control of the western blots should be always included.

      In Figure 6-figure supplement 1, we have incorporated western blot data using a GAPDH antibody as a loading control. We have added an explanation in the figure legend of Figure 3c, stating that we analyzed the same samples as those used in Figure 1e.

      • For Fig. 3e, I think values are represented relative to E18 instead of P2.

      Thank you for pointing that out. As suggested, we have corrected the representation in Fig. 3e to be relative to E18 instead of P2.

      • I would recommend authors use the standard nomenclature for the embryonic stages. The detection of the vaginal plug is considered as E0.5 and therefore, half a day should be added to embryonic stages (E14.5...).

      We have revised our manuscript to designate the detection of the vaginal plug as E0.5, and subsequently, we have adjusted all embryonic stages by adding half a day, such as E14.5.

      • Fig 10K: I do not see the differences in the number of neurites in the graph.

      We have modified the presentation from a box-and-whisker plot to a bar graph to enhance the visibility of differences in the average number of neurites.

      • Line 37: Not all of the cerebral cortex is organized in six layers, only the neocortex.

      We have changed 'cerebral cortex' to 'cerebral neocortex.'

      Reviewer 2

      Summary:

      ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet little is known about their functions, while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to our understanding of ECM contributions to neural development. This is even more relevant in view of the fact that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes, they are very difficult to study. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach made it possible to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:

      Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

      We appreciate the reviewer's insightful comments.

      Weaknesses:

      The analysis is focused on neuronal progenitors; however, the potential impact of the molecules of interest, in particular of their removal, on the differentiation and/or survival of neural stem/progenitor cells is not addressed. The potential receptors involved are not considered. It also seems that it is rather the passage to the outer areas of the forming cortex that is compromised, which is not the same as the migration process. The movement of the cells is not included in the analysis.

      In this study, we demonstrated that the ternary complex of NCAN, TNC, and HA is predominantly localized in the subplate/intermediate zone. This region lacks neural stem/progenitor cells but serves as the initiation site for the radial migration of postmitotic neurons. Consequently, our study focused on the role of the ternary complex in neuronal migration and polarity formation. We acknowledge that we did not investigate in depth the potential effects of ECM perturbation on the differentiation and survival of neural stem/progenitor cells. However, as highlighted by the reviewer, it is important to explore the effects on neural stem/progenitor cells. To address this concern, we analyzed Pax6-positive radial glial cells and Tbr2-positive intermediate progenitor cells in the ventricular zone of wild-type and Ncan/Tnc double knockout (DKO) mice. Immunohistochemical analysis revealed no significant differences between WT and DKO mice (Figure 6-figure supplement 4a). Furthermore, the morphology of nestin-positive radial fibers showed no distinguishable differences between WT and DKO mice (Figure 6-figure supplement 4b, c).

      (1) In the description of the culture of cortical neurons the authors mentioned the use of 5% horse serum as a medium constituent. HS is a potent stimulus for astrocyte differentiation and astrocytes in vitro release neurocan. Therefore, the detection of neurocan in the supernatant of the cultures as shown in Figure 1h might as well reflect release by cultivated astrocytes.

      As pointed out by the reviewer, Figure 1h did not conclusively demonstrate that neurons are the sole source of NCAN production. Indeed, in situ hybridization analysis revealed the widespread distribution of Ncan mRNA throughout the cerebral cortex (Figure 2a). This result suggests that the production of NCAN involves not only neurons but also other cell populations, including radial glial cells and astrocytes. While we acknowledge the potential contribution of other cell types to NCAN production, Ncan expression by neurons during radial migration is a crucial aspect of our findings (Figure 1i, j). We have revised the manuscript as follows:

      Page 5, line 111: "This result suggested the secretion of NCAN by developing neurons; however, we cannot rule out the involvement of coexisting glial cells in the culture system. To investigate the expression of Ncan mRNA during radial migration in vivo, we labeled radial glial cells in the VZ with GFP through in utero electroporation at E14.5 (Figure 1i, Figure 1-figure supplement 1)."

      (2) It is known that neurocan in vivo is expressed by neurons, but may be upregulated in astrocytes after lesion, or in vitro, where the cells become reactive.

      We have incorporated the following description into the discussion:

      Page 11, line 359: "Previous studies have reported an upregulation of NCAN and TNC in reactive astrocytes, indicating the potential formation of the ternary complex of NCAN, TNC, and HA in the adult brain in response to injury (Deller et al., 1997; Haas et al., 1999)."

      (3) Do NCAN KO neurons show an increase in neurite growth on the TNC substrates? The response on POL was changed (Fig. 10h-k), but the ECM substrates were not tested with the KO neurons.

      The impact of ECM substrates on NCAN KO neurons has not been investigated, and this remains an avenue for further exploration in our ongoing research. Future studies aim to elucidate the NCAN-TNC connection by identifying TNC cell surface receptors and unraveling the subsequent intracellular signaling pathways.

      (4) Do the authors have an explanation for why the ternary complex is concentrated in the SP/IZ zone?

      In the mature brain, hyaluronan acts as a scaffold that facilitates the accumulation of ECM components, including proteoglycans and tenascins around neurons. Therefore, it is conceivable that the ECM components bind to hyaluronan in the embryonic brain, resulting in its accumulation in the subplate/intermediate zone. In support of this hypothesis, enzymatic digestion of hyaluronan in the subplate/intermediate zone led to the disappearance of TNC and NCAN accumulation (Figure 8a-c). This result may account for the disparity observed, where Tnc mRNA is expressed in the ventricular zone while the TNC protein localizes to the subplate/intermediate zone.

      (5) Are hyaluronic acid synthesizing complexes (HAS) concentrated in the SP/IZ?

      Following the reviewer's suggestion, we investigated the localization of Has2 and Has3 mRNA using in situ hybridization. However, owing to the relatively low expression levels of these enzymes, we were unable to obtain clear signals (Author response image 1). Further research is needed to understand the mechanisms behind the localization of hyaluronan in the intermediate zone.

      Author response image 1.

      In situ hybridization analysis of Has2 and Has3 mRNA in the E16.5 cerebral cortex. Upper images show in situ hybridization using antisense probes against Has2 and Has3. Lower images show in situ hybridization using sense probes as negative controls.

      (6) CSPGs as well as TNC are part of the neural stem/progenitor cell niche environment. Does the removal of either of the ECM compounds affect the proliferation, differentiation, and/or survival of NSPCs, or their progeny?

      (7) This question relates to the fact that the migration process itself is not visualized in the present study, but rather its outcome: the quantitative distribution of labeled neurons in the different bins of the analysis. This could also derive from modified cell numbers.

      As pointed out by the reviewer, previous studies have shown the role of CSPGs and TNC as components of the neural stem/progenitor cell niche (see reviews by Faissner et al., 2017; Faissner and Reinhard, 2015). However, as mentioned in Response #2, our analyses did not reveal a reduction in neural stem/progenitor cells in NCAN/TNC double-knockout mice. While we cannot precisely explain this discrepancy, many past studies evaluated the activities of these ECM molecules in in vitro systems such as neurospheres, and the observed differences may stem from variations in experimental systems.

      (8) What is the role of the ECM in the SP/IZ area? Do the cells need the ECM to advance, so that its reduction would leave the neuronal progenitors in the VZ area? This somehow contrasts with interpretations that the ECM acts as an obstacle to neurite growth or cell migration, or as a kind of barrier.

      The role of the ECM is multifaceted, with certain ECM molecules known to inhibit neurite outgrowth while others facilitate it. Additionally, the effects of ECM can vary depending on the cell type. It is established that after migrating neurons adhere to radial fibers, they utilize these fibers as a scaffold to migrate toward the cortical surface. However, in the subplate/intermediate zone, migrating neurons have not yet adhered to radial fibers. This study provides evidence that multipolar neurons undergo morphological changes into bipolar cells with the assistance of the NCAN, TNC, and HA complex. Subsequently, this facilitates their movement along radial fibers.

      (9) A direct visualization of the movement of neural progenitors in the tissue, as has been performed, for example, by the Kriegstein laboratory, might help resolve some of these issues.

      As suggested by the reviewer, utilizing live imaging techniques to directly observe the movement of neural progenitors within the tissue is indeed a powerful tool. We recognize the significance of addressing these points in future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewer for the constructive comments. We have revised the paper to address the concerns. In summary, the revised version includes the following:

      • Statistical analysis using biological replicate datasets for WT and K40R doublet microtubules.

      • Additional figures for statistical analysis and MIP decorations in MEC17-KO and K40R.

      • Revised texts and figures to reflect the new changes, cite proper references and fix small errors throughout the text.

      Reviewer #1 (Public Review):

      Summary:

      The study "Effect of alpha-tubulin acetylation on the doublet microtubule structure" by S. Yang et al employs a multi-disciplinary approach, including cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, to investigate the impact of α-tubulin acetylation at the lysine 40 residue (αK40) on the structure and stability of doublet microtubules in cilia. The work reveals that αK40 acetylation exerts a small-scale, but significant, effect by influencing the lateral rotational angle of the microtubules, thereby affecting their stability. Additionally, the study provides an explanation of the relationship between αK40 acetylation and phosphorylation within cilia, although the details remain elusive. Overall, these findings contribute to our understanding of how post-translational modifications can influence the structure, composition, stability, and functional properties of important cellular components like cilia.

      Strengths:

      (1) Multi-Disciplinary Approach: The study employs a robust combination of cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, providing a comprehensive analysis of the subject matter.

      (2) Significant Findings: The paper successfully demonstrates the impact of αK40 acetylation on the lateral rotational angles between protofilaments (inter-PF angles) of doublet microtubules in cilia, thereby affecting their stability. This adds valuable insights into the role of post-translational modifications in cellular components.

      (3) Exploration of Acetylation-Phosphorylation Relationship: The study also delves into the relationship between αK40 acetylation and phosphorylation within cilia, contributing to a broader understanding of post-translational modifications.

      (4) High-quality data: The authors are cryo-EM experts in the field and the data quality presented in the manuscript is excellent.

      (5) Depth of analysis: The authors analyzed the effects of αK40 acetylation in excellent depth which significantly improved our understanding of this system.

      Thank you for highlighting the strengths of our paper.

      Weaknesses:

      I have no major concerns about this paper, but would recommend that a few minor issues be addressed.

      (1) Lack of Statistical Details: The review points out that the paper could benefit from providing more statistical details, such as the number of particles and maps used for analysis, randomization methods, and dataset splitting for statistical analyses.

      To address this, we analyzed true biological replicate datasets (from different cultures, cryo-EM vitrification, and data collection) for WT and K40R. Because the MEC17-KO data were collected as a single dataset, we decided not to split them by randomization: such a division does not produce independent sets and, in the case of cryo-EM, tends to yield identical results. The biological replicates allow us to assess how consistent our structural data are for interpretation. Information about the replicate datasets is now included in Table 1. The description of the analysis is highlighted in the manuscript and included in the Materials & Methods and Fig. S4.

      In summary, comparison of the two WT biological replicates indicates that the inter-PF rotation angles are highly consistent between them. In contrast, the inter-PF angles in the B-tubule vary between the two K40R replicates (Fig. S4B).

      Overall, when the data are pooled (6 + 6 measurement points for WT datasets 1 and 2, 6 + 6 measurement points for K40R datasets 1 and 2, and 6 measurement points for MEC17-KO) (Fig. S4), our analysis yields the same statistical significance as the analysis of the averages of all datasets (6 measurement points of the total averages for WT, K40R, and MEC17-KO) (Fig. 3).

      In addition, the variation in inter-PF rotation angles between certain PF pairs within the K40R replicates (B7B8 and B9B10) is similar to the variation in MEC17-KO. This suggests that the lack of acetylation induces variation in inter-PF angles, whereas the inter-PF angles are maintained consistently in WT.

      (2) Questionable Conclusion Regarding MIPs: The reviewer suggests caution in the paper's conclusion that "Acetylation of αK40 does not affect tubulin and MIPs." The reviewer recommends that this conclusion be more specific or supported by additional evidence to exclude all other possibilities.

      We have revised the text to make sure we do not overclaim that “Acetylation of αK40 does not affect tubulin and MIPs.” We now state more specifically that “Lack of acetylation of αK40 does not significantly affect tubulin and MIP interactions”, and the surrounding text was edited accordingly.

      (3) Need for Additional Visual Data: The reviewer recommends that an enlarged local density map along with fitted PDB models be provided in a supplementary figure, such as Figure 4.

      We now include the density maps and fitted PDB models in Fig. 4 and Fig. S5. We also include more snapshots of the MIP in K40R and MEC17-KO in Figure S3.

      Overall, the paper is strong in its scientific approach and findings but could benefit from additional statistical rigor and clarification of certain conclusions.

      Page 11, Line 226: "cluster consists of only ~ acetylated", lacks the percentage. Please correct this.

      We corrected it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript describes the crystal structures of Streptococcus pneumoniae NOXs. Crystals were obtained for the wild-type and mutant dehydrogenase domain, as well as for the full-length protein comprising the membrane domain. The manuscript further carefully studies the enzyme's kinetics and substrate-specificity properties. Streptococcus pneumoniae NOX is a non-regulated enzyme, and therefore, its structure should provide a view of the NOX active conformation. The structural and biochemical data are discussed on this ground.

      Strengths:

      This is very solid work. The protein chemistry and biochemical analysis are well executed and carefully described. Similarly, the crystallography must be appreciated given the difficulty of obtaining good enzyme preparations and the flexibility of the protein. Even if solved at medium resolution, the crystal structure of the full-length protein conveys relevant information. The manuscript nicely shows that the domain rotations are unlikely to be the main mechanistic element of NOX regulation. It rather appears that the NADPH-binding conformation is pivotal to enzyme activation. The paper extensively refers to the previous literature and analyses the structures comprehensively with a comparison to previously reported structures of eukaryotic and prokaryotic NOXs.

      We thank the referee for these very nice comments about our work.

      Weaknesses:

      The manuscript is not always very clear with regard to the analysis of NADPH binding. The last section describes a "crevice" featured by the NADPH-binding sites in NOXs. It remains unclear whether this element corresponds to the different conformations of the protein C-terminal residues or more extensive structural differences. This point must be clarified.

      We agree with the referee that our terminology was not very clear. Responding to this comment helped us improve our explanation: we have changed the text to emphasize the differences we observe in the distances between the FAD-binding groove and the entire NADPH-binding groove, which includes the conserved NADPH-contacting motifs as well as the critical aromatic residue.

      A second, less convincing point concerns the nature of the electron acceptor. The manuscript states that this NOX might not physiologically act as a ROS producer. A question then immediately arises: Is this protein an iron reductase? Can the authors better discuss or provide more data about this point?

      The referee has a legitimate point, which was also our first idea. In the initial work on SpNOX, in which we discovered bacterial NOX enzymes (see Hajjar et al 2017 in mBio), we evaluated its possible role as an iron reductase. There we showed that SpNOX can reduce Cyt c directly; however, while some reduction of the Fe3+-NTA complex (classically used in ferric reductase activity assays) occurred, this reduction was inhibitable by SOD and was mediated indirectly by the superoxide produced, and was therefore not a true iron reductase activity. This mixed situation, with direct reduction of Cyt c but only indirect reduction of the free iron complex, appears to preclude a physiological iron reductase activity; the direct reduction of Cyt c seems to depend on its protein component, which allows it to interact with SpNOX. Because these questions were already addressed in a previous paper, we did not add anything here; we prefer to highlight the possibility of another acceptor and to leave this question open for future work.

      Reviewer #2 (Public Review):

      The authors describe the structure of the S. pneumoniae Nox protein (SpNOX). This is a first. The relevance of it to the structure and function of eukaryotic Noxes is discussed in depth.

      Strengths and Weaknesses

      One of the strengths of this work is the effort put into preparing a pure and functionally active SpNOX preparation. The protein was expressed in E. coli and the purification and optimization of its thermostability and activity are described in detail, involving salt concentration, glycerol concentration, and pH.

      This reviewer was surprised by the fact that the purification protocol in the eLife paper differs from those in the mBio and Biophys. J. papers by the absence of the detergent lauryl maltose neopentyl glycol (LMNG). LMNG is only present in the activity assay at a low concentration (0.003%; molar data should be given; by my calculation, this corresponds to 30 μM).
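      The reviewer's conversion can be checked with a one-line calculation, since 1% (w/v) corresponds to 10 g/L. The sketch below is only an illustration. It assumes a formula weight of about 1005.2 g/mol for LMNG (C47H88O22), the value commonly listed by detergent suppliers; the lot-specific value should be used in practice. The function name is hypothetical.

```python
# Assumed formula weight of LMNG (C47H88O22); check the supplier's lot value.
LMNG_MW_G_PER_MOL = 1005.2


def percent_wv_to_micromolar(percent_wv: float, mw_g_per_mol: float) -> float:
    """Convert a % (w/v) concentration into micromolar.

    1% (w/v) = 1 g per 100 mL = 10 g/L; dividing by the molar mass gives mol/L.
    """
    grams_per_litre = percent_wv * 10.0
    molar = grams_per_litre / mw_g_per_mol
    return molar * 1e6


if __name__ == "__main__":
    c = percent_wv_to_micromolar(0.003, LMNG_MW_G_PER_MOL)
    print(f"0.003% (w/v) LMNG = {c:.1f} uM")  # prints about 29.8 uM
```

      With this assumed molar mass, 0.003% LMNG is about 29.8 µM, consistent with the reviewer's estimate of 30 µM.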

      We regret this misunderstanding: our description was not clear enough. As the referee points out, in previous papers we purified full-length SpNOX with the detergent LMNG. In the current paper, we described only the protocol for the SpNOX DH domain variant, a soluble cytoplasmic domain. We have now modified the text to clarify the difference between the purification of full-length SpNOX variants, which was performed with detergent as cited in Vermot et al 2020, and the purification of DH domains, which are soluble and thus did not require detergent.

      In light of the presence of lipids in cryo-EM-solved structures of DUOX and NOX2, it is surprising that the authors did not reconstitute the purified SpNOX in phospholipids (nanodiscs?). The issue is made more complicated by the statement on p. 18 of "structures solved in detergent like ours" when no use of detergent in the solubilization and purification of SpNOX is mentioned in the Methods section (pp. 21-22).

      As stated above, detergent was used to purify the full-length version of SpNOX. We did in fact perform some preliminary tests of reconstitution in nanodiscs, but several negative-staining trials showed heterogeneous sizes of SpNOX in nanodiscs, and the initial images were not promising. In parallel, we obtained positive results in crystallography relatively quickly with protein in detergent. We therefore focused on refining the crystals, which was a long and demanding task; we decided to allocate time and resources to this promising avenue and did not pursue nanodiscs further.

      We did not pursue cryo-EM because the small size of the protein was initially believed to be a significant barrier to its success. Perhaps we could have pursued this avenue: while our manuscript was under submission at eLife, another group deposited a preprint on bioRxiv in which cryo-EM was used to solve the structure of SpNOX (see comment below). This structure was also solved in detergent, so even this cryo-EM structure provides no information on the potential roles of lipids raised by the referee.

      In this revised version, we have added a comment, in the last paragraph, in reference to the additional data available today thanks to the other structures generated by this other group (Murphy's group).

      Can the authors provide information on whether E. coli BL21 is sufficiently equipped for the heme synthesis required for the expression of the TM domain of SpNOX? Was supplementation with δ-aminolevulinic acid used?

      His-SpNOX was produced in E. coli C41(DE3) without any δ-aminolevulinic acid supplementation. Supplementation was tested, but no change in heme content (UV/visible spectra) was observed, so we settled on the purification described by Vermot et al 2020. Initially, for the mBio paper (Hajjar et al 2017), we performed heme titrations that gave a stoichiometry of 1.35 to 1.5 heme per protein, indicating 2 hemes (these data were not shown). In the present work we observed two hemes in the crystal structure, confirming that E. coli, at least for this protein, does not need supplementation with δ-aminolevulinic acid.

      The three papers on SpNOX present more than convincing evidence that SpNOX is a legitimate Nox that can serve as a legitimate model for eukaryotic Noxes (cyanide resistance, inhibition by DPI, absolute FAD dependence, and NADPH/NADH as the donor of electrons to FAD). It is also understood that the physiological role of SpNOX in S. pneumoniae is unknown and that the fact that it can reduce molecular oxygen may be an experimental situation that does not occur in vivo.

      I am, however, linguistically confused by the statement that "SpNOX requires "supplemental" FAD". Noxes have FAD bound non-covalently and this is the reason that, starting from the key finding of Babior on NOX2 back in 1977 to the present, FAD has to be added to in vitro systems to compensate for the loss of FAD in the course of the purification of the enzyme from natural sources or expression in a bacterial host. I wonder whether this makes FAD more of a cosubstrate than a prosthetic group unless what the authors intend to state is that SpNOX is not a genuine flavoprotein.

      We believe there is some confusion between SpNOX (the full-length transmembrane protein) and SpNOXDH (the cytosolic domain only). The sentence pinpointed by the referee was in fact “The strict requirement of FAD addition for SpNOXDH activity suggests that the flavin behaves as a cosubstrate”. This statement concerned the isolated cytosolic domain, which does not contain the TM part of the protein.

      We agree that in WT NOX enzymes (including SpNOX) FAD is held within the enzyme structure and thus can be considered, by definition, as a prosthetic group. This is supported by the nanomolar affinity for FAD of SpNOX. We did not intend to say that NOX and SpNOX are not genuine flavoproteins.

      On the other hand, when the DH domain is isolated, its affinity for flavins drops to the µM level. This level of affinity does not allow stable retention of the flavin in the active site, as illustrated by the spectra in Figure 3. It is instead the typical affinity of a substrate or co-substrate (similar to that of the substrate NADPH) that can exchange and diffuse in and out of the active site. The DH domain recognizes and reduces flavins but, as a consequence of its lower affinity, releases free reduced flavins into its environment. Thus the isolated DH domain behaves as a flavin reductase that uses flavin as a substrate. Such enzymes have already been well described (some of them belong to the FNR family); they typically have affinity for flavin in the µM range and share with SpNOXDH binding properties centered on the isoalloxazine ring only.

      We understand that switching in the text from SpNOX to SpNOXDH, and from FAD as a prosthetic group to FAD as a diffusible co-substrate, can be confusing. To make this clearer, we modified the following sentences and added references to the characterization of some flavin reductases, which may help the reader.

      “The strict requirement of FAD addition for SpNOXDH activity and its µM level of affinity suggests that the flavin behaves as a co-substrate rather than a prosthetic group. As an isolated domain, SpNOXDH may work as a flavin reductase enzyme (Gaudu et al, 1994; Fieschi et al 1995; Nivière et al 1996), ..”

      We hope that it will help.

      I am also puzzled by the statement that SpNOX "does not require the addition of Cyt c to sustain superoxide production". Researchers with a Cartesian background should differentiate between cause and effect. Cyt c serves merely as an electron acceptor from superoxide made by SpNOX but superoxide production and NADPH oxidation occur independently of the presence of added Cyt c.

      We thank the referee for pointing out this poor wording. We agree and have amended the text to clarify what we originally meant. It now reads:

      “SpNOXDH requires supplemental FAD to sustain both superoxide production, which can be observed in the presence of Cyt c (Figure 2A), and NADPH oxidation, which can be observed in the absence of Cyt c (Figure 2B).”

      The ability of the DH domain of SpNOX (SpNOXDH) to produce superoxide is surprising to this reviewer. The result is based on the 40% inhibition of Cyt c reduction by added superoxide dismutase (SOD). In all eukaryotic Noxes, superoxide is produced by the one-electron reduction of molecular oxygen by electrons originating from the distal heme, having passed from reduced FAD via two hemes. The proposal that superoxide is generated by direct transfer of electrons from FAD to oxygen deserves a more in-depth discussion and relies too heavily on the inhibitory effect of SOD. A control experiment with inactivated SOD should have been done (SOD is notoriously heat resistant and inactivation might require autoclaving).

      The initial reports of a NOX DH-domain-only construct (that of human Nox4) producing superoxide are cited in the text. Moreover, natural flavin reductases are known to produce superoxide due to the release of free reduced flavin in the medium.

      As explained above, FAD in full-length SpNOX is a relay for electrons from NADPH to the hemes; it is internal to the protein and thus devoted to this specific task.

      In the case of SpNOX DH, its flavin reductase behavior leads to the release in the medium of free reduced flavin as a nonspecific diffusible electron carrier. It has been already demonstrated that such free reduced flavin can efficiently reduce soluble O2 and be a source of superoxide.

      This has been particularly well documented by Gaudu et al. (1994, J. Biol. Chem.). We have added this reference to the text (see the modified sentence quoted in our reply two comments above).

      Furthermore, we want to point out to the referee that the link between flavin and superoxide production here is not based only on the inhibition by SOD. When we added the flavin inhibitor DPI, superoxide production from the DH domain was abolished (Figure 2C). This supports the role of free reduced flavin both in the production of superoxide and in part of the direct Cyt c reduction observed.

      An unasked and unanswered question is that, since under aerobic conditions, both direct Cyt c reduction (60%) and superoxide production (40%) occur, what are the electron paths responsible for the two phenomena occurring simultaneously?

      We thank the referee for their dedication to a clear understanding of the mechanism used by the SpNOXDH construct. It pushed us to develop a clear description of the mechanism at work here for the readers. Please find below a proposed mechanism describing electron transfer from NAD(P)H to free flavin, which, as a diffusible species, can then non-specifically reduce either O2 or the Cyt c it encounters.

      Author response image 1.

      However, it is important to remember that this situation is not physiological; it results from using a DH domain isolated from the TM domain of SpNOX. Nonetheless, it shows that the DH domain is fully functional for NAD(P)H utilization and hydride transfer.

      This reviewer had difficulty in following the argument that the fact that the kcat of SpNOX and SpNOXDH are similar supports the thesis that the rate of enzyme activation is dependent on hydride transfer from nicotinamide to FAD.

      We have amended the text to clarify this point. If the reaction rate is not affected by the presence or absence of the hemes in the TM domain, this inevitably implies that the rate is NOT limited by the electron transfer to the heme, and ultimately to O2, from the FAD, and thus the hydride transfer step that oxidizes the FAD must be the rate limiting step.
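      The logic of this argument can be illustrated with a deliberately simplified kinetic model. The model is our assumption and is not an analysis from the manuscript. If turnover at saturating substrate proceeds through a linear chain of irreversible first-order steps, then 1/kcat is the sum of 1/k_i over the steps, so kcat is set mainly by the slowest step. The model also ignores the fact that SpNOXDH passes electrons to free flavin rather than to hemes. The rate constants below are hypothetical and were chosen only to show the contrast.

```python
def kcat_sequential(*rate_constants: float) -> float:
    """Turnover number (s^-1) of a linear chain of irreversible first-order steps.

    Under saturating substrate, 1/kcat = sum(1/k_i), so the slowest step dominates.
    """
    if not rate_constants:
        raise ValueError("need at least one step")
    return 1.0 / sum(1.0 / k for k in rate_constants)


# Hypothetical rate constants in s^-1, for illustration only.
K_HYDRIDE = 5.0      # NAD(P)H -> FAD hydride transfer (assumed slow)
K_HEME_FAST = 500.0  # FAD -> hemes -> O2 (assumed fast)
K_HEME_SLOW = 5.0    # alternative scenario: heme branch as slow as hydride transfer

if __name__ == "__main__":
    dh_only = kcat_sequential(K_HYDRIDE)
    full_fast = kcat_sequential(K_HYDRIDE, K_HEME_FAST)
    full_slow = kcat_sequential(K_HYDRIDE, K_HEME_SLOW)
    print(f"DH only:                 {dh_only:.2f} s^-1")
    print(f"full length, fast hemes: {full_fast:.2f} s^-1")  # about 1% lower than DH only
    print(f"full length, slow hemes: {full_slow:.2f} s^-1")  # halved
```

      In this toy model, a similar kcat with and without the heme-containing TM domain is expected only when the downstream electron transfer is much faster than hydride transfer. If the heme branch were rate limiting, removing it would change kcat markedly.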

      The section dealing with mutating F397 is a key part of the paper. There is a proper reference to the work of the Karplus group on plant FNRs (Deng et al). However, later work, addressing comparison with NOX2, should be cited (Kean et al., FEBS J., 284, 3302-3319, 2017). Also, work from the Dinauer group on the minimal effect of mutating or deleting the C-terminal F570 in NOX2 on superoxide production should be cited (Zhen et al., J. Biol. Chem. 273, 6575-6581, 1998).

      We thank the reviewer for pointing out our unintended omission of these important works; we have amended the text and added the citations.

      It is not clear why mutating F397 to W (both residues having aromatic side chains) would stabilize FAD binding.

      In brief, the bicyclic indole ring of Trp can establish more extensive and stronger van der Waals contacts with the isoalloxazine ring than the Phe side chain. We discuss this point extensively in the structural section, where we compare the structures with F and W at this position. At this time we do not think it is necessary to add anything to the text.

      Also, what is meant by "locking the two subdomains of the DH domain"? What subdomains are meant?

      The two subdomains are the NADPH-binding domain and the FAD-binding domain, which we define on p. 11 (“SpNOXDH presents a typical fold of the FNR superfamily of reductase domain containing two sub-domains, the FAD-binding domain (FBD) and an NADPH-binding domain (NBD)”) and which are labeled in Fig. 4. By “locking” we meant immobilizing them in a specific conformation; we have amended the text to clarify this point.

      Methodological details on crystallization (p. 11) should be moved to the Methods section. How many readers know that SAD means "Single Wavelength Anomalous Diffraction", or what the role of sodium bromide is?

      We have amended the text to emphasize the intended point, which is the different origins of the two DH structures: the de novo structure was made possible by co-crystallization with bromide, and the molecular replacement structure used the de novo structure as a search model.

      The data on the structure of SpNOX are supportive of a model of Nox activation that is "dissident" relative to the models offered for DUOX and NOX2 activation. These latter models suggested that the movement of the DH domain versus the TM domain was related to conversion from the resting to the activated state. The findings reported in this paper show that, unexpectedly, the domain orientation in SpNOX (constitutively active!) is much closer to that of resting NOX2. One of the criteria associated with the activated state in Noxes was the reduction of the distance between FAD and the proximal heme. The authors report that, paradoxically, this distance is larger in the constitutively active SpNOX (9.2 Å) than that in resting state NOX2 (7.6 Å) and the distance in Ca2+-activated DUOX is even larger (10.2 Å).

      A point made by the authors is the questioning of the paradigm that activation of Noxes requires DH domain motion.

      Instead, the authors introduce the term "tensing", within the DH domain, from a "relaxed" to a more rigid conformation. I believe that this proposal requires a somewhat clearer elaboration.

      It is clear that the distance between the FAD and NADPH in the DUOX and NOX2 structures is too large for hydride transfer to occur. Wu et al. used the terms ‘tense’ and ‘relaxed’ to describe conformations of the DH domain corresponding to a ‘short distance’ and a ‘longer distance’, respectively, between the two ligand-binding sites. We adopted this terminology and have amended the text to clarify that we envision a motion of the NBD relative to the FBD, as distinct from a larger motion of the whole DH domain relative to the TM domain.

      The statement on p. 18, in connection to the phospholipid environment of Noxes, that the structure of SpNOX was "solved in detergent" is puzzling since the method of SpNOX preparation and purification does not mention the use of a detergent. As mentioned before, this absence of detergent in the present report was surprising because LMNG was used in the methods described in the mBio and Biophys. J. papers. The only mention of LMNG in the present paper was as an addition at a concentration of 0.003% in the activity assay buffers.

      Please see our response to similar points above. Detergent was present for the solubilization of the full-length SpNOX.

      The Conclusions section contains a proposal for the mechanism of conversion of NOX2 from the resting to the activated state. The inclusion of this discussion is welcome but the structural information on the constitutively active SpNOX can, unfortunately, contribute little to solving this important problem. The work of the Lambeth group, back in 1999 (cited as Nisimoto et al.), on the role of p67-phox in regulating hydride transfer from NADPH to FAD in NOX2 may indeed turn out to have been prophetic. However, only solving the structure of the assembled NOX2 complex will provide the much-awaited answer. The heterodimerization of NOX2 with p22-phox, the regulation of NOX2 by four cytosolic components, and the still present uncertainty about whether p67-phox is indeed the final distal component that converts NOX2 to the activated state make this a formidable task.

      The work of the Fieschi group on SpNOX is important and relevant but the absence of external regulation, the absence of p22-phox, and the uncertainty about the target molecule make it a rather questionable model for eukaryotic Noxes. The information on the role of the C-terminal Phe is of special value although its extension to the mechanism of eukaryotic Nox activation proved, so far, to be elusive.

      We sincerely thank the referee for the positive comments on our work and for the deep interest shown in this careful evaluation.

      We understand the arguments of the referee regarding the relevance of our work here to eukaryotic NOX, but we do not share the reservations expressed. While human NOXes need interactions with other proteins or have EF-hand or other domains that control them, SpNOX corresponds exactly to the minimal core common to any NOX isoform. In fact, because SpNOX has only this conserved core, it is unique in that it can work as a constitutively active NOX without protein-protein interactions or regulatory domains. Thus the fundamentals of electron transfer mechanisms of NOX enzyme are present in SpNOX.

      There might be some differences in internal organization from isoform to isoform (for example, in the relative orientation of the DH and TM domains), but considering the similarity between NOX2 and SpNOX topology, we are rather confident that the SpNOX structure will turn out to be a reasonable model of the activated NOX2 structure. History will tell.

      In any case, this work on SpNOX allowed us to identify hydride transfer as the rate-limiting step and to highlight some structural differences that could underlie the regulation of eukaryotic NOX. In itself, we think this is a significant contribution to the field.

      We warmly thank both referees for their constructive remarks and their help in the improvement of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • The manuscript states that the flavin "behaves" like a co-substrate and thereby reports on the Km for the flavins. I feel that this terminology might be confusing. The flavin is unchanged after the reaction, and what matters is the enzyme's affinity for the flavin and the flavin concentration needed to saturate the enzyme (to have it in the fully holo form).

      See above: in answering several questions from Referee 2, we commented extensively on this point (substrate, cofactor, affinity, etc.) and made some adjustments to the text to clarify it. We hope it is now satisfactory.

      • I could not find the methodological description of the experiments performed to measure the Km for the flavins, and the legend of Figure S4 does not help in this regard. I think that the data (left panels of S4) should be interpreted as binding curves with associated Kd values.

      We have changed the text to clarify the method used to measure Km for flavins.

      • A related point is that the manuscript refers to Km as an "affinity". This is inappropriate and should be avoided, as the Km is not the Kd.

      We agree with the referee that the Km is not the Kd. However, under appropriate conditions, to which our experiments conform, Km is accepted as a relevant approximation of affinity (Srinivasan, FEBS J., 289, 6086-6098, 2022). We have added a sentence to clarify this point and cite this reference in the text.
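      For readers, the standard single-substrate Michaelis-Menten relations make explicit when Km approaches Kd; this assumes the simple scheme below and treats the flavin as the varied species, which is an illustrative simplification rather than a full model of our assay:

```latex
E + S \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} ES \xrightarrow{k_{\mathrm{cat}}} E + P
\qquad
K_m = \frac{k_{-1} + k_{\mathrm{cat}}}{k_{1}}, \qquad K_d = \frac{k_{-1}}{k_{1}}
\qquad
k_{\mathrm{cat}} \ll k_{-1} \;\Rightarrow\; K_m \approx K_d
```

      Km therefore always bounds Kd from above, and the two converge when dissociation of the complex is much faster than catalysis.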

      • The environment around the putative oxygen site should be shown. The text indicates that "the residues characteristic of the O2 reducing center in eukaryotic FRD domains of NOX and DUOX enzymes are not conserved in SpNOX." How does the site look? This point relates to the more general comment above on the oxidizing substrate used by this bacterial NOX.

      This is a really interesting point that opens many potential biological developments for future studies of this prokaryotic family of NOX enzymes. While we were submitting this work to eLife for evaluation, another group (Murphy's lab) posted a preprint on bioRxiv in which they also solved the structure of SpNOX, this time by cryo-EM, with an unexpectedly high resolution for such a small protein (their paper is not yet published but is probably under peer review). In their work, they made a special effort to identify the O2-reducing center (bacterial NOX sequence alignments, mutation studies, etc.), but they were not able to localize such a site with accuracy. There are also other complementary data between their work and ours. We will therefore add a paragraph at the end of the discussion to comment on this parallel work and to emphasize the complementarity of the two studies and what each brings to the understanding of this enzyme.

      • The section "A Close-up View of NOX's NAD(P)H Binding Domains vs the FNR Gold Standard" should be clarified.

      I found it difficult to understand. Is the different conformation of Phe397 creating the crevice? Could NADPH be modeled in NOX2 and DUOX in the same conformation observed in FNR and modeled in the bacterial NOX? Or would there be clashes, implying the necessity of larger conformational changes to bring the nicotinamide closer to the FAD?

      Please see our responses above on this point; we have amended the text to clarify it. In brief, we propose that activation of the eukaryotic enzymes would entail a motion of the NBD subdomain (containing the NADPH site) towards the FBD subdomain (containing the FAD) through an internal motion within the DH domain. In doing so, they would approach the DH domain topology of SpNOX, which models an active state.

      Reviewer #2 (Recommendations For The Authors):

      On p. 6, second line, it should be (Figure 1C and 1D). Space is missing between C and "and".

      On p. 9, in Figure 3, the panel labels A and B are missing. Also, the legend of part B does not correspond to the actual graph colors: the trace of F397W is red, not grey as indicated in the legend.

      Corrected. Thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) V2 epitopes exhibit properties of CD4i epitopes in that they are largely absent from the native Env surface, probably owing to glycan occlusion, but become more exposed upon CD4 binding. Although the V2 scaffolds were produced in GnTi- cells to yield high-mannose proteins, it appears that no systematic analysis of glycan content or structure was carried out, save for enzymatic deglycosylation of the constructs to sharpen bands on SDS-PAGE gels. It would be helpful if the authors could comment on how the lack of this information might impact their conclusions.

      We thank the reviewer for this comment.

      The lack of native glycan structures is a common limitation of HIV studies that use envelope proteins expressed in cell culture in vitro.

      As the reviewer mentioned, it is clear that our V1V2 scaffolds produced in GnTi- cells contain the expected high-mannose glycans, as evident from a significant shift and sharpening of the protein bands on the SDS-PAGE gel upon deglycosylation with the PNGase enzyme.

      In our previously published study (Chand et al., 2017*; reference below), the V1V2 scaffolds were shown to bind the glycan-dependent PG9 antibody, suggesting that the conformation of the PG9 epitope is retained in the high-mannose V1V2 scaffold. This information has also been added to the “Hypothesis and Experimental Design” section of the Results in the revised manuscript.

      Additionally, as shown in the Results, the human antibodies elicited in study participants against the native glycosylated envelope protein during natural HIV-1 infection distinguished the H173 and Y173 epitopes in the high-mannose scaffolds, which was also recapitulated in our mouse studies using the GnTi- expressed high-mannose V1V2 scaffolds as antigens.

      Therefore, it seems unlikely that differences in glycans per se substantially affected the binding or the conclusions of our studies.

      *Chand S, Messina EL, AlSalmi W, Ananthaswamy N, Gao G, Uritskiy G, Padilla-Sanchez V, Mahalingam M, Peachman KK, Robb ML, Rao M, Rao VB. Glycosylation and oligomeric state of the envelope protein might influence HIV-1 virion capture by α4β7 integrin. Virology. 2017 Aug;508:199-212. doi: 10.1016/j.virol.2017.05.016. Epub 2017 May 31. PMID: 28577856; PMCID: PMC5526109.

      (2) Similarly, the MD simulations appear to have been performed without taking glycan structure/occupancy into account.

      We were unable to perform glycan-dependent MD simulations because of the high computational demands and the technical limitations that existed at the time of the study several years ago. Therefore, we focused on the protein backbone of the short C-strand in the V2 region, which lacks glycan sites and has been shown in previously published studies to be conformationally polymorphic.

      Since this C-strand epitope is the binding site for many previously identified V2-directed antibodies, we hypothesized that it would be relevant to explore the propensity of this small immunogenic epitope to change conformation due to an escape mutation at residue 173 discovered in a natural HIV-1 infection. How this epitope might behave in MD simulations in the presence of different glycans requires further investigation.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Liao et al. leveraged two powerful genomics techniques, CUT&RUN and RNA sequencing, to identify genomic regions bound by and activated or inactivated by SMAD1, SMAD5, and the progesterone receptor during endometrial stromal cell decidualization. Additionally, the authors generated novel knock-in HA-SMAD1 and PA-SMAD5 tagged mice to overcome antibody issues facing the field, providing a novel model to advance the study of BMP signaling in the female reproductive tract. During decidualization in a murine model, SMAD1/5 bind many genomic sites of genes important in decidualization and pregnancy and coregulate responses with progesterone receptor signaling.

      Strengths

      The authors utilized powerful next generation sequencing and identified important transcriptional mechanisms of SMAD1/5 and PGR during decidualization in vivo.

      Weaknesses

      None.

      Overall, the manuscript and study are well structured and provide critical mechanistic updates on the roles of SMAD1/5 in decidualization and preparation of the maternal endometrium for pregnancy.

      We thank you for the summary and consideration.

      Reviewer #2 (Public Review):

      Summary:

      Liao and colleagues generated tagged SMAD1 and SMAD5 mouse models and identified genome occupancy of these two factors in the uterus of these mice using the CUT&RUN assay. The authors used integrative bioinformatic approaches to identify putative SMAD1/5 direct downstream target genes and to catalog the SMAD1/5 and PGR genome co-localization pattern. The role of SMAD1/5 on stromal decidualization was assayed in vitro on primary human endometrial stromal cells. The new mouse models offer opportunities to further dissect SMAD1 and SMAD5 functions without the limitation from SMAD antibodies, which is significant. The CUT&RUN data further support the usefulness of these mouse models for this purpose.

      Strengths:

      The strength of this study is the novelty of new mouse models and the valuable cistromic data derived from these mice. Overall the present manuscript is an excellent resource paper for the field of reproductive biology.

      Weaknesses:

      The weaknesses of the present version of the manuscript include self-limited data analysis approaches, such as the proximal promoter-based bioinformatic filter and an outdated method for inferring cell type composition. Evidence was provided for potential associations between SMAD1/5 and other major transcription factors. However, causal effects of SMAD1/5 on the genome occupancy of other major uterine transcription factors were discussed but not experimentally examined in the present manuscript, which is understandable.

      For the data in Figure 2B, the current manuscript does not elaborate on the common and distinct features of clusters 1 and 3, or on the biological significance of having two separate clusters for SMAD1. In addition, Figure S1A shows overlapping genome occupancy between SMAD1 and SMAD5, which is not clearly demonstrated in Figure 2B.

      Thank you for the comments. We’ve added additional interpretations in Lines 281-283 addressing the clustering results in Figure 2B, as suggested. We do appreciate the overlapping genome occupancy in Cluster 1, although the signal intensities may differ between the two groups.

      Lines 281-283:

      “Peaks in cluster 1 exhibit a shared enrichment for both SMAD1 and SMAD5, whereas clusters 2 and 3 demonstrate preferential enrichment for SMAD5 and SMAD1, respectively.”

      For the data in Figure 5A, the description of the results does not provide adequate information to guide readers to a full understanding of the data. The biological meaning of the three PR clusters is neither stated nor speculated upon. Moreover, Figure 5A and Figure S1B are inherently connected but are not adequately described in the main text.

      Thank you for the comments. We’ve added additional interpretations in Lines 415-421 discussing the clustering results in Figure 5A together with Supplement Figure 1C (formerly Supplement Figure 1B), as suggested.

      Lines 415-421:

      “Based on the k-means clustering results of the peaks, we demonstrated clusters with shared occupancy between SMAD1/5 and PR (cluster 1), preferential deposition in the SMAD1 (cluster 2), SMAD5 (cluster 4) and PR (clusters 3,5), respectively. Interestingly, between clusters 3 and 5, although the primary enrichment is for PR, overall, the signal intensities for SMAD5 are higher in cluster 5. Together with previous analysis on genes uniquely or commonly bound by SMAD1/5 (Supplement Figure 1A), we speculate such observation can be attributed to a subset of the genes that are potentially co-regulated by SMAD5 and PR.”

      Reviewer #3 (Public Review):

      Summary:

      As SMAD1/5 activities have previously been indistinguishable, these studies provide a new mouse model to finally understand unique downstream activation of SMAD1/5 target genes, a model useful for many scientific fields. Using CUT&RUN analyses with gene overlap comparisons and signaling pathway analyses, specific targets for SMAD1 versus SMAD5 were compared, identified, and interpreted. These data validate previous findings showing strong evidence that SMADs directly govern critical genes required for endometrial receptivity and decidualization, including cell adhesion and vascular development. Further, SMAD targets were overlapped with progesterone receptor binding sites to identify regions of potential synergistic regulation of implantation. The authors report strong correlations between progesterone receptor and SMAD1/5 direct targets to cooperatively promote embryo implantation. Finally, the authors validated SMAD1/5 gene regulation in primary human endometrial stromal cells. These studies provide a data-rich survey of SMAD family transcription, defining its role as a governor of early pregnancy.

      Strengths:

      This manuscript provides a valuable survey of SMAD1/5 direct transcriptional events at the time of receptivity. As embryo implantation is controlled by extensive epithelial to stromal molecular crosstalk and hormonal regulation in space and time, the authors state a strong, descriptive narrative defining how SMAD1/5 plays a central role at the site of this molecular orchestration. The implementation of cutting-edge techniques and models and simple comparative analyses provide a straightforward, yet elegant manuscript.

      Although the progesterone receptor exists as a major regulator of early pregnancy, the authors have demonstrated clear evidence that progesterone receptor with SMAD1/5 work in concert to molecularly regulate targets such as Sox17, Id2, Tgfbr2, Runx1, Foxo1 and more at embryo implantation. Additionally, the authors pinpoint other critical transcription factor motifs that work with SMADs and the progesterone receptor to promote early pregnancy transcriptional paradigms.

      Weaknesses:

      Although the model is a wonderful new tool to distinguish SMAD1 versus SMAD5 downstream signaling, the importance of these factors in governing early pregnancy is not novel. Furthermore, functional validation studies are needed to confirm interactions at promoter regions. Additionally, the authors presume that all overlapping genes are shared between the progesterone receptor and SMAD1/5, yet some peak representations do not overlap. Although transcriptional activation can occur at the same time, the factors may not act in the same complex. Thus, further confirmation of these transcriptional events is warranted.

      Thank you for the comments. We recognized this limitation and discussed future options regarding this in Lines 578-583.

      Lines 578-583:

      “In this study, we determined the overlapped transcriptional control between SMAD1/5 and PR at the gene level, and functionally validated the regulatory effect at the transcript level in a human stromal cell decidualization model. While we observe a subset of peak representations that do not overlap at the base pair level in the promoter regions, future functional screenings at the promoter level, such as luciferase reporter assays to assess transcriptional co-activation by SMAD1/5 and PR, will advance this study.”

      Since whole murine uterus was used for these studies, the specific functions of SMAD1/5 in the stroma versus the epithelium (versus the myometrium) remain unknown. Further work is needed to delineate binding and transcriptional activation of SMAD1/5 and the progesterone receptor in the uterine compartments.

      We thank the reviewer for the insightful comment. Given the multifaceted roles SMAD1/5 play in the female reproductive tract, we concur that future studies will benefit from a more compartmentalized approach, as discussed in Lines 526-538.

      Lines 526-538:

      “Published studies have shown that nuclear SMAD1/5 localize to the stroma and epithelium during the decidualization process at 4.5 dpc, during the window of implantation. Conditional deletion of SMAD1/5 exclusively in the uterine epithelium using lactoferrin-icre (Ltf-icre) results in severe subfertility due to impaired implantation and decidual development. Conditional deletion of SMAD1/5/4 exclusively in the cells from mesenchymal lineage (including uterine stroma) using anti-Mullerian hormone type 2 receptor cre (Amhr2-cre) results in infertility with defective decidualization. Given the essential roles of SMAD1/5 in both stroma and epithelium identified by previous studies, we believe that the transcriptional co-regulatory roles of SMAD1/5 and PR reported here using the whole uterus validates a relationship between SMAD1/5 and PR in both the stromal and epithelial compartments. However, it does not rule out potential coregulatory roles of SMAD1/5 and PR in the myometrium, immune cells, and/or endothelium, given that whole uterus was used. The specific transcriptional evaluations of SMAD1/5 in the stroma versus the epithelium would require future validations using single-cell sequencing and/or spatial transcriptomic analysis.”

      There are asynchronous gene responses in the SMAD1/5-ablated mouse model compared with the siRNA-treated human endometrial stromal cells. These differences can be confounding. Further investigation is needed to understand the meaning of these differences as they relate to the entire SMAD transcriptome.

      Thank you for the comments. In the current study, we used human endometrial stromal cells as a model to validate our findings functionally, aiming to mimic a specific time point during decidualization. We acknowledge the similarities and differences between the mouse and human cell models, and this information needs to be considered when evaluating genome-wide effects on the transcriptome. This point is discussed in Lines 589-597.

      Lines 589-597:

      “Since mice only undergo decidualization upon embryo implantation whilst human stromal cells undergo cyclic decidualization in each menstrual cycle in response to rising levels of progesterone, asynchronous gene responses may occur in comparison between mouse models and human cells. However, cellular transformation during decidualization is conserved between mice and humans, which makes findings in the mouse models a valuable and transferable resource to be evaluated in human tissues. Accordingly, our functional validation studies were performed using human endometrial stromal cells induced to decidualize in vitro for four days, which models the early phases of decidualization. Additional transcriptomic studies of the SMAD1/5 perturbations in human endometrial stromal cells will be of great resource in understanding the entire SMAD1/5 regulomes in humans.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The inference of cell type composition could use updated bioinformatic tools, which are purely computational and do not require costly and time-consuming wet-lab resources. Alternatively, this part of the description could be streamlined if the authors choose to keep the method used in the current version.

      We thank the reviewer for the suggestion. We added an analysis of cell type composition using the updated tool CIBERSORTx (PMID:31061481) and included the results and a discussion of the cell type composition changes in Supplement Figure 1B and Lines 392-407.

      Lines 392-407

      “To explore the major cell types regulated by SMAD1/5, first, we used CIBERSORTx to analyze and depict changes in the cell populations upon SMAD1/5 depletion in the mouse uterus during early pregnancy. By imputing the bulk uterine gene expression profiles to previously published mouse uterine single-cell datasets using CIBERSORTx, we were able to compare changes across both samples and cell types upon the SMAD1/5 perturbation in the mouse uterus. We highlight the proportional increase in the epithelial cells, as well as the decrease in the decidual stromal cells and smooth muscle cells in mice lacking uterine SMAD1/5 during the periimplantation phase (Supplement Figure 1B). Such cell populational changes are in line with the phenotypical observations of decidualization failure and excessive proliferation in the epithelial compartment. In addition, to explore the expression patterns of SMAD1/5 direct targets in human, we profiled the expression levels of the key “up-targets” and “down-targets” in the different cell types of the human endometrium. Using previously published single-cell RNA seq data of human endometrium, we visualized the expression patterns of suppressive targets and activating targets of SMAD1/5 (Figure 4E). Apart from the major epithelial and stromal compartments, SMAD1/5 target genes are also widely expressed in the immune cell populations. Such observations reinforced the importance of the BMP signaling pathways in establishing an immune-privileged environment at the maternal-fetal interface.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage, and it identified a number of biological and experimental factors that influence which genes are selected for investigation. The study is a valuable contribution to this branch of meta-research, and while the evidence in support of the findings is solid, the interpretation and presentation of the results (especially the figures) needs to be improved.

      We thank the editor and reviewers for their detailed and thoughtful assessment of our work. Below, we present detailed responses to reviewers’ comments and suggestions. We are also submitting a version edited for clarity of presentation and precision of interpretation.

      Following the eLife assessment, we also tried to identify further statements where results could be presented in a more precise way.

      First, in the section Subsequent reception by other scientists does not penalize studies on understudied genes, we now state “This result again opposes the hypothesis that less-investigated genes will yield articles with lower impact.”

      Second, in section Identification of biological and experimental factors associated with selection of highlighted genes, we now state:

      “We cautiously hypothesize that this might reflect on many different research groups producing reagents surrounding the genes that they actively study. The most informative continuous factor is the number of research articles about a gene (Figure 1B).”, removing claims of causality.

      Finally, for improved readability, we have moved all supplemental tables into separate .xlsx files.

      Reviewer #1 (Public Review):

      Summary and strengths

      The authors tried to address why only a subset of genes is highlighted in many publications. Is it because these highlighted genes are more important than others? Or is it because of non-genetic reasons? This is a critical question because, in the effort to discover new genes for drug targets and clinical benefit, we need to expand the pool of genes for deep analyses. So I appreciate the authors' efforts in this study, as it is timely and important. They also provided a framework called FMUG (short for Find My Understudied Gene) to evaluate genes for a number of features for subsequent analyses.

      We thank the reviewer for their insightful comments and are pleased that the reviewer shares our appreciation for the gravity of these questions. As the reviewer emphasizes, it is critical to understand whether the choice of genes reflects their importance or non-genetic reasons. Previously we and others demonstrated that this choice does not reflect biological importance, when the latter is assessed through unbiased genome-wide data (e.g.: Haynes et al., 2018; Stoeger et al. 2018). Now we contribute to this critical question by systematically evaluating individual non-genetic reasons. We address the reviewer’s comments below.

      Weaknesses

      Many of the figures are hard to comprehend, and the figure legends do not sufficiently explain them.

      For example, what was plotted in Fig 1b? The number of articles increased from results -> write-ups -> follow-ups in all four categories to different degrees. But it does not seem to match what the authors meant to convey.

      We apologize for the lack of clarity. We identified two interrelated elements that we have now fixed: i) the prior figure legend provided for each genomics approach n number of articles, such as “GWAS (n=450 articles)”; ii) the prior y-axis was labelled “Number of articles”.

      Addressing the first element, we now rephrased the legend for clarity:

      “b, We identified articles reporting on genome-wide CRISPR screens (CRISPR, 15 focus articles and 18 citing articles), transcriptomics (T-omics, 148 focus articles and 1,678 citing articles), affinity purification–mass spectrometry (AP-MS, 296 focus articles and 1,320 citing articles), and GWAS (450 focus articles and 3,524 citing articles). Focusing only on protein-coding genes (white box plot), we retrieved data uploaded to repositories describing which genes came up as “hits” in each experiment (first colored box plot). We then retrieved the hits mentioned in the titles and abstracts of those articles (second colored box plot) and hits mentioned in the titles and abstracts of articles citing those articles (third colored box plot). Unique hit genes are only counted once.”

      The number of genes in each box plot is now reported in the x-axis labels for each step. For example, the results for CRISPR were obtained from 15 focus studies (original research) and 18 subsequent studies (papers citing focus articles). Those 15 studies identified 9,268 genes whose loss of function changed phenotypes but mentioned only 18 of those 9,268 genes in their titles and abstracts. While the 9,268 hit genes have received research attention similar to that of all protein-coding genes, the 18 hit genes mentioned in the titles or abstracts are significantly better studied. Likewise, the articles citing the focus articles mentioned only 19 highly studied hit genes in their titles and abstracts.

      Addressing the second element, we updated the axis label to “Number of articles about gene” to distinguish it from the number of articles mentioned in the legend and to convey that this is the number of articles about each gene that were published independently of the genomics assays we inspect. To further underscore this point, we now label the “20% highest-studied genes” that we mention in the main text, and we reworded the figure caption to better capture where the critical increase occurs: “A shift in focus towards well-studied genes occurs during the summarization and write-up of results and remains in subsequent studies.”
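      To make the “20% highest-studied genes” cutoff concrete, the following minimal Python sketch computes the article count that marks the top 20% of genes and the share of a gene set that reaches it. It is an illustration under assumptions, not the code behind Figure 1B: per-gene article counts are assumed to be a dictionary, ties at the cutoff count as reaching it, and the helper names, gene names and counts are hypothetical.

```python
import math
from typing import Dict, Iterable


def top_fraction_cutoff(articles_per_gene: Dict[str, int], top_fraction: float = 0.2) -> int:
    """Smallest article count that still places a gene in the top `top_fraction` of genes."""
    counts = sorted(articles_per_gene.values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(counts)))  # number of genes in the top group
    return counts[k - 1]


def share_at_or_above(genes: Iterable[str], articles_per_gene: Dict[str, int], cutoff: int) -> float:
    """Fraction of `genes` (ignoring unknown names) whose article count reaches `cutoff`."""
    known = [g for g in genes if g in articles_per_gene]
    return sum(articles_per_gene[g] >= cutoff for g in known) / len(known)


# Hypothetical article counts for ten genes (illustration only).
articles = {"A": 5000, "B": 1200, "C": 300, "D": 40, "E": 8,
            "F": 3, "G": 2, "H": 1, "I": 0, "J": 0}
cutoff = top_fraction_cutoff(articles)  # 1200
highlighted_share = share_at_or_above(["A", "C", "E"], articles, cutoff)  # 1/3
```

      Comparing `highlighted_share` for hits mentioned in titles/abstracts against the share for all hits is one simple way to express the over-representation that the labelled cutoff conveys visually.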

      Fig 4 is also confusing. It appears that the genes were clustered by many features that the authors developed. But does it have any relationship with genes being under- or over-studied?

      We again apologize for the lack of clarity. As described in the main text, while the results of Figs. 1-2 suggest that gene popularity may predict the highlighting of a differentially expressed gene in the title or abstract, we wanted to conduct a systematic analysis of the factors that correlate with such a decision. We thus built a set of 45 factors that have been discussed as explaining why some genes receive increased research attention.

      The data in Fig. 4 shows that those 45 factors are not independent but that some are highly correlated. Because of those correlations, we are able to select a smaller number as representative of the full set. Those are the default factors shown to users of FMUG. While users can choose all factors that are significantly correlated with the highlighting in title or abstract, the default of presenting factors representing different clusters of factors enabled us to limit the number of factors that are initially displayed.

      Please note that following the suggestion of Reviewer 3, we have now moved this Figure to the supplemental material, as Figure S11.

      Reviewer #2 (Public Review)

      Summary and strengths

      In this manuscript the authors analyse the trajectory of understudied genes (UGs) from experiment to publication and study the reasons for why UGs remain underrepresented in the scientific literature. They show that UGs are not underrepresented in experimental datasets, but in the titles and abstracts of the manuscripts reporting experimental data as well as subsequent studies referring to those large-scale studies. They also develop an app that allows researchers to find UGs and their annotation state. Overall, this is a timely article that makes an important contribution to the field. It could help to boost the future investigation of understudied genes, a fundamental challenge in the life sciences. It is concise and overall well-written, and I very much enjoyed reading it. However, there are a few points that I think the authors should address.

      We thank the reviewer for their kind assessment.

      Weaknesses

      The authors conclude that many UGs "are lost" from genome-wide assay at the manuscript writing stage. If I understand correctly, this is based on gene names not being reported in the title or abstract of these manuscripts. However, for genome-wide experiments, it would be quite difficult for authors to mention large numbers of understudied genes in the abstract. In contrast, one might highlight the expected behaviour of a well-studied protein simply to highlight that the genome-wide study provides credible results.

      We agree that it is not reasonable to expect a title or abstract to highlight hundreds or even thousands of differentially expressed genes. We’ve now extended our Study Limitations section to address this:

      “we take a gene being mentioned in the title or abstract of an article as a proxy for a gene receiving attention by the article’s authors. The title and abstract are space-limited and thus cannot accommodate discussion of large numbers of genes.”

      We also agree that highlighting the expected behavior of a well-studied protein may provide credibility to a study and increase confidence in other results. The soundness of such a strategy was quantitatively studied in a study by Uzzi et al. (Science 2013), which we now include in the section on study limitations as:

      “authors beginning manuscripts with something familiar before introducing something new”.

      To convey the practical limitation of abstracts needing to be concise, we added the following sentence to our discussion section, when suggesting controlled trials that add genes to abstracts:

      “This intervention would need to be carefully designed since abstracts are limited in their size.”

      To avoid over-interpretation we have in the discussion also extended the sentence on “lost in a leaky pipeline” to “lost to titles and abstracts of research articles in a leaky pipeline”.

      Our focus on titles and abstracts has been equally motivated by their availability (full text still is often behind paywalls and/or not accessible for bulk-download and text-mining) and by abstracts being the most visible and most read parts of research articles (e.g.: bioRxiv estimates that for the preprint for the present manuscript, the abstract was read ~10 times more frequently than full-text HTML and 4 times more frequently than the pdf).

      Could this bias the authors' conclusions and, if so, how could this be addressed? For example, would it be worth to normalise studies based on the total number of genes they cover?

      We previously described that, in line with the reviewer’s expectations, unstudied genes are preferentially added to the titles or abstracts of articles that feature more genes in the title or abstract (Stoeger et al., PLOS Biology, 2022; Fig. 2B). Normalizing by the total number of genes should thus preserve the pronounced division between well-studied and unstudied genes shown in Figure 1B. In line with this prediction, we randomly selected one gene per title/abstract and found that the effect remains (see new Figure S7).

      Author response image 1.
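      The normalization of keeping a single randomly chosen gene per title/abstract can be sketched as follows. This is a minimal Python illustration, not the authors' code: the data layout (one collection of gene symbols per article), the helper name `one_gene_per_article` and the example counts are hypothetical, and the fixed seed only makes the draw reproducible.

```python
import random
import statistics
from typing import Dict, Iterable, List


def one_gene_per_article(article_genes: Iterable[Iterable[str]],
                         articles_per_gene: Dict[str, int],
                         seed: int = 0) -> List[int]:
    """Pick one gene at random from each title/abstract and return its article count.

    Articles with no gene mentions are skipped, so each article contributes
    exactly one gene regardless of how many genes it names.
    """
    rng = random.Random(seed)
    picked_counts = []
    for genes in article_genes:
        candidates = sorted(set(genes))  # sort so the draw is reproducible for a given seed
        if candidates:
            picked_counts.append(articles_per_gene[rng.choice(candidates)])
    return picked_counts


# Hypothetical example: the third abstract names two genes but contributes only one.
counts = {"TP53": 10000, "EGFR": 8000, "C1orf112": 5}
sample = one_gene_per_article([["TP53"], ["C1orf112"], ["TP53", "EGFR"], []], counts)
median_popularity = statistics.median(sample)
```

      Because each article contributes one gene, articles naming many genes no longer dominate the distribution of article counts that is compared across stages.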

      Figure 1B is confusing in its present form. I think the plot and/or the legend need revising. For example, what "numbers to the right of each box plot" are the authors referring to? Also, I assume that the filled boxes are understudied genes and the empty/white box is "all genes", but that's not explained in the legend. In the main text, the figure is referred to with the sentence "we found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature ". I cannot follow how the figure shows this. My interpretation is that the y-axis is not showing the number of articles, but represents the percentage of articles mentioning a gene in the title/abstract, displayed on a log scale. If so, perhaps a better axis labels and legend text could be sufficient. But then one would also need to somehow connect this to the statement in the main text about the 20% highest-studied genes (a dashed line?). Alternatively, the authors could consider other ways of plotting these data, e.g. simply plotting the "% of publication in which a gene appears" from 0-100% or so.

      Reviewer 1 raised a similar point on overall figure clarity. We identified two interrelated elements that contribute to overall confusion and have now fixed them (see response to Reviewer 1 beginning on page 2 of this document).

      We attempted an alternative plotting of Fig 1B according to the reviewer’s suggestion. In the version below, the y-axis instead shows the percent of gene-related articles that are about each gene. We chose to keep the original y-axis (showing number of articles about each gene) as it additionally conveys the absolute scale of scholarship on individual genes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary and strengths

      The manuscript investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage and identified biological and experimental factors associated with selection of highlighted genes.

      It is very important for the research community to recognize the systematic bias in research of human genes and take precautions when designing experiments and interpreting results. The authors have tried to profile this issue comprehensively and promoted more awareness and investigation of understudied genes.

      We thank the reviewer for their kind assessment of our work.

      Weaknesses

      Regarding result section 1 "Understudied genes are abandoned at synthesis/writing stage", the figures are not clear and do not convey the messages written in the main text. For example, in Figure 1B, figure S5 and S6,

      • There is no "numbers to the right of each box plot".

      The “numbers to the right” statement in the caption was an erroneous inclusion from an earlier version of the figure. We apologize for our error and have now removed this statement.

      • Do these box plots only show understudied genes? How many genes are there in each box plot? The definition and numbers of understudied genes are not clear.

      The x-axis describes genes featured in each stage of the publication process (from all protein-coding genes to genes found as hits in genome-wide screen to genes found in the title/abstract to genes found in the title/abstract of citing articles) and the y-axis describes the number of articles annotated to those genes. We have also now added the number of genes in each box plot to the figure. This information is also in Materials and Methods under each technology’s heading (see also response to Reviewer 1 beginning on page 2 of this document).

      Author response image 3.

      • "We found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature (Figure 1B)". This is not clear from the figure.

      We have revised Figure 1B and its caption to better communicate the main point of the figure: that genes which make it to the title/abstract of the reporting article tend to be more popular than genes which are hits in genome-wide experiments from those articles. We have added a horizontal line that shows the cutoff for the top 20% most popular genes.

      Regarding result section 2 "Subsequent reception by other scientists does not penalize studies on understudied genes", the authors showed in figure 2 that there is a negative correlation between articles per gene before 2015 and median citations to articles published in 2015. Another explanation could be that for popular genes, there are more low-quality articles that didn't get citations, not necessarily that less popular genes attract more citations.

      We believe that the two explanations for the observed phenomenon are not mutually exclusive. Previously, we focused on the median of citations to articles about a gene to capture the typical effect. In a new analysis, we also find support for the possibility outlined by the reviewer and believe that adding this to our manuscript complements and balances our analysis of citations. Specifically, in the new Figure S8B we find that articles about the most popular genes are slightly more likely to be among the least cited articles (and in Figure S8A that articles about the least studied genes have been much more likely to be among the most cited articles). In-text, we state:

      “Further, since 1990, articles about the least popular genes have at times been 3 to 4 times more likely to be among the most cited articles than articles on the most popular genes, whereas articles on the most popular genes have been slightly less likely to be highly cited than lowly cited (Figure S8)”.

      We thank the reviewer for their suggestion, which strengthens our manuscript. The figure caption reads:

      “Figure S8: Likelihoods of being highly cited (top 5% of citations among all articles about genes, panel a) or lowly cited (bottom 5% of citations among all articles about genes, panel b) for articles about the most popular genes (top 5% accumulated articles) versus articles about the least popular genes (bottom 5% accumulated articles) by year of publication. Only articles with a single gene in the title/abstract are considered. Shaded regions show ±1 standard error of the proportion."

      Author response image 4.
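      The likelihoods and ±1 standard-error bands described in the Figure S8 caption correspond to binomial proportions. The minimal Python sketch below assumes each article is represented by a boolean flag (True if it falls in the top, or bottom, 5% of citations) and uses the standard error of a proportion, sqrt(p(1 - p)/n). The helper names and the flag counts are hypothetical, not values from the manuscript.

```python
import math
from typing import Sequence, Tuple


def proportion_with_se(flags: Sequence[bool]) -> Tuple[float, float]:
    """Share of True flags and its standard error sqrt(p * (1 - p) / n)."""
    n = len(flags)
    p = sum(flags) / n
    return p, math.sqrt(p * (1 - p) / n)


def likelihood_ratio(flags_a: Sequence[bool], flags_b: Sequence[bool]) -> float:
    """How many times more likely group A is flagged than group B."""
    return proportion_with_se(flags_a)[0] / proportion_with_se(flags_b)[0]


# Hypothetical year: 12 of 100 least-popular-gene articles and 3 of 100
# most-popular-gene articles land in the top 5% of citations.
least = [True] * 12 + [False] * 88
most = [True] * 3 + [False] * 97
p, se = proportion_with_se(least)
band = (p - se, p + se)                # shaded region spanning ±1 SE
ratio = likelihood_ratio(least, most)  # about 4
```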

      Regarding result section 3 "Identification of biological and experimental factors associated with selection of highlighted genes", in Figure 3 and table s2, the authors stated that "hits with a compound known to affect gene activity are 5.114 times as likely to be mentioned in the title/abstract in an article using transcriptomics", The number 5.144 comes out of nowhere both in the figure and the table. In addition, figure 4 is not informative enough to be included as a main figure.

      This is the result of both a typo and imprecise terminology. The number should read 4.262 (the likelihood ratio of being mentioned in the title/abstract between genes with and without a compound), which corresponds to an odds ratio of 4.331. We have clarified this in the table caption, stating:

      “e.g. hits with a compound known to affect gene activity are 4.262 times as likely to be mentioned in the title/abstract in an article using transcriptomics, corresponding to an odds ratio of 4.331".
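      The difference between the likelihood ratio (4.262) and the odds ratio (4.331) can be illustrated with a 2×2 table of hits with or without a known compound versus mentioned or not in the title/abstract. In this minimal Python sketch the helper name and counts are hypothetical, chosen only to show how the two measures relate; it is not the computation behind Table S2. The identity OR = LR · (1 - p_without)/(1 - p_with) shows that the two stay close when mentions are uncommon in both groups.

```python
from typing import Tuple


def likelihood_and_odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Compare two groups in a 2x2 table.

                        mentioned   not mentioned
    with compound           a             b
    without compound        c             d

    Returns (ratio of probabilities of being mentioned, odds ratio).
    """
    p_with = a / (a + b)
    p_without = c / (c + d)
    likelihood = p_with / p_without
    odds = (a / b) / (c / d)
    return likelihood, odds


# Hypothetical counts: 2% vs 0.5% of hits are mentioned.
lr, odds_ratio = likelihood_and_odds_ratio(20, 980, 5, 995)
# lr is 4.0; odds_ratio is 19900 / 4900, about 4.06.
# The two are linked by OR = LR * (1 - p_without) / (1 - p_with).
```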

      We have removed Figure 4 as a main-text figure and added a version, with a revised color scheme following comments of Reviewer 1, as Figure S11. We added to the figure caption “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      • Fig 2a shows that papers highlighting understudied genes are actually cited more. I wonder why authors only looked at data before 2015. Fig 2b shows an increased correlation since 2015. Please consider redrawing Fig 2a to include data from 2015-2020?

      We highlight data from 2015 since, in the version of iCite we used (v32, released July 2022, covering citations made through most of 2021), papers published in 2015 have had about 6 years to accumulate citations. With fewer years to accumulate citations, insufficient signal may cause correlation to converge toward zero. Below, we repeat the analysis in Figure 2 but consider only citations made within a year of an article’s publication, which substantially reduces the correlation (although it remains significant).

      Author response image 5.

      We added a note to the figure caption:

      “We forgo depicting more recent years than 2015 to allow for citations to accumulate over multiple years, providing a more sensitive and robust readout of long-term impact.”

      For Figure 2B, we add:

      “For more recent years, where articles have had less time to accumulate citations, insufficient signal may cause correlation to converge toward zero.”

      • Can FMUG be posted on the web for easy access by researchers with non-computational backgrounds?

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #2 (Recommendations for the authors):

      • Related to the first weakness in my public review: The observed disparity between CRISPR and GWAS studies in terms of which genes they promote to the abstract is interesting. I wonder if this has to do with the application of these techniques. GWAS studies will often highlight that they retrieve known associations between a gene and a phenotype, to show that a screen is working. I guess often the point is to subsequently identify more genes associated with a particular phenotype, but often it is unclear how to validate/verify newly found associations. In contrast, CRISPR screens might be more focussed on functionally/mechanistically understanding unknown processes, e.g. observing a phenotype that appears/disappears in response to a gene deletion. In such studies, the follow-up of a previously unknown gene could be more straightforward and relevant to the outcome. Does that mean CRISPR screens are better than GWAS studies for addressing the UG problem? Perhaps the authors could briefly discuss this issue.

      The number of studies we included featuring CRISPR screens is relatively small (n = 15 compared to n = 450 for GWAS). Thus, it is not possible to conclude in a statistically sound manner whether authors of CRISPR screens are truly more likely to highlight understudied genes.

      However, the reviewer raises compelling reasons for why this might be the case, and we now embed the broader discussion point that some techniques might be more powerful toward understudied genes.

      The discussion now includes:

      “Further, the observed discrepancy between the popularity of hits highlighted by GWAS versus other technologies suggests that some -omics technologies may be more powerful than others for characterizing understudied genes. This possibility merits further research and researchers participating in unknomics should consider the relative strengths of each technology towards providing tractable results for follow-up.”

      • Affinity capture mass spectrometry (Aff-MS): Perhaps I misunderstood this, but typically this is referred to as affinity purification MS (AP-MS)

      Thank you for the clarification. We have changed ‘Aff-MS’ to ‘AP-MS’ throughout the manuscript.

      • Page 3, line 96. The sentence "The first possibility is that seemingly understudied genes are, in fact, not understudied as they would rarely be identified through experiments.". Would they not still be understudied, just not intentionally?

      We have rephrased this sentence to:

      “The first possibility is that some genes are less studied because they are rarely identified as hits in experiments.”

      • Fig 4 is very interesting, but I also found it a bit confusing. First, the choice of colour scheme, where blue shows the absence and white shows the presence of something, seems counterintuitive, especially on a white background. Second, I find it confusing that only some of the experiments are labelled in the heatmap. Could the authors not simply use Fig S9 as Fig 4? Or alternatively, only include the 8 labelled factors in the simplified figure.

      In line with this feedback and that of Reviewers #1 and #3, we have removed Figure 4 as a main-text figure and instead include this figure as Supplementary Figure S11. We have reversed the color scheme so that purple indicates one and white indicates zero. We also now label all factors; previously, we had only listed the default features of FMUG. We have also updated the figure legend to convey how it assisted the choice of default factors in FMUG. It reads:

      “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3)”.

      • The FMUG app is fantastic and sounds exactly like something that is required to boost the visibility of understudied genes and overcome the understudied gene bias. However, I did not understand the choice of reporting this in the Discussion section.

      We thank the reviewer for their enthusiasm, and have now moved FMUG into the results section.

      • To further increase usability of the FMUG app, is there a way it could be deployed online? I appreciate this could require a major amount of coding work, which would not be reasonable to demand. So please consider this a suggestion, potentially for a future implementation.

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #3 (Recommendations for the authors):

      Table s2 and s3: p values are indicated by star signs. However, with so many hypothesis tests, the p values should be corrected for multiple tests.

      We have now applied Benjamini-Hochberg multiple hypothesis correction to these tables, correcting p-values within each of the four technologies. We update our significance calling to read:

      “We identified 45 factors that relate to genes and found 33 (12 out of 23 binary factors and 21 out of 22 continuous factors) associated with selection in at least one assay type at Benjamini-Hochberg FDR < 0.001.”
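      Benjamini-Hochberg adjustment applied separately within each technology can be sketched in plain Python as below. This illustrates the standard step-up rule (adjusted p for rank i is the minimum over ranks k ≥ i of p_(k)·m/k, capped at 1). It is not the pipeline used for Tables S2 and S3: the dictionary keyed by technology, the helper names and the toy p-values are assumptions.

```python
from typing import Dict, List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_within_groups(pvals_by_group: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Correct each group (e.g. each technology) independently of the others."""
    return {group: benjamini_hochberg(p) for group, p in pvals_by_group.items()}


# Toy p-values for two of the four technologies.
adjusted = bh_within_groups({"GWAS": [0.01, 0.04, 0.03, 0.005], "CRISPR": [0.2]})
significant = {g: [q < 0.001 for q in qs] for g, qs in adjusted.items()}
```

      Correcting within each technology means the number of tests m differs per technology, so the same raw p-value can be significant for one assay type and not for another.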

      Figure S1 - S4

      These figures contain too many noninformative boxes. In all the figures, only the last three boxes are informative (reports assessed for eligibility, reports excluded, and studies included in review). The remaining boxes convey little information and should be simplified.

      We have simplified these diagrams, removing boxes which contained no information.

      Figure S6: what does it mean by "prior to the publication of the first article represented in this sample"? What is "this sample"?

      “This sample” refers to the collection of 450 GWAS articles, 296 articles using AP-MS, 148 transcriptomics articles, and 15 genome-wide CRISPR screen articles. We have rephrased this sentence to make this clear. It now reads:

      “Variant of Figure 1B only considering articles published in 2002 or before, prior to the publication of any of the articles featuring -omics experiments which we considered for this analysis.”

    1. Author Response

      Reviewer #1 (Public Review):

      Reviewer 1: The structural part of this work is interesting, as it is the first structure of Pin1 with a ligand that bridges both domains. They might want to underline this - all other structures in the PDB have a single domain complex, but never both domains by a single longer peptide.

      Done. We have highlighted the novelty of the structure in the abstract, introduction (page 5), and discussion (section “The Pin1-PKC interface is described by a novel bivalent interaction mode”, page 24).

      Reviewer 1: I would however question the static representation of this structure - the 90° kink in the peptide when complexed is probably one single snapshot, but I hardly believe the PPIase/WW domain orientation to be static. Unless the authors have additional information to stand by this static structure, this point merits being commented on in the manuscript.

      Done. Following the reviewer’s suggestion and to avoid the impression of “static” structure, we have added sentences that highlight the dynamic aspects of the complex evident from the entire ensemble representation of Figure 5-figure supplement 2:

      Page 15 (Results):

      “Of note, the linker region connecting the two domains retains its flexibility in the complex and confers some variability onto the relative positions of the WW and PPIase domains, as is evident from the ensemble representation of Figure 5-figure supplement 2. The complex exhibits novel structural features that distinguish it from all other structures of Pin1 complexes known to date. These features are highlighted in Fig. 6 using the lowest-energy structure of the ensemble.”

      Page 24 (Discussion): “Moreover, the retention of linker flexibility in the Pin1::pV5bII complex suggests that Pin1 can potentially adopt minor “extended” states that would not be readily detectable by ensemble-averaged methods such as solution NMR.”

      Also, in describing specific interactions in the section “Structural basis of the Pin1-PKCβII C-term bivalent recognition mode”, we now note how many structures of the Pin1-pV5bII ensemble have those interactions.

      Reviewer 1: I would like to point out to literature that described for example the non-canonical binding (Yeh ES, Lew BO & Means AR (2006) The loss of PIN1 deregulates cyclin E and sensitizes mouse embryo fibroblasts to genomic instability. J Biol Chem 281, 241-251. Pin1 recognizes cyclin E via a noncanonical pThr384-Gly385 motif [33] rather than the pThr380-Pro381 motif.). They mention briefly the absence of isomerase activity in similar TPP motifs, but this information might already come in the Results section.

      Done. We have incorporated this information in the Discussion section, page 25 (last paragraph).

      Reviewer 1: The expression levels of Pin1 and PKCa are amazingly linear (Fig 7A), but when they overexpress WT Pin1 in a KO line, with 3-4 times higher overexpression, the PKCa levels are hardly higher than in the original WT cell line.

      We thank Reviewer 1 for raising this interesting point. Our simple interpretation of the data is that physiological expression of Pin1 in the cell model we use is a limiting factor in the stimulated PKCa degradation pathway, but that Pin1 is no longer a limiting factor at higher expression levels. We now include this point in the Discussion, page 26.

      Reviewer 1: Also, the levels in the W34A/R68A/R69A (abolishing both WW and PPIase binding functions) are surprising, why would PKCa levels rise above the level found in the Pin1 KO cells?

      This result remains a puzzle but, as we are including all independent biological replicates in the analysis, the data are the data. Moreover, by assessing the functional complementation data to the KO by two-tailed t-test (see last point below), this effect does not reach statistical significance. Nonetheless, as the result is reproducible, we now comment on this effect in the Results, page 21. One speculation is that this triple mutant exerts dominant-negative properties on some limiting factor in PKCa degradation, and that these are revealed in the absence of WT Pin1. Considerably more work needs to be done to settle this issue. However, in light of the fact that this result does not conflict with the structural/biochemical data (rather, it is consistent with it), we hope this positive response satisfies the Reviewer.

      Reviewer 1: Finally, if even slight overexpression of the C113S catalytically inactive mutant leads to more efficient PKCa degradation than overexpression of the WT Pin1 (Figure 7C), it is hard to interpret. The conclusion that Pin1-mediated regulation of PKCa requires a bivalent interaction mode of Pin1 with PKCa independent of its catalytic activity does depend on these data, so they merit further analysis.

      We certainly had no intention of concluding that the C113S catalytically inactive mutant is more efficient with regard to promoting PKCa degradation than overexpression of the WT Pin1. That overstates the data. We concede that our organization of the Pin1 rescue data in the original Fig 7C confused the issue, and that the original text also invited conclusions that overstate the result. To correct this problem, we reorganized Fig. 7C to simplify the presentation by comparing the complementation data to the KO. All statistical comparisons are now to the KO cell line (not to WT as before) and we employ the two-tailed t-test to compare the data. Statistical significance is attained only for reconstituted WT and C113S Pin1 expression. The text is also appropriately revised to describe the results clearly. We trust the Reviewer agrees that the C113S data are compelling and are consistent with a noncanonical (noncatalytic) mode of PKCa regulation by Pin1. This is a major point of Fig 7C as it links the structural/biochemical data to a cellular context.

    1. Author Response

      eLife assessment

      This computational study is a valuable empirical investigation into the common trait of neurons in brains and artificial neural networks: responding effectively to both objects and their mirror images. It focuses on uncovering conditions that lead to mirror symmetry in visual networks, and the evidence convincingly demonstrates that learning contributes to expanding mirror symmetry tuning, given its presence in the data. Additionally, the paper delves into the transformation of face patches in the primate visual hierarchy, shifting from view specificity to mirror symmetry to view invariance. It empirically analyzes factors behind similar effects in two network architectures, and key claims highlight the emergence of invariances in architectures with spatial pooling, driven by learning bilateral symmetry discrimination. Importantly, these effects extend beyond faces, suggesting broader relevance. Despite strong experiments, some interpretations lack explicit support, and the paper overlooks pre-training emergence of mirror symmetry.

      As detailed above, we have now analyzed several convolutional architectures and made a direct link between the artificial neural networks and neuronal data to further support our claims (refer to Figures 6 and S10-S13).

      To address the concern about pre-training emergence of mirror symmetry, we conducted a new analysis inspecting unit-level response profiles, following Baek and colleagues (2021). This analysis is described in detail below (response to R3). In brief, we found that the first fully connected layer in trained networks exhibits twice the number of mirror-symmetric units found before training. In addition to our population-level observations (Fig. S2) and explicit training-dataset manipulations (Fig. 4), this finding supports the interpretation that training to discriminate among mirror-symmetric object categories is a major factor behind the emergence of mirror-symmetric viewpoint tuning.

      Reviewer 1 (Public Review):

      By using deep convolutional neural networks (CNNs) as models of the visual system, this study aims at understanding and explaining the emergence of mirror-symmetric viewpoint tuning in the brain.

      Major strengths of the methods and results:

      1) The paper presents comprehensive, insightful, and detailed analyses investigating how mirror-symmetric viewpoint tuning emerges in artificial neural networks, providing significant and novel insights into this complex process.

      2) The authors analyze reflection equivariance and invariance in both trained and untrained CNNs’ convolutional layers. This elucidates how object categorization training gives rise to mirror-symmetric invariance in the fully-connected layers.

      3) By training CNNs on small datasets of numbers and a small object set excluding faces, the authors demonstrate mirror-symmetric tuning’s potential to generalize to untrained categories and the necessity of view-invariant category training for its emergence.

      4) A further analysis probes the contribution of local versus global features to mirror-symmetric units in the first fully-connected layer of a network. This innovative analysis convincingly shows that local features alone suffice for the emergence of mirror-symmetric tuning in networks.

      5) The results make a clear prediction that mirror-symmetric tuning should also emerge for other bilaterally symmetric categories, opening avenues for future neural studies.

      We are grateful for your insightful feedback and the positive evaluation of our study on mirror-symmetric viewpoint tuning in neural networks. Your constructive comments considerably improved the manuscript. We eagerly look forward to exploring the future research avenues you have highlighted.

      Major weaknesses of the methods and results:

      Point 1.1) The authors propose a mirror-symmetric viewpoint tuning index which, although innovative, complicates comparison with previous work, and this choice is not well motivated. This index is based on correlating representational dissimilarity matrices (RDMs) with their flipped versions, a method differing from previous approaches.

      We have revised the Methods section to clarify the motivation for the mirror-symmetric viewpoint tuning index we introduced.

      Manuscript changes:

      Previous work quantified mirror-symmetry in RDMs by comparing neural RDMs to an idealized mirror-symmetric RDM (see Fig. 3c-iii in [14]). Although highly interpretable, such an idealized RDM encompasses implicit assumptions about representational geometry that are unrelated to mirror-symmetry. For example, consider a neural RDM reflecting perfect mirror-symmetric viewpoint tuning and wherein for each view, the distances among all of the exemplars are equal. Such a neural RDM would fit an idealized mirror-symmetric RDM better than a neural RDM reflecting perfect mirror-symmetric viewpoint tuning but with non-equidistant exemplars. In contrast, the measure proposed in Eq. 2 equals 1.0 in both cases.
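      The flip-based idea can be illustrated with a short sketch. Eq. 2 is not reproduced in this section, so the form below is an assumption: the Pearson correlation between a correlation-distance RDM over (view, exemplar) conditions and the same RDM in which each column condition is replaced by its mirror-image view. The function name, the view ordering, and the synthetic responses are all hypothetical. The sketch shows why a perfectly mirror-symmetric representation scores 1.0 whether or not its exemplars are equidistant: if a view and its mirror view evoke identical responses, the flipped RDM is identical to the original.

```python
import numpy as np


def mirror_symmetric_viewpoint_tuning_index(responses):
    """Assumed flip-based index: corr(RDM, RDM with column views mirrored).

    responses: array of shape (n_views, n_exemplars, n_units). Views must be
    ordered so that view k and view n_views - 1 - k are mirror images
    (e.g. -90, -45, 0, +45, +90 degrees).
    """
    n_views, n_exemplars, n_units = responses.shape
    n = n_views * n_exemplars
    x = responses.reshape(n, n_units)
    # Same conditions, with every view replaced by its mirror-image view.
    x_mirror = responses[::-1].reshape(n, n_units)
    corr = np.corrcoef(x, x_mirror)       # (2n, 2n): rows of x, then rows of x_mirror
    rdm = 1.0 - corr[:n, :n]              # correlation-distance RDM
    rdm_flipped = 1.0 - corr[:n, n:]      # distances to mirror-view conditions
    return np.corrcoef(rdm.ravel(), rdm_flipped.ravel())[0, 1]


# Hypothetical synthetic data: 5 views, 4 exemplars, 50 units.
rng = np.random.default_rng(0)
n_views, n_exemplars, n_units = 5, 4, 50
exemplar_codes = rng.normal(size=(n_exemplars, n_units))   # not equidistant
view_codes = 3.0 * rng.normal(size=(n_views, n_units))
mirror_pair = [min(v, n_views - 1 - v) for v in range(n_views)]

# Mirror-symmetric tuning: a view and its mirror view evoke identical responses.
symmetric = exemplar_codes[None, :, :] + view_codes[mirror_pair][:, None, :]
# Plain viewpoint tuning: every view has its own independent code.
asymmetric = exemplar_codes[None, :, :] + view_codes[:, None, :]

print(mirror_symmetric_viewpoint_tuning_index(symmetric))   # 1.0 (up to rounding)
print(mirror_symmetric_viewpoint_tuning_index(asymmetric))  # well below 1.0
```

      Flipping only the column conditions matters in this sketch: reversing the view order on both axes would leave any RDM that depends only on angular differences unchanged, so even non-mirror-symmetric tuning would score 1.0.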

      Point 1.2) Faces exhibit unique behavior in terms of the progression of mirror-symmetric viewpoint tuning and their training task and dataset dependency. Given that mirror-symmetric tuning has been identified in the brain for faces, it would be beneficial to discuss this observation and provide potential explanations.

      We revised the caption of Figure S1 to explicitly address this point:

      Manuscript changes:

      For face stimuli, there is a unique progression in mirror-symmetric viewpoint tuning: the index is negative for the convolutional layers and it abruptly becomes highly positive when transitioning to the first fully connected layer. The negative indices in the convolutional layers can be attributed to the image-space asymmetry of non-frontal faces; compared to other categories, faces demonstrate pronounced front-back asymmetry, which translates to asymmetric images for all but frontal views (Fig. S8). The features that drive the highly positive mirror-symmetric viewpoint tuning for faces in the fully connected layers are training-dependent (Fig. S2), and hence, may reflect asymmetric image features that do not elicit equivariant maps in low-level representations; for example, consider a profile view of a nose. Note that cars and boats elicit high mirror-symmetric viewpoint tuning indices already in early processing layers. This early mirror-symmetric tuning is independent of training (Fig. S2), and hence, may be driven by low-level features. Both of these object categories show pronounced quadrilateral symmetry, which translates to symmetric images for both frontal and side views (Fig. S8).

      Point 1.3) Previous work reported critical differences between CNNs and neural representations in area AL, indicating that mirror-symmetric viewpoint tuning is less present than view invariance in CNNs compared to area AL. While such findings could potentially limit the usefulness of CNNs as models for mirror-symmetric viewpoint tuning in the brain, they are not addressed in the study.

      This point is now addressed explicitly in the caption of Figure S9:

      Manuscript changes:

      Yildirim and colleagues [14] reported that CNNs trained on faces, notably VGGFace, exhibited lower mirror-symmetric viewpoint tuning compared to neural representations in area AL. Consistent with their findings, our results demonstrate that VGGFace, trained on face identification, has a low mirror-symmetric viewpoint tuning index. This is especially notable in comparison to ImageNet-trained models such as VGG16. This difference between VGG16 and VGGFace can be attributed to the distinct characteristics of their training datasets and objective functions. The VGGFace training task consists of mapping frontal face images to identities; this task may exclusively emphasize higher-level physiognomic information. In contrast, training on recognizing objects in natural images may result in a more detailed, view-dependent representation. To test this potential explanation, we measured the average correlation-distance between the fc6 representations of different views of the same face exemplar in VGGFace and VGG16 trained on ImageNet. The average correlation-distance between views is 0.70±0.04 in VGGFace and 0.93±0.04 in VGG16 trained on ImageNet. The converse correlation distance between different exemplars depicted from the same view is 0.84±0.14 in VGGFace and 0.58±0.06 in VGG16 trained on ImageNet. Therefore, as suggested by Yildirim and colleagues, training on face identification alone may result in representations that cannot explain intermediate levels of face processing.
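      The view-versus-exemplar comparison above amounts to averaging two subsets of a single RDM. The sketch below assumes the layer activations are arranged as (views, exemplars, units); the function name and the synthetic, identity-dominated data are hypothetical and are not fc6 activations from either network.

```python
import numpy as np


def mean_view_and_exemplar_distances(responses):
    """Average correlation distances within one layer's representation.

    responses: array of shape (n_views, n_exemplars, n_units).
    Returns (mean distance between different views of the same exemplar,
             mean distance between different exemplars seen from the same view).
    """
    n_views, n_exemplars, n_units = responses.shape
    rdm = 1.0 - np.corrcoef(responses.reshape(-1, n_units))
    view = np.repeat(np.arange(n_views), n_exemplars)       # view label per row
    exemplar = np.tile(np.arange(n_exemplars), n_views)     # exemplar label per row
    same_view = view[:, None] == view[None, :]
    same_exemplar = exemplar[:, None] == exemplar[None, :]
    across_views = rdm[same_exemplar & ~same_view].mean()
    across_exemplars = rdm[same_view & ~same_exemplar].mean()
    return across_views, across_exemplars


# Hypothetical identity-dominated code: exemplar signal 3x stronger than view signal.
rng = np.random.default_rng(1)
exemplar_codes = 3.0 * rng.normal(size=(6, 100))
view_codes = rng.normal(size=(9, 100))
responses = exemplar_codes[None, :, :] + view_codes[:, None, :]
across_views, across_exemplars = mean_view_and_exemplar_distances(responses)
```

      In this framing, a representation in which views of the same face are closer than different faces seen from one view (as reported for VGGFace) yields `across_views < across_exemplars`, whereas the reverse ordering (as reported for ImageNet-trained VGG16) indicates a view-dominated code.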

      Point 1.4) The study’s results, while informative, are qualitative rather than quantitative, and lack direct comparison with neural data. This obscures the implications for neural mechanisms and their relevance to the broader field.

      We addressed this point by conducting a quantitative comparison between the architectures of various networks and neural response patterns in monkey face patches (see Figures 6, S10-S13, appearing above).

      Point 1.5) The study provides compelling evidence that learning to discriminate bilaterally symmetric objects (beyond faces) induces mirror-symmetric viewpoint tuning in the networks, qualitatively similar to the brain. Moreover, the results suggest that this tuning can, in principle, generalize beyond previously trained object categories. Overall, the study provides important conclusions regarding the emergence of mirror-symmetric viewpoint tuning in networks, and potentially the brain. However, the conducted analyses and results do not entirely address the question why mirror-symmetric viewpoint tuning emerges in networks or the brain. Specifically, the results leave open whether mirror-symmetric viewpoint tuning is indeed necessary to achieve view invariance for bilaterally symmetric objects.

      We believe that mirror-symmetric viewpoint tuning is not strictly necessary for achieving view-invariance. However, it is a plausible path from view-dependence to view invariance. We addressed this point in the updated limitations subsection of the discussion.

      Manuscript changes:

      A second consequence of the simulation-based nature of this study is that our findings only establish that mirror-symmetric viewpoint tuning is a viable computational means for achieving view invariance; they do not prove it to be a necessary condition. In fact, previous modeling studies [10, 19, 61] have demonstrated that a direct transition from view-specific processing to view invariance is possible. However, in practice, we observe that both CNNs and the face-patch network adopt solutions that include intermediate representations with mirror-symmetric viewpoint tuning.

      Taken together, this study moves us a step closer to uncovering the origins of mirror-symmetric tuning in networks, and has implications for more comprehensive investigations into this neural phenomenon in the brain. The methods of probing CNNs are innovative and could be applied to other questions in the field. This work will be of broad interest to cognitive neuroscientists, psychologists, and computer scientists.

      We appreciate your acknowledgment of our study’s contribution to understanding mirror-symmetric tuning in networks and its wider implications in the field.

      Reviewer 2 (Public Review):

      Strengths

      1) The statements made in the paper are precise, separating observations from inferences, with claims that are well supported by empirical evidence. Releasing the underlying code repository further bolsters the credibility and reproducibility. I especially appreciate the detailed discussion of limitations and future work.

      2) The main claims with respect to the two convolutional architectures are well supported by thorough analyses. The analyses are well-chosen and overall include good controls, such as changes in the training diet. Going beyond “passive” empirical tests, the paper makes use of the fully accessible nature of computational models and includes more “causal” insertion and deletion tests that support the necessity and sufficiency of local object features.

      3) Based on modeling results, the paper makes a testable prediction: that mirror-symmetric viewpoint tuning is not specific to faces and can also be observed in other bilaterally symmetric objects such as cars and chairs. To test this experimentally in primates (and potentially other model architectures), the stimulus set is available online.

      We express our gratitude for your constructive feedback. Your acknowledgment of the clarity of our statements and the robustness of our empirical evidence is greatly appreciated. We are also thankful for your recognition of our comprehensive analyses and the testable predictions arising from our work.

      Point 2.1: Weaknesses

      My main concern with this paper is its choice of the two model architectures AlexNet and VGG. In an earlier study, Yildirim et al. (2020) found an inverse graphics network, “EIG”, to better correspond to neural and behavioral data for face processing than VGG. All claims in the paper thus relate to a weaker model of the biological effects, since this work does not analyze the EIG model. Since EIG follows an analysis-by-synthesis approach rather than standard classification training, it is unclear whether the claims in this paper generalize to this other model architecture. It is also unclear if the claims will hold for: 1) transformer architectures, 2) the HMAX architecture by Leibo et al. (2017), which has also been proposed as a computational explanation for mirror-symmetric tuning, and, as the authors note in the Discussion, 3) deeper architectures such as ResNet-50, which tend to better align to neural and behavioral data in general. These architectures include different computational motifs such as skip connections and a much smaller proportion of fully-connected layers, which are a major focus of this work.

      Overall, I thus view the paper’s claims as limited to AlexNet- and VGG-like architectures, both of which fall behind state-of-the-art in their alignment to primates in general and also specifically for mirror-symmetric viewpoint tuning.

      We understand your concern regarding the choice of AlexNet and VGG architectures. The decision to focus on these models was driven by the need for a straightforward macroscopic correspondence between the layer structure of the artificial networks and the ventral visual stream. However, acknowledging this potential limitation of generality, we have expanded our analysis to include the EIG model, a transformer architecture, the HMAX model, and deeper convolutional architectures like ResNet-50 and ConvNeXt. Our revised analysis, detailed in Figures S1, S9, and S10-S13, incorporates these additional models and offers a comprehensive evaluation of their brain alignment and mirror-symmetric viewpoint tuning. We found that while the architectures indeed vary in their computational motifs, the emergence of mirror-symmetric viewpoint tuning is not exclusive to AlexNet and VGG. It occurs for every CNN we tested, exactly at the stage where equivariant feature maps are pooled globally. We believe that the new analyses extend the generality of our findings and remove the concern that our claims apply only to older, shallower networks.

      For details, please refer to Point 1 in the ‘Essential Revisions’ section.

      Point 2.2: Minor weaknesses

      1) Figure 1A: since the relevance to primate brains is a major motivator of this work, the results from actual neural recordings should be shown and not just schematics. For instance, the mirror symmetry in AL is not as clean as the illustration (compare with Fig. 3 in Yildirim et al. 2020), and in the paper’s current form, this is not easily accessible to the reader.

      Thank you for your feedback regarding the presentation of neural recordings in Figure 1A. We have updated Figure 1A to include actual neural RDMs instead of the previous schematic representations.

      Point 2.3) Figure 4, L832-845: The claims for the effect of training on mirror-symmetric viewpoint tuning are with respect to the training data only, but there are other differences between the models such as the number of epochs (250 for CIFAR-10 training, 200 for all other datasets), the learning rate (2.5 × 10⁻⁴ for CIFAR-10, 10⁻⁴ for all others), the batch size (128 vs. 64), etc. I do not expect these choices to make a major difference for your claims, but it would be much cleaner to keep everything but the training dataset consistent. The different test accuracies especially worry me a bit (from 81% to 92%, and they appear different from the accuracy numbers in Figure S4, e.g., for CIFAR-10 and asymSVHN); at the very least, those should be comparable.

      We addressed this point by retraining the models while holding most of the hyperparameters constant. Specifically, we standardized the number of epochs, batch size, and weight decay. The remaining differences are necessitated by the characteristics of the specific training image sets used (natural images versus digits). Please note that we do not directly contrast models trained on CIFAR-10 and SVHN; the controlled comparisons are conducted while holding the SVHN training images constant, and are not confounded by hyperparameter choice.

      Manuscript changes:

      The networks’ weights and biases were initialized randomly using the uniform He initialization [70]. We trained the models for 250 epochs with a batch size of 256 images. The CIFAR-10 network was trained using the stochastic gradient descent (SGD) optimizer, starting with a learning rate of 10⁻³ and momentum of 0.9. The learning rate was halved every 20 epochs. The SVHN/symSVHN/asymSVHN networks were trained using the Adam optimizer. The initial learning rate was set to 10⁻⁵ and halved every 50 epochs. The hyperparameters were determined using the validation data. The models reached around 83% test accuracy (CIFAR-10: 81%, SVHN: 89%, symSVHN: 83%, asymSVHN: 80%). Fig. S4 shows the models’ learning curves.
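      The halving schedules described above are step decays, lr(epoch) = lr₀ · 0.5^⌊epoch / k⌋. A minimal sketch follows; the function name is hypothetical, and only the numeric values (initial rates, halving intervals, 250 epochs) come from the text.

```python
def step_decay_lr(epoch, initial_lr, halve_every):
    """Learning rate after halving `initial_lr` once every `halve_every` epochs."""
    return initial_lr * 0.5 ** (epoch // halve_every)


# Values stated in the text (epochs are numbered 0..249 here, an assumption):
# CIFAR-10 (SGD): 1e-3, halved every 20 epochs.
# SVHN / symSVHN / asymSVHN (Adam): 1e-5, halved every 50 epochs.
cifar_final = step_decay_lr(249, 1e-3, 20)   # 12 halvings by the last epoch
svhn_final = step_decay_lr(249, 1e-5, 50)    # 4 halvings by the last epoch
```

      The text does not state which framework was used. In PyTorch, for example, the same schedule could be expressed as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)`, stepped once per epoch.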

      Point 2.4) L681-685: The general statement made in the paper that “deeper models lose their advantage as models of cortical representations” is not supported by the cited limited comparison on a single dataset. There are many potential confounds here with respect to prior work, e.g., the recording modality (fMRI vs. electrodes), the stimulus set (62 images vs. thousands), the models that were tested (9 vs. hundreds), etc.

      We agree that the recording modality and stimulus set may play a critical role in determining model ranking. Since we generalized the analyses to deeper models, we removed this statement from the paper. While we still believe that shallower networks may prove to be better models of the visual cortex, this empirical question is out of the scope of the current manuscript.

      Reviewer 3

      This study aimed to explore the computational mechanisms of view invariance, driven by the observation that in some regions of monkey visual cortex, neurons show comparable responses to (1) a given face and (2) to the same face but horizontally flipped. Here they study this known phenomenon using AlexNet and other shallow neural networks, using an index for mirror symmetric viewpoint tuning based on representational similarity analyses. They find that this tuning is enhanced at fully connected- or global pooling layers (layers which combine spatial information), and that the invariance is prominent for horizontal- but not vertical- or rotational transformations. The study shows that mirror tuning can be learned when a given set of images are flipped horizontally and given the same label, but not if they are flipped and given different labels. They also show that networks learn this tuning by focusing on local features, not global configurations.

      We are grateful for your thorough reading, reflected by the comprehensive summary of our study and its main findings.

      Point 3.1) I found the study to be a mixed read. Some analyses were fascinating: for example, it was satisfying to see the use of well-controlled datasets to increase or decrease the rate of mirror-symmetry tuning. The insertion and deletion experiments were elegant tests to probe the mechanisms of mirror symmetry, asking whether symmetry could arise from (1) global feature configurations (in a holistic sense) or (2) local features, with stronger evidence for the latter. These two sets of results were successful and interpretable. They stand in contrast with the first analysis, which relies on observations that do not seem justified. Specifically, Figure 2D shows mirror-symmetry tuning across 11 stages of image processing, from pixel space to fully connected layers. It shows that images from different object categories evoke considerably different tuning index values. The explanation for this result is that some categories, such as “tools,” have “bilaterally symmetric structure,” but this is not explicitly measured anywhere. “Boats” are described as having “front-back symmetry,” more so than flowers. One imagines flowers being extremely symmetric, but perhaps that depends on the metric. What is the metric? At first I thought it was the mirror-symmetric viewpoint tuning index in image (pixel) space, but this cannot be, as the index for faces and flowers is negative, cars have no symmetry, and boats are positive. To support these descriptions, one must have an independent variable (for object-class symmetry) that can be related to the dependent variable (the mirror-symmetric viewpoint tuning index). If it exists, it is not part of the Results section. This omission undermines other parts of the Results section: “some car models have an approximate front-back symmetry...however, a flower typically does not...” “Some,” “typically”: how many in the dataset exactly, and how often?

      We thank you for your insightful observation. You are correct that we did not refer to pixel-space symmetry; our descriptions relate to the 3D structure of the objects used in the study.

      Following this comment, we objectively quantified the symmetry planes of the 3D objects. Unfortunately, we do not have direct access to the proprietary 3D meshes of these objects, only to their renders. Therefore, we devised measures that assess the symmetry of the 3D objects through the symmetry they elicit in the different 2D renders.

      This analysis is described in the new supplemental figure S8. We believe that these measurements support the qualitative claims we made in the previous draft.

      Point 3.2) The description of CIFAR-10 as having bilaterally symmetric categories - are all these categories equally symmetric? If not, would such variability matter in terms of these results?

      When considering their 3D structure, all ten CIFAR-10 categories exhibit pronounced left-right symmetry. These categories encompass vertebrate animals (birds, cats, deer, dogs, frogs, horses); they also include man-made vehicles (airplanes, cars, ships, and trucks), which, at least externally, are nearly perfectly symmetric by design. It is important to note that this symmetry pertains to the photographed 3D objects, rather than the images themselves, which could be highly asymmetric. Other axes of symmetry (e.g., back-front) in CIFAR-10 cannot be measured without 3D representations of the objects.

      Point 3.3) These assessments of object category symmetry values are made before experiments are presented, so they are not interpretations of the results, and it would be circular to write it otherwise.

      We have changed the order so that the explanations follow the experimental results. This includes the relevant main text paragraph, as well as the relevant figure—both the order of panels and the phrasing of the figure caption.

      Point 3.4) Overall, my bigger concern is that the framing is misleading or at best incomplete. The manuscript successfully showed that if one introduces left-right symmetry to a dataset, the network will develop population-level representations that are also bilaterally symmetric. But the study does not explain that the model’s architecture and random weight distribution are sufficient for symmetry tuning to emerge without training, just to a much more limited degree. Baek et al. showed in 2021 that viewpoint-invariant face-selective units and mirror-symmetric units emerge in untrained networks (“Face detection in untrained deep neural networks”; the current manuscript cites this paper but does not mention that mirror symmetry is a feature of the 2021 study). The current study also used untrained networks as controls (Fig. 3), and while they were useful in showing that learning boosts symmetry tuning, the results also clearly show that horizontal-reflection invariance is far from zero. So, the simple learning-driven explanation for the mirror-symmetric viewpoint tuning for faces is wrong: while (1) network training and (2) pooling are mechanisms that drive the development of mirror-symmetric tuning, the lottery ticket hypothesis is enough for its emergence. Faces and numbers are simple patterns, so the overparameterization of networks is enough to randomly create units that are tuned to these shapes and to wire many of them together. How learning shapes this process is an interesting direction, especially now that the current study has outlined its importance.

      We agree with the reviewer that random initialization may result in units that show mirror-symmetric viewpoint tuning for faces in the absence of training. In the revised manuscript, we quantify in detail the occurrence of such units, first reported by Baek et al., and discuss the relation between Baek et al., 2021 and our work. In brief, our analysis affirms that units with mirror-symmetric viewpoint tuning for faces appear even in untrained CNNs, although we believe their rate is lower than previously reported. Regardless of the exact proportion of such units, we believe it is unequivocal that at the population level, mirror-symmetric viewpoint tuning to faces (and other objects with a single plane of symmetry) is strongly training-dependent.

      First, we refer the reviewer to Figure S2, which directly demonstrates the effect of training on the population-level mirror-symmetric viewpoint tuning:

      Note the non-mirror-symmetric reflection invariant tuning profile for faces in the untrained network.

      Second, the above-zero horizontal reflection invariance referred to by the reviewer (Figure 3) is distinct from mirror-symmetric viewpoint tuning; the latter requires both reflection invariance and viewpoint tuning. More importantly, it was measured with respect to all of the object categories grouped together; this includes objects with quadrilateral symmetry, which elicit mirror-symmetric viewpoint tuning even in shallow layers and without training. To clarify the confusion that this grouping might have caused, we repeated the measurement of invariance in fc6 separately for each 3D object category:

      Disentangling the contributions of different categories to the reflection-invariance measurements, this analysis underscores the necessity of training for the emergence of mirror-symmetric viewpoint tuning.

      Last, we refer the reviewer to Figure S5, which shows that the symmetry of untrained convolutional filters has a narrow, zero-centered distribution. Indeed, the upper limit of this distribution includes filters with a certain degree of symmetry. This level of symmetry, however, becomes the lower limit of the filters’ symmetry distribution following training.

      Therefore, we believe that training induces a shift in the tuning of the unit population that is qualitatively distinct from, and not explained by, random-lottery-related mirror-symmetric viewpoint tuned units. In the revised manuscript, we clarify the distinction between mirror-symmetric viewpoint tuning at the population level and the existence of individual units showing pre-training mirror-symmetric viewpoint tuning, as shown by Baek et al.

      Manuscript changes: (Discussion section)

      Our claim that mirror-symmetric viewpoint tuning is learning-dependent may seem to be in conflict with findings by Baek and colleagues [17]. Their work demonstrated that units with mirror-symmetric viewpoint tuning profile can emerge in randomly initialized networks. Reproducing Baek and colleagues’ analysis, we confirmed that such units occur in untrained networks (Fig. S15). However, we also identified that the original criterion for mirror-symmetric viewpoint tuning employed in [17] was satisfied by many units with asymmetric tuning profiles (Figs. S14 and S15). Once we applied a stricter criterion, we observed a more than twofold increase in mirror-symmetric units in the first fully connected layer of a trained network compared to untrained networks of the same architecture (Fig. S16). This finding highlights the critical role of training in the emergence of mirror-symmetric viewpoint tuning in neural networks also at the level of individual units.

      Point 3.5) Finally, it would help to cite other previous demonstrations of equivariance and mirror symmetry in neural networks. Chris Olah, Nick Cammarata, Chelsea Voss, Ludwig Schubert, and Gabriel Goh of OpenAI wrote of this phenomenon in 2020 (Distill journal).

      We added a reference to the study by Olah and colleagues (2020).

      Manuscript changes: (Discussion section)

      (see Olah and colleagues (2020) [60] for an exploration of emergent equivariance using activation maximization).

      Point 3.6) Some other observations that might help:

      I am enthusiastic about the experiments using different datasets to increase or decrease the rate of mirror-symmetry tuning (sets including CIFAR-10, SVHN, symSVHN, asymSVHN); it is worth noting, however, that the lack of a ground-truth metric for category symmetry is a problem here too. In the asymSVHN dataset, images are flipped and given different labels. If some categories are naturally symmetric after horizontal flips, such as images containing “0” or “8”, then changing the label is likely to disturb training. This would explain why the training loss is larger for this condition (Figure S4D).

      We now acknowledge that the inclusion of digits 0 and 8 reduces the accuracy of asymSVHN:

      Manuscript changes: (Figure S4 caption)

      Note that the accuracy of asymSVHN might be negatively affected by the inclusion of relatively symmetric categories such as 0 and 8.

      Our rationale for retaining these digits in the dataset was to manipulate the symmetry of the learned categories (compared to symSVHN) while keeping the images themselves constant.

      Regarding the ground-truth symmetry of these datasets: For CIFAR-10, the relevant measure of symmetry pertains to the 3D structure of the photographed objects, which we believe is unequivocally symmetric (see Point 3.2). Note that 2D, pixel-space image symmetry is not directly indicative of symmetry in 3D.

      For SVHN, which consists of two-dimensional characters, the pixel-space symmetry of the images indeed reflects the objects’ symmetry. However, because some readers might confuse our claims about the symmetry of objects with claims we did not make about the symmetry of 2D images, we prefer to avoid reporting measurements of image-space symmetry. We believe that our interpretation of the experiments with SVHN/symSVHN/asymSVHN holds even in the absence of such measurements.

      For your reference, we include here a quantification of image-space horizontal symmetry for each category of CIFAR-10 and SVHN:
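      The exact image-space measure used for this quantification is not specified here. One simple way to score image-space horizontal symmetry, assumed purely for illustration, is the Pearson correlation between an image and its left-right mirror, averaged within each category. The function names and toy images below are hypothetical.

```python
import numpy as np


def horizontal_symmetry(image):
    """Pearson correlation between an image and its left-right mirror.

    Works for H x W (grayscale) or H x W x C arrays; axis 1 is the width.
    1.0 means a perfectly mirror-symmetric image.
    """
    img = np.asarray(image, dtype=float)
    mirrored = img[:, ::-1]
    return np.corrcoef(img.ravel(), mirrored.ravel())[0, 1]


def mean_symmetry_per_category(images, labels):
    """Average horizontal_symmetry over the images of each category label."""
    scores = np.array([horizontal_symmetry(im) for im in images])
    labels = np.asarray(labels)
    return {lab.item(): scores[labels == lab].mean() for lab in np.unique(labels)}


# Toy images (hypothetical): a mirror-symmetric patch and a left-to-right ramp.
rng = np.random.default_rng(0)
half = rng.random((8, 4))
symmetric_image = np.hstack([half, half[:, ::-1]])
ramp_image = np.tile(np.arange(8.0), (8, 1))
per_category = mean_symmetry_per_category([symmetric_image, ramp_image], ["sym", "ramp"])
```

      A uniform image has zero variance, so its correlation is undefined (NaN); a real analysis would need to handle such cases. As the response stresses, a high score under this kind of pixel-space measure says nothing direct about the 3D symmetry of the depicted object.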

      Point 3.7) It is puzzling why greyscale 3D rendered images are used. By using greyscale 3D render (at least as shown in the figures) the study proceeds as if the units are invariant under color transformations. Unfortunately, this is not true and using greyscale images impact the activations of different layers of Alexnet in a way that is not fully defined. Moreover, many units in shallow networks focus on color and exactly these units could be invariant to other transformation like the mirror symmetry, but grey scaling the images makes them inactive.

      We use grayscale 3D rendered images to align with the setting in other studies investigating mirror-symmetric viewpoint tuning, including Freiwald et al. (2010), Leibo et al. (2017), and Yildirim et al. (2020). The choice of using grayscale images in these studies is motivated by the need to dissociate face-processing from lower-level, hue-specific responses.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors perform a very thorough, extensive characterization of the impact of an iron-rich diet on multiple phenotypes in a wide range of inbred mouse strains. While a work of this type does not offer mechanistic insights, the value of the study lies not only in its immediate results but also in what it can offer to future researchers as they explore the genetic basis of iron levels and other related phenotypes in rodent studies. The creation of a web resource and the offer from the authors to share all available samples is particularly laudable, and helps to increase the accessibility of the work to other scientists. There is one shortcoming to the work however. To induce iron overload in mice in the main study in this work, mice were placed on an iron-rich diet that differed in its composition from the baseline diet in more than just iron. This could influence some of the phenotypes observed in this study.

      We thank the reviewer for their comments. We hope that this work can provide insight and/or support for a wide variety of future studies. Regarding the diets: yes, in our initial pilot study with 6 strains, the baseline diet was inadvertently not isocaloric with the high iron diet. It also used a different source of cellulose and contained individual amino acids in the ratios found in casein, instead of the casein used as the protein source for the high iron diet. The baseline metal composition, however, was the same. We included data from the pilot study in this manuscript because it provided some important early insight, but made sure to note this caveat since it could potentially affect some results. We added some additional text to the Methods section to help clarify this further. The other, subsequently performed studies in this paper were not affected; for example, the Control study performed in C57BL/6J has a baseline diet that matches the high iron diet except for iron. For our HMDP genetic study with 114 strains, we did not have a baseline group, so all mice were on the same high iron diet.

      Reviewer #2 (Public Review):

      Here, the authors tried to identify the genes and biological pathways underlying iron overload and its associated pathologies in mice. Several wet lab experiments and measurements alongside many bioinformatic analyses like GWAS, RNA-seq data analysis (DEG), eQTL analysis, TWAS, and gene-set enrichment analysis have been performed. The study design is good enough and the authors tried to validate the results. The data have been submitted (Accession #: GSE230674) but are not public yet.

      Thank you very much for your detailed and thoughtful review and for helping us to improve our manuscript.

      1) The main issue of this manuscript is its length. It's too long, especially the Results section. It's hard for readers to follow the paper. Moreover, you added results about other minerals, mostly copper, which seems too much (considering the fact that this study is about iron). The text doesn't have the required integrity and focus. You should decide where you want to put the focus of this manuscript, and I strongly recommend shortening the manuscript; try to be as short and sweet as you can.

      Thank you for this helpful suggestion. We have moved or removed excess discussion from the Results section. We moved the specific GWAS results for copper and related red cell traits to the Supplementary text file “Supplementary File 24” so that only iron and triglyceride GWAS results are described in the main text. We kept the discussion of the copper findings in the Discussion section, since we believe the copper deficiency is an important phenotype induced by the high iron diet that may impact other studies of dietary iron overload. We also believe that the copper and anemia GWAS loci may be of interest to some readers. We considered putting the copper and anemia findings in a separate manuscript, but ultimately decided to include them here, although we agree that this makes the manuscript longer.

      2) Also, the "Methods" section is long, some parts are over-detailed (mostly wet lab procedures) and some parts are not detailed enough. It seems the "Statistical analyses" part doesn't have extra information. I recommend removing the first paragraph and moving some of the information from the second paragraph to the right place in the Method section.

      We reorganized the first part of the statistical analyses section for clarity, and as mentioned further below, added in more detail regarding the GWAS significance thresholds:

      “Analyses were performed using GraphPad Prism (GraphPad Software, La Jolla, CA) and in R. P < 0.05 was considered significant for these tests and for bicor analyses. All reported P values are based on a two-sided hypothesis. The initial number of mice per group in the pilot (N = 6 per group) and Control studies (N = 8 per group) were determined based on previous studies where similar phenotypes were measured. For the HMDP study, permutation and simulation studies were previously used to test the statistical power of the HMDP using parameters including the variance explained by SNPs, genetic background, random errors, and the number of repeated measurements per strain (Bennett, Farber et al. 2010). Appropriate sample sizes to achieve adequate statistical power were determined based on previous analyses. Differences in sample sizes among the HMDP strains were due to differences in strain availability as determined by breeding success and losses. For GWAS, thresholds for significant (P < 4.1e-6; -log10P > 5.387) loci were defined using permutation as previously described (Bennett, Farber et al. 2010). The suggestive locus threshold (P < 4.1e-5; -log10P > 4.387) was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold (P < 1e-4) was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold (P < 1e-6) was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well.”

      We tried moving the missing values notes in the second paragraph to the various method sections in the paper they apply to, but this led to much repetition and was in some cases not clear, so we decided to keep this information together in the statistical analyses section.

      3) Some parts of your Discussion section retell the results. Please discuss your results and compare them with previous findings.

      We have revised the discussion to remove several parts that mostly just summarized the results and agree this improves the text. As mentioned above, we moved some discussion that was in the Results section to the Discussion section as well.

      4) Add detail about your GWAS model. As you had repeated samples from each strain, it's good to mention how you considered this. Also, show how you determined the significance threshold.

      Thank you for this suggestion. The GWAS software we used (FaST-LMM) derives a kinship matrix from the genotypes of the individuals considered in the analysis; this kinship matrix is used to correct for population structure including multiple individuals per strain.
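      To illustrate why repeated mice per strain need no separate handling once a genotype-derived kinship matrix is in the model, here is a minimal sketch of one common construction, a realized relationship matrix from standardized SNP genotypes. It is not necessarily FaST-LMM's exact internal computation; the function name and the toy genotypes are hypothetical. Because mice of the same inbred strain share identical genotype rows, they receive the maximal kinship with one another, which is how the mixed model accounts for their shared genetic background.

```python
from statistics import mean, pstdev


def realized_kinship(genotypes):
    """Realized relationship matrix from genotypes.

    genotypes: one row per mouse, each row a list of 0/1/2 allele counts per SNP.
    Returns an n x n matrix K where K[i][j] measures genome-wide genetic similarity.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        sd = pstdev(col)
        if sd == 0:
            continue  # monomorphic SNP: carries no information about relatedness
        mu = mean(col)
        columns.append([(x - mu) / sd for x in col])  # standardize each SNP
    m = len(columns)
    if m == 0:
        raise ValueError("no polymorphic SNPs")
    # Average cross-product of standardized genotypes over SNPs.
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)] for i in range(n)]


# Two mice from one inbred strain (identical rows) and two mice from other strains.
K = realized_kinship([[0, 2, 2, 0], [0, 2, 2, 0], [2, 0, 2, 2], [2, 0, 0, 2]])
```

      Mice 0 and 1 come out exactly as related to each other as each is to itself, so within-strain replicates are treated as genetically identical rather than as independent samples.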

      The trait GWAS significance threshold was determined using permutation analysis (Bennett, Farber et al. 2010). The suggestive GWAS threshold was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well.
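      The threshold arithmetic above can be checked with a short sketch. The P-value cutoffs are taken from the text; the helper names are hypothetical. The suggestive threshold is one log unit (a factor of 10) less stringent than the permutation-derived significant threshold, which is why the two -log10P cutoffs differ by exactly 1.

```python
import math

SIGNIFICANT_P = 4.1e-6             # permutation-derived trait GWAS threshold
SUGGESTIVE_P = SIGNIFICANT_P * 10  # one log unit less stringent: 4.1e-5
CIS_EQTL_P = 1e-4                  # from a 1% FDR of 1.73e-3, made more conservative
TRANS_EQTL_P = 1e-6                # from 4.1e-6, made more conservative


def neg_log10(p):
    return -math.log10(p)


def classify_trait_locus(p):
    """Label a trait GWAS locus by its lead SNP P value."""
    if p < SIGNIFICANT_P:
        return "significant"
    if p < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"


def passes_eqtl_threshold(p, cis):
    """cis-eQTL use P < 1e-4; trans-eQTL use the stricter P < 1e-6."""
    return p < (CIS_EQTL_P if cis else TRANS_EQTL_P)


print(f"{neg_log10(SIGNIFICANT_P):.3f}")  # 5.387
print(f"{neg_log10(SUGGESTIVE_P):.3f}")   # 4.387
```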

      To improve the text, we added to the Methods section under the “Genome-wide association analysis and heritability estimation” header the following:

      “Traits were quantile transformed to normalize the distribution and then GWAS was performed using the FaST-LMM program (Lippert, Listgarten et al. 2011), which corrects for population structure (including multiple samples per strain) by using a kinship matrix derived from the genotypes to be analyzed.”
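      The Methods state only that traits were quantile transformed to normalize their distribution. One common way to do this is a rank-based inverse normal transformation, sketched below under that assumption; the offset of 0.5 and the average-rank tie handling are our illustrative choices, not details from the manuscript. Each value is replaced by the standard normal quantile of its rank, so strongly skewed traits (such as liver triglycerides) become approximately normal while their ordering is preserved.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map trait values to standard normal quantiles by rank.

    Tied values receive their average rank. The rank-to-probability mapping is
    (rank - offset) / (n - 2 * offset + 1), which stays strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```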

      We also revised the GWAS threshold text to include more detail:

      “Analyses were performed using GraphPad Prism (GraphPad Software, La Jolla, CA) and in R. P < 0.05 was considered significant for these tests and for bicor analyses. All reported P values are based on a two-sided hypothesis. For GWAS, thresholds for significant (P < 4.1e-6; -log10P > 5.387) loci were defined using permutation as previously described (Bennett, Farber et al. 2010). The suggestive locus threshold (P < 4.1e-5; -log10P > 4.387) was based on reducing the significance threshold by one log unit. The cis eQTL GWAS threshold (P < 1e-4) was based on a calculated 1% FDR threshold of 1.73e-3, adjusted to 1e-4 to be slightly more conservative. The trans-eQTL threshold (P < 1e-6) was based on the 4.1e-6 threshold, adjusted to 1e-6 to be more conservative as well.”

      5) The abstract could be better. It also doesn't have a conclusion.

      We revised the abstract and added in a conclusion:

      “Tissue iron overload is a frequent pathologic finding in multiple disease states including non-alcoholic fatty liver disease (NAFLD), neurodegenerative disorders, cardiomyopathy, diabetes, and some forms of cancer. The role of iron, as a cause or consequence of disease progression and observed phenotypic manifestations, remains controversial. In addition, the impact of genetic variation on iron overload-related phenotypes is unclear, and the identification of genetic modifiers is incomplete. Here, we used the Hybrid Mouse Diversity Panel (HMDP), consisting of over 100 genetically distinct mouse strains optimized for genome-wide association studies (GWAS) and systems genetics, to characterize the genetic architecture of dietary iron overload and pathology. Dietary iron overload was induced by feeding male mice (114 strains, 6-7 mice per strain on average) a high iron diet for six weeks, and then tissues were collected at 10-11 weeks of age. Liver metal levels and gene expression were measured by ICP-MS/ICP-AES and RNASeq, and lipids were measured by colorimetric assays. FaST-LMM was used for genetic mapping, and Metascape, WGCNA, and Mergeomics were used for pathway, module, and key driver bioinformatics analyses. Across the HMDP, we identified many traits that exhibited high inter-strain variability on the high iron diet, and we found a substantial contribution of genetics to many traits. Mice on the high iron diet accumulated iron in the liver, with a 6.5-fold difference across strain means. The iron-loaded diet also led to a spectrum of copper deficiency and anemia, with liver copper levels highly positively correlated with red blood cell count, hemoglobin, and hematocrit. Hepatic steatosis of varying severity was also observed histologically, with 52.5-fold variation in triglyceride levels across the strains. Most clinical traits examined had at least one significant GWAS locus, and notably, liver triglyceride and iron mapped most significantly to an overlapping locus on chromosome 7 that has not been previously associated with either trait. By genetically mapping liver mRNA expression, we identified cis- and trans-eQTL for thousands of genes, and we integrated this with trait correlation data to identify candidate causal genes at many trait loci. Using network modeling, significant key drivers for both iron and triglyceride accumulation were found to be involved in cholesterol biosynthesis and oxidative stress management. To make the full data set accessible and usable by others, we have made our data and analyses available on a resource website. Overall, our study confirms and expands upon the contribution of mouse genetic background to dietary iron overload and associated pathology. The numerous GWAS loci, candidate genes, and biological pathways identified here provide a rich public resource to drive further investigation.”

      6) Page 8, lines 4-7: Please remove these lines or move them to the Method section. The last paragraph of the introduction should clearly explain the goal of the study.

      We removed these lines and revised this paragraph for clarity:

      In order to gain further insight into genetic contributors to iron overload and associated pathology, we measured clinical traits and hepatic mRNA expression in 114 mouse strains fed a high iron diet. The mice are from a genetically diverse cohort known as the Hybrid Mouse Diversity Panel (HMDP), a panel optimized for systems genetics studies that has previously been used to examine numerous complex traits, including obesity, diabetes, atherosclerosis, heart failure, carbon tetrachloride induced liver fibrosis, and fatty liver disease (Lusis, Seldin et al. 2016; Seldin, Yang et al. 2019; Tuominen, Fuqua et al. 2021; Cao, Wang et al. 2022).

      7) Page 68, line 13: Explain the abbreviation (RINe) before use. Also, most probably it is RIN (RNA Integrity Number).

      Thank you for pointing this out. We updated the methods text as follows: “All samples had RNA integrity number equivalents (RINe) values greater than 8 as measured on an Agilent 2200 TapeStation (Agilent, Santa Clara, CA).” We also added RINe to the abbreviations section.

      8) The heritability estimates seem high, and the 1% difference between broad- and narrow-sense heritability means there is almost no dominance or epistatic genetic variance among alleles affecting the studied traits (which is hard to accept). I recommend including a within-group (strain) variance (common environmental effect) component in the model to absorb this source of variation, so that the genetic variance, and consequently the heritability estimates, would be more accurate. You can also consider this source of variance in your GWAS model.

      Thank you for bringing up these points. While we try to minimize environmental effects by keeping these mice and samples in as similar environmental and experimental conditions as feasible, some will remain. Thus, in our analyses, we try to factor in remaining environmental variation by using data from multiple mice per strain. The programs we used for GWAS and heritability calculations take into account within-group (strain) variance. We added the following sentence to the Methods section just after mention of the programs used to calculate heritability:

      “Both of the software packages used for heritability estimation account for environmental variance within strains.”

      We agree that the broad-sense and narrow-sense estimates are close to each other for many traits and that this suggests low levels of dominance and epistasis. A low level of non-additive genetic variance is not uncommon and theoretically predicted for complex traits, as has been reported previously and discussed in the references below:

      Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008 Feb 29;4(2):e1000008. doi: 10.1371/journal.pgen.1000008. PMID: 18454194

      Hivert V, Sidorenko J, Rohart F, Goddard ME, Yang J, Wray NR, Yengo L, Visscher PM. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am J Hum Genet. 2021 May 6;108(5):786-798. doi: 10.1016/j.ajhg.2021.02.014. Epub 2021 Apr 2. Erratum in: Am J Hum Genet. 2021 May 6;108(5):962. PMID: 33811805

      It has also been argued that many human GWAS studies, as well as studies using populations of mice designed for complex trait analyses, including the HMDP population, inherently lack the statistical power to detect epistasis:

      Buchner DA, Nadeau JH. Contrasting genetic architectures in different mouse reference populations used for studying complex traits. Genome Res. 2015 Jun;25(6):775-91. doi: 10.1101/gr.187450.114. Epub 2015 May 7. PMID: 25953951

      Taking all this together, we would argue that it is not surprising to see little difference between the narrow and broad heritability estimates for many traits in our study. To provide more context to the reader regarding how to interpret our heritability findings, we added the following text to the Discussion section, under limitations:

      “Finally, in our study with the HMDP population, estimated broad and narrow sense heritabilities were similar for many traits, suggesting modest non-additive contributions (e.g., dominance and epistasis) to the variance in these traits. While such results are common and theoretically predicted for complex traits (Hill, Goddard et al. 2008; Hivert, Sidorenko et al. 2021), our study population may also not be optimal for detection of these effects (Buchner and Nadeau 2015).”

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study by Lee et al. is a direct follow-up on their previous study, which described the evolutionary conservation among placental mammals of two motifs (a transmembrane motif and a juxtamembrane palmitoylation site) in CD4, an antigen co-receptor, and showed their relevance for T-cell antigen signaling. In this study, they describe the contribution of these two motifs to CD4-mediated antigen signaling in the absence of CD4-LCK binding. Their approach was to compare antigen-induced proximal TCR signaling and distal IL-2 production in 58-/- T-cell hybridomas expressing either an exogenous truncated version of CD4 that does not interact with LCK (called T1) or T1 versions carrying mutations in either or both of the conserved motifs. They show that T1 CD4 can support signaling to an extent similar to WT CD4, but that mutation of the conserved motifs substantially reduces the signaling. The authors conclude that the role of these motifs is independent of LCK binding.

      Strengths:

      The authors convincingly show that T1 CD4, which lacks the interaction with LCK, supports TCR signaling, and also that the two studied motifs contribute significantly to it.

      Weaknesses:

      The study has several weaknesses.

      (1) The whole study is based on a single experimental system, the genetically modified 58-/- hybridoma. At this moment, it is unclear how the molecular motifs studied here contribute to signaling in a real T cell. The evolutionary conservation suggests that these motifs are important for T cell biology. However, the LCK-binding motif is conserved as well (perhaps even more so), and it plays a very minor role in their model. Without verifying their results in primary cells, the quantitative, and even qualitative, importance of these motifs for T-cell signaling and biology is unclear. Although the authors discuss this issue in the Discussion, it should be noted in all important parts of the manuscript where conclusions are made (abstract, end of introduction, perhaps also in the title) that the results come from hybridoma cells.

      We appreciate the Reviewer’s thoughtful comments and suggestion. We now state in the abstract and introduction that wet-lab experiments were performed with T cell hybridomas. We have also better highlighted work from Killeen and Littman (PMID: 8355789), in which they showed that C-terminally truncated CD4, which lacked the motifs that mediate CD4-Lck interactions, can drive CD4+ T cell development, proliferation, and T-helper function, because we now provide mechanistic data to help explain those in vivo results. Also, as noted by the reviewer, we discuss how the sum of our data provides justification for the investment in and use of mouse models to interrogate how the functionally important residues/motifs identified and studied here influence T cell biology.

      We take this opportunity to reiterate that, while the study is based on a well-characterized, albeit single, wet-lab experimental system, the whole study is based on two lines of investigation. The other approach was a systems biology computational approach that analyzes data from real-world experiments in a variety of jawed vertebrate species over evolution. Specifically, we computationally reconstructed the evolutionary history of CD4 by performing multiple analyses of CD4 from 99 jawed vertebrates spanning ~435 million years of evolution. This analysis allowed us to identify residues, and networks of evolutionarily coupled residues, that are predicted to be functionally important in vivo. Like other systems biology approaches, this allowed us to look at the larger picture by evaluating data points that have emerged from constant testing and adjustment of CD4 function in vivo through selection on an evolutionary timescale, in more jawed vertebrate species and under more real-world conditions than can be tested in the laboratory. Our structure-function analysis provided a second, wet-lab reductionist experimental system to cross-validate that the residues identified by our evolutionary analysis are functionally significant. This experimental validation is critical and elevates the relevance of our studies above ad hoc observations. Our work also provides mechanistic insights into why the residues studied here are functionally significant (i.e., key determinants of pMHCII-specific signaling initiation). In short, using both systems allowed us to cross-validate the functional significance of the residues within the GGXXG and (C/F)CV+C motifs studied here by two independent methods.

      (2) Many of the experiments lack negative controls. I believe that two types of negative controls should be included in all experiments. First, hybridoma cells without CD4 (or with a CD4 mutant unable to bind MHCII). Second, a no-peptide control, i.e., activation of the hybridoma cells with APCs not loaded with the cognate peptide. These controls are required to distinguish the basal levels of phosphorylation and CD4-independent antigen-induced phosphorylation, in order to quantify the contribution of the particular motifs to the CD4-mediated support. Although these controls are included in some of the experiments, they are missing in others. The binding mutant appears in some FC results as a horizontal bar (without any error bar/variability), showing that CD4 does not give a huge advantage in these readouts. Why don't the authors show no-peptide controls here as well? Why are the primary FC data (histograms) not shown? Why is neither of these two controls shown for the % of responders plots? Although IL-2 production is a very robust and convincing readout, phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally. Without the mentioned controls and the raw data, precise interpretation is not possible.

      These comments, and those in point #3, concern our flow cytometry-based analysis of early intracellular signaling events, where we asked: how do the motifs under investigation impact phosphorylation of CD3z, ZAP-70, and PLCg1 in response to agonist pMHCII? Thank you for pointing out areas of confusion regarding these analyses. We will try to clarify them here and have worked to clarify the text.

      Our approach was to mutate constituent residues within the motifs that our evolutionary analysis predicted to be functionally significant, compare the performance of the mutants to that of controls bearing WT motifs, and then infer the function of the motifs from the differential phenotype of the mutants relative to their controls. In most cases, the C-terminally truncated CD4-T1 mutant served as the appropriate CD4 control backbone against which to evaluate the phenotypes of the GGXXG and (C/F)CV+C motif mutants. This is a conventional structure-function strategy.

      All experiments included APCs expressing null pMHCII (Hb:I-Ek) as negative controls. These were a necessary component of the data analysis, explained further below, which involved subtracting the background signal of control or mutant T cell hybridomas bound to these negative control APCs from the signal of those bound to the agonist pMHCII (MCC:I-Ek). Doing so allowed us to establish a true signal over background for calculating percent responders and signaling intensity. These negative controls served the same purpose as the APCs expressing I-Ek not loaded with cognate peptide requested by the reviewer. It is important to note that we previously published that TCR-CD3-pMHCII interactions reciprocally increase CD4-pMHCII dwell time, and vice versa, such that dwell times of the 5c.c7 TCR and CD4 to the null Hb:I-Ek are both basal in this system relative to antagonist, weak agonist, and agonist pMHCII (PMID 29386113). A recent study using different techniques also concluded that TCR-CD3 and CD4 cooperatively enhance signaling to pMHCII (PMID 36396644). The use of the null pMHCII, Hb:I-Ek, in each experiment thus serves as a well-characterized negative control for both TCR and CD4 engagement in this experimental system with regard to assembly of the TCR-CD3 and CD4 around pMHCII to drive signaling. In our view, it is the most important negative control for interpreting our results, and it is present in each experiment. In Fig 1B and related supplemental figures, we compare the C-terminally truncated CD4-T1 mutant to full-length WT CD4 to evaluate the contributions of the intracellular domains to early signaling events. We found no significant differences in pCD3z, pZAP-70, and pPLCg1 levels, demonstrating that, in our system, CD4 WT and T1 are statistically indistinguishable.

      In Fig 1C we asked: what is the contribution of CD4-pMHCII interactions made by CD4 T1, which lacks the intracellular domain? To answer this, we used our CD4 T1Dbind mutant. Fig 2C and Table 3 show that pCD3z levels for T1Dbind were ~54% of those for T1, meaning that CD4 binding to pMHCII roughly doubles pCD3z levels (even without the intracellular domain). We also showed in Fig 2C that the percent of responders did not differ between the CD4 T1 and T1Dbind mutants. The impact on ZAP-70 and PLCg1 is shown in Figure 2—figure supplement 4. These differences, including the magnitude of the decrease, were observed reproducibly (p<0.001) in three independently generated sets of lines. We believe that this analysis satisfies the reviewer's request for an analysis of the contributions of CD4 binding to pMHCII. We did not include this as a negative control in experiments evaluating the contributions of the GGXXG and (C/F)CV+C motifs to CD4 T1 signaling because the question being asked in those experiments was how the motifs impact signaling in the absence of the intracellular domain (i.e., within the CD4 T1 backbone, making CD4 T1 the proper comparator for the question we were asking). We showed the average normalized intensity for the T1Dbind mutant, relative to T1, as a dotted line in those figures; this lower bound corresponds to signaling mediated by TCR-CD3 only and gives readers a reference point to evaluate and put into perspective how the mutants we generated impacted the overall contribution of CD4 to these early signaling events. The T1Dbind mutants were not always measured in the same experiment at the same time as the other mutants, because the cell lines were not always made at the same time, so we did not think it appropriate to graph the results together.

      We do not know how to interpret the comment “Although the IL-2 production is a very robust and convincing readout, the phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally.” We offer our perspective that we do not know how to equate the sensitivity of phos-flow to that of IL-2. Because IL-2 is a signaling output, it results from signal amplification from the membrane to the nucleus. If CD3z phosphorylation is the initiating event for a signaling cascade that leads to IL-2 gene transcription and transduction, as is widely believed, our data strongly suggest that the ~2-fold difference in pCD3z levels between CD4 T1 and T1Dbind (Fig 2C/Table 3 data) contributes to the difference between no IL-2 output for T1Dbind and IL-2 output by T1 in this experimental system. Because CD4 WT and T1 have significantly different levels of IL-2 output but show no significant differences in pCD3z, pZAP-70, or pPLCg1 levels, there are likely to be other differences, which we did not measure, via other pathways that intersect at the nucleus. At many levels, biology works on gradients, such that small differences can tip a system in one direction or another. The kinetic discrimination model (PMID 8643643), which is thought to be a reasonable description of the relationship between pMHC engagement and signaling outcomes, suggests that very small differences in molecular interactions at the earliest stages of a response can lead to big differences in signaling outcome. We therefore have no basis at this juncture to think that ~2-fold differences in pCD3z levels could not account for bigger differences in signaling output such as IL-2.

      (3) The processing of the data is not clear. Some of the figures seem to be overprocessed. For instance, I am not sure what "Normalized % responders of pCD3zeta" means (e.g., Fig. 1C and elsewhere). Why do the authors not show the actual % of pCD3zeta+ cells, including the gating strategy? Why do the authors subtract the two histograms in Fig. 2- Fig.S3? It is very unusual.

      We did develop and implement a novel strategy for measuring the impact of our mutations on CD3z, ZAP-70, and PLCg1 phosphorylation. This was explained in more detail in our prior study. The instructions to authors indicated that we should not repeat methods in the current manuscript. However, we will go through the approach here and address why, as raised above, we did not show primary FC histograms for all experiments. First, we think that a brief explanation of what motivated us to develop our approach will aid understanding:

      (1) For experimental and statistical rigor, our goal was to perform both experimental and biological replicates by measuring and comparing the averages of at least three independently generated sets of paired WT/T1 control vs. mutant cell lines, generated at different times, to determine the statistical significance of the difference, if any, between the averages of the control and mutant lines.

      (2) Our questions necessitated that we measure signals generated naturally by the cooperative engagement of cognate pMHCII on APCs by TCR-CD3 and CD4, rather than through aCD3/aCD4 crosslinking.

      (3) We chose to use flow cytometry rather than bulk-cell analysis by Western blotting to analyze signaling in cells that were engaged to the agonist APC, in order to avoid dilution of that signal by cells that are not engaged to APCs and not signaling.

      (4) For each experiment, we wanted to subtract background signals from cells bound to APCs expressing a null pMHCII (Hb:I-Ek) from signals generated by cells bound to APCs expressing agonist pMHCII (MCC:I-Ek). Doing so allowed us to identify cells that are signaling (responders) to agonist over null pMHCII. The goal here was to quantitate the level of signaling objectively, with a method that can be applied uniformly to all samples, rather than setting a flow cytometry gate on positive cells (e.g., pCD3z), because gating is subjective and can vary from experiment to experiment. Put another way, as detailed below, we used our subtraction method to identify signaling responders rather than setting a signaling gate on the positive population.

      Regarding gating schemes, controls, and data processing:

      Figure 2—figure supplement 3 of the current study and Figure 6—figure supplement 1 of our prior study are designed to walk the reader through our experimental design, gating, data processing, and thinking. Here we provide a detailed explanation to complement the figure legend as well as the methods provided in our prior manuscript (see pt #4 below).

      We will refer to Figure 2—figure supplement 3 here:

      Panel A. The dot plots show our approach to identifying 5c.c7+ CD4+ 58a-b- T cell hybridomas (y-axis, GFP positive) coupled to M12 cells (x-axis, TagIt Violet) expressing the null pMHCII Hb:I-Ek (left) or agonist pMHCII MCC:I-Ek (right). The gating shows the frequency of GFP+ T cell hybridomas that are bound to TagIt Violet-positive APCs (i.e., cell couples). The histogram on the right then shows the staining intensity for pCD3z on the x-axis for the 10,000 coupled events collected in which the APCs express the null pMHCII (filled cyan) or the agonist pMHCII (black line).

      Panel B. The data presented here is the same as in Panel A, but for CD4 T1 cells.

      Panel C. The data presented here walk through how we identify 5c.c7+ CD4+ 58a-b- T cell hybridomas responding (i.e., signaling) to agonist pMHCII, as well as the mean signaling intensity of the responding population, in a gating-independent manner after background subtraction. For the left graph, we exported the data for the histograms shown in Panel A from FlowJo 10 software and plotted them here using Prism 9 as smoothed lines (500 nearest neighbors). The cyan line is therefore a replicate of the flow cytometry histogram shown in Panel A for pCD3z intensity from 5c.c7+ CD4+ 58a-b- T cell hybridomas coupled to M12 cells expressing the null pMHCII (Hb:I-Ek), while the black histogram is a replicate of the pCD3z intensity for 5c.c7+ CD4+ 58a-b- T cell hybridomas coupled to M12 cells expressing the agonist pMHCII (MCC:I-Ek). Next, to identify the responding population in a gating-independent manner, we used Excel to subtract, on a bin-by-bin basis, the pCD3z intensity for the null pMHCII (cyan) negative control population from the pCD3z intensity for the agonist pMHCII (black) responding population. We then transferred the background-subtracted values to Prism 9 for smoothing and plotting (grey line: MCC:I-Ek minus Hb:I-Ek). The middle graph shows the same data processing for the data from Panel B for the CD4 T1 cells. Please note that the background-subtracted grey line has both negative and positive values. The negative values represent intensity bins where signaling in response to agonist pMHCII leads to fewer cells per bin than in the null pMHCII population that is not signaling, while the positive values represent intensity bins where signaling cells outnumber non-signaling cells. The right graph in this panel shows the populations after background subtraction for intensity bins that had more cells with pCD3z signal in the agonist pMHCII population than in the null pMHCII population (grey = WT full-length CD4 and blue = T1). In short, the right graph shows the identification of those cells that are signaling in response to agonist pMHCII. This approach obviated the need for subjective gating in FlowJo to identify signaling cells (i.e., pCD3z positive) and allowed for background subtraction, which could not be done in FlowJo. We used this approach for all analyses of pCD3z, pZAP-70, and pPLCg1 in this study.

      The number of cells in these background-subtracted populations was divided by 10,000 (the number of events collected and analyzed) to calculate the percent of responding 5c.c7+ CD4+ 58a-b- T cell hybridomas, while the mean fluorescence intensity of the cells within these populations represents the signaling intensity.
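      The gating-independent calculation in Panels C and D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' workflow (which used FlowJo, Excel, and Prism). Each histogram is assumed to be exported as per-bin cell counts on a shared set of intensity bins. The smoothing step is omitted, and the MFI is taken as the count-weighted arithmetic mean of bin centers over the positive excess. The function names and toy numbers are hypothetical.

```python
def background_subtract(agonist, null):
    """Bin-by-bin subtraction of null-pMHCII counts from agonist-pMHCII counts."""
    if len(agonist) != len(null):
        raise ValueError("histograms must use the same intensity bins")
    return [a - n for a, n in zip(agonist, null)]


def responder_stats(bin_centers, agonist, null, n_events=10_000):
    """Return (percent responders, MFI) for the background-subtracted excess."""
    diff = background_subtract(agonist, null)
    # Keep only bins where agonist-stimulated cells outnumber null-stimulated cells.
    positive = [(c, d) for c, d in zip(bin_centers, diff) if d > 0]
    n_resp = sum(d for _, d in positive)
    if n_resp == 0:
        return 0.0, float("nan")
    percent = 100.0 * n_resp / n_events
    # Count-weighted mean intensity of the responding (positive) population.
    mfi = sum(c * d for c, d in positive) / n_resp
    return percent, mfi


# Toy three-bin example with 10,000 events per condition (hypothetical numbers).
pct, mfi = responder_stats([100, 1000, 10000], [4000, 3000, 3000], [6000, 3000, 1000])
print(pct, mfi)  # 20.0 10000.0
```

      Bins with a negative difference are dropped rather than netted against positive bins. This mirrors the right graph of Panel C, which keeps only the intensity bins where signaling cells outnumber non-signaling cells.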

      Panel D. The graph on the left shows the mean fluorescence intensity (MFI) ± SEM for the positive signaling population from the right graph of Panel C. In this example, comparing a WT and a T1 cell line generated at the same time from the same parental 58a-b- T cell hybridoma population, the T1 MFI is significantly greater than the WT MFI. These intensity values represent one of the paired intensity values used in main Fig 2B (left graph), where we show the paired MFI analysis of responding populations from 5 independently generated sets of cell lines. Please note that these single MFI values are derived directly from the flow cytometry histograms after background subtraction. Figure 2B, and similar figures, are therefore a distillation of all of the histograms for the populations tested, presented in a manner that we consider easier to digest than either overlaying all histograms or showing multiple panels individually. This presentation also conserves space, which is why we showed only representative flow cytometry histograms rather than all histograms.

      The graph on the right shows the % responders for the positive signaling population from the right graph of Panel C. Specifically, the total number of cells determined to be signaling in response to agonist pMHCII was divided by 10,000 (the number of coupled cells collected by flow cytometry) to determine the percent responders. These values represent one of five sets of values used to determine the average normalized percent responders (all normalized to WT). There was no significant difference between these two populations in terms of percent responders.

      Regarding graphing normalized values for the mean MFI for signaling intensity or the percent responders: in our first manuscript, we presented the individual MFI values for matched pairs of cells as well as the actual percent responders per group. Colleagues told us that this presentation was confusing, distracting, and otherwise hard to digest. Multiple individuals suggested that normalized values would be preferable because they are easier and faster to understand. Upon reflection, we agreed with this feedback because the normalized presentation with statistics allows the two key questions to be evaluated quickly: 1. Are the mutants different from the control? 2. By how much? We have left the raw intensity values as well as the normalized intensity values in the version of record. Given the Reviewer's comments, we have now graphed the average % responders instead of normalized values in the figures, and left the normalized values in Table 3.
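      For reference, normalizing to WT amounts to dividing each mutant value by the WT value from the same matched set of cell lines and then averaging across sets. The short sketch below assumes the data are held as (WT, mutant) pairs, one per independently generated set. That layout and the numbers are hypothetical.

```python
from statistics import mean


def normalize_to_wt(pairs):
    """Divide each mutant value by the WT value from the same matched set.

    `pairs` holds (wt_value, mutant_value) tuples, one per independently
    generated set of cell lines (hypothetical data layout).
    """
    return [mutant / wt for wt, mutant in pairs]


# Hypothetical percent-responder values for three matched WT/mutant sets.
ratios = normalize_to_wt([(10.0, 12.0), (20.0, 22.0), (5.0, 7.0)])
print(ratios, mean(ratios))
```

      Normalizing within each matched set removes set-to-set differences in absolute signal, which is why a value of 1.0 reads directly as "no different from WT".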

      (4) The manuscript lacks Materials and Methods. It only refers to the previous paper, which is very unusual. Although most of the methods are the same, they still should be mentioned here. Moreover, some of the mutants presented here were not generated in the previous study, as far as I understand. Perhaps the authors plan to include Materials and Methods during the revision...

      Because we submitted this as a Research Advances article, we followed the journal instructions to reference the Materials and Methods in our prior publication, upon which this work builds, as the methods used are the same. They are detailed in that study. We have now included a copy of the Materials and Methods so that the eLife staff can determine how best to link it with this manuscript. We have also included the gene sequences for the novel constructs used in this study. Thank you for pointing out the omission.

      (5) Membrane rafts are a very controversial topic. I recommend the authors stick to the more consensual term "detergent resistant microdomains" in all cases/occurrences.

      We agree this is a controversial topic with a variety of viewpoints. Because we are not experts in the field of membrane composition, we turned to the literature to inform our view of how best to refer to these membrane subdomains. In our reading, we found a 2006 meeting report from a Keystone symposium on lipid rafts and cell function authored by Linda Pike (PMID 16645198). At this meeting, a central focus was reaching a consensus on how best to refer to these domains. The consensus term agreed upon by this group was “membrane rafts”. Specifically, we will quote from this report published in the Journal of Lipid Research, ‘Together, the discussions permitted the generation of a definition for “lipid rafts” in an ad hoc session on the final day of the meeting. All participants were invited to contribute to this effort, and the work product reflects the consensus of this broad-based group…… First and foremost, the term “lipid raft” was discarded in favor of the term “membrane raft.”’ We chose to use the term “membrane raft” based on this consensus opinion.

      (6) Last, but not least, the mechanistic explanation (beyond the independence of LCK binding) of the role of these motifs is very unclear at the moment.

      We agree with this comment. One goal in making these results, and those in our prior study, available to the field at large is to provide evidence in support of our view that the dominant paradigm thought to explain the earliest events in T cell signaling needs re-evaluating. How T cell signaling is initiated in response to pMHCII is clearly more complex than is currently thought. However, our data are inconsistent with the dominant paradigm in which CD4 recruits Lck to TCR-CD3 to phosphorylate ITAMs to initiate signaling.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kuhn and colleagues follows upon a 2022 eLife paper in which they identified residues in CD4 constrained by evolutionary purifying selection in placental mammals and then performed functional analyses of these conserved sequences. They showed that sequences distinct from the CXC "clasp" involved in recruitment of Lck have critical roles in TCR signaling. These include a glycine-rich motif in the transmembrane (TM) domain and the Cys-containing juxtamembrane (JM) motif that undergoes palmitoylation, both of which promote TCR signaling, and a cytoplasmic domain helical motif, also involved in Lck binding, that constrains signaling. Mutations in the transmembrane and juxtamembrane sequences led to reduced proximal signaling and IL-2 production in a hybridoma's response to antigen presentation, despite retention of abundant CD4 association with Lck in the detergent-soluble membrane fraction, presumably mislocalized outside of lipid rafts and distal to the TCR. A major conclusion of that study was that CD4 sequences required for Lck association, including the CXC "clasp" motif, are not as consequential for CD4 co-receptor function in TCR signaling as the conserved TM and JM motifs. However, the experiments did not determine whether the functions of the TM and JM motifs are dependent on the Lck-binding properties of CD4: the mutations in those motifs could result in free Lck redistributing to associate with CD4 in signaling-incompetent membrane domains, or the motifs could function independently of CD4-Lck association. The current study addresses this specific question.

      Using the same model system as in the earlier eLife paper (the entire methods section is a citation to the earlier paper), the authors show that truncation of the Lck-binding intracellular domain resulted in a moderate reduction in IL-2 response, as previously shown, but there was no apparent effect on proximal phosphorylation events (CD3z, Lck, ZAP70, PLCg1). They then evaluated a series of TM and JM motif mutations in the context of the truncated Lck-nonbinding molecule, and showed that these had substantially impaired co-receptor function in the IL-2 assay and reduced proximal signaling. The proximal signaling could be observed at high ligand density even with a MHC non-binding mutation in CD4, although there was still impaired IL-2 production. This result additionally illustrates that phosphorylation of the proximal signaling molecules is not sufficient to activate IL-2 expression in the context of antigen presentation.

      Strengths:

      The strength of the paper is the further clear demonstration that the classical model of CD4 co-receptor function (MHCII-binding CD4 bringing Lck to the TCR complex, for phosphorylation of the CD3 chain ITAMs and of the ZAP70 kinase) is not sufficient to explain TCR activation. The data, combined with the earlier eLife paper, further implicate the gly-rich TM sequence and the palmitoylation targets in the JM region as having critical roles in productive co-receptor-dependent TCR activation.

      Weaknesses:

      The major weakness of the paper is the lack of mechanistic insight into how the TM and JM motifs function. The new results are largely incremental in light of the earlier paper from this group as well as other literature, cited by the authors, that implicates "free" Lck, not associated with co-receptors, as having the major role in TCR activation. It is clear that the two motifs are important for CD4 function at low pMHCII ligand density. The proposal that they modulate interactions of TCR complex with cholesterol or other membrane lipids is an interesting one, and it would be worth further exploring by employing approaches that alter membrane lipid composition. The JM sequence presumably dictates localization within the membrane, by way of palmitoylation, which may be critical to regulate avidity of the TCR:CD4 complex for pMHCII or TCR complex allosteric effects that influence the activation threshold. Experiments that explore the basis of the mutant phenotype could substantially enhance the impact of this study.

      We appreciate these thoughtful comments and suggestions. We will restate what we wrote in our preliminary response to the reviews to explain the scope of the current study:

      To address comments about the limited scope of this study and the referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked. We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”). Now that we have provided evidence that CD4 does not recruit Lck to phosphorylate TCR-CD3 ITAMs in our system, and that the GGXXG and (C/F)CV+C motifs do not enable CD4 to regulate Lck proximity to TCR-CD3, we agree that it is important to form and test alternative hypotheses for how TCR-CD3 signaling is initiated.

    1. Author Response

      Reviewer #1 (Public Review):

      Combining functional MRI with a decoder, the authors probe the neural substrate of the double drift illusion in visual cortex. Their elegant behavioural paradigm keeps the actual retinal position of the stimulus stable while inducing the illusion with a combination of smooth pursuit and visual motion. The results show that the illusory drift path can be decoded from a signal in extrastriate visual area hMT+ but not other visual areas. Importantly, this can be done in the absence of spatial attention to the stimulus location.

      The particular strengths of this study lie in the elegant paradigm and the clear attentional control. The methodology of the decoder is powerful and at the same time straightforward, well explained, and well accepted in the literature. A potential weakness of the study is the lack of simultaneous eye movement recordings in the scanner. Such data could have provided further clarification of the potential underlying neural mechanism and whether differences in eye movements could contribute to the decoding of the visual illusion path. There are some controls that mitigate this.

      We have addressed the Reviewer's comment by repeating the fMRI experiment in a new group of subjects, in whom we were also able to obtain concurrent, high-quality eye tracking. When we initially conducted the experiment, it was not possible to perform eye tracking in the 7T scanner at NIH. Because of this limitation, we were forced to depend on careful eye tracking in a pre-scan behavioral experiment. In the time since, however, we have developed a protocol for obtaining high-quality eye tracking with an Eyelink 1000 mounted in the bore of the scanner. Now that we can collect concurrent eye tracking, we repeated the fMRI experiment and found that our main fMRI result replicated (i.e., it was possible to decode the direction of the illusion from fMRI responses in hMT+). Additionally, the concurrent eye tracking enabled us to make four important observations (see new Fig 4):

      First, subjects maintained stable fixation when the target was stationary during fixation and accurately pursued the vertically moving target during illusion (Fig 4). This analysis confirms that the drifting Gabor remained at a relatively fixed position on the retina during the illusory period.

      Second, there were no differences in microsaccades between any of the conditions. We quantified the direction, amplitude, and frequency of all saccades for each condition. While we did observe small rightward microsaccades, none of the microsaccade characteristics differed between conditions. The rightward microsaccades may have been due to the sustained eccentric leftward fixation. Or, it may have been due to attention to the right visual field stimulus (despite the foveal attention task). Or it may have reflected the known horizontal microsaccade bias. Regardless, we do not believe our fMRI results are related to microsaccades because these small saccades did not differ across condition.

      Finally, we wondered whether small, not-easily-quantified ocular deviations could have differed between conditions and somehow produced differences in fMRI activity that were picked up by the decoding analysis. To test this possibility, we trained a classifier to discriminate conditions based on the raw eye traces (just as we did in the main fMRI data analysis). Unlike the fMRI analysis, however, it was not possible to decode the direction of the illusion from the eye traces themselves.
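      The eye-trace control can be illustrated with a simple leave-one-out decoder. The classifier used in the study is not specified here, so this sketch swaps in a nearest-centroid classifier (our assumption). It is applied to hypothetical fixed-length feature vectors derived from each trial's eye trace. With two conditions, accuracy near 0.5 would indicate chance-level decoding.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def loo_nearest_centroid_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    samples: equal-length feature vectors (e.g., resampled eye traces; hypothetical)
    labels: the condition label for each sample
    """
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one.
        train = {}
        for j, (s, lab) in enumerate(zip(samples, labels)):
            if j != i:
                train.setdefault(lab, []).append(s)
        cents = {lab: centroid(vs) for lab, vs in train.items()}
        # Predict the class whose centroid is closest in Euclidean distance.
        pred = min(cents, key=lambda lab: math.dist(x, cents[lab]))
        correct += pred == y
    return correct / len(samples)
```

      Leaving out one trial at a time keeps the tested trial out of the centroids it is compared against. This is the property that makes a chance-level result meaningful.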

      We conclude that our ability to decode the illusion from fMRI responses was not due to differences in eye movements caused by the illusion.

      The authors provide important evidence for a potential neural substrate in the extrastriate visual cortex for encoding the perceived spatial location of a moving stimulus. This significantly extends previous studies that showed relevant spatiotopic signals outside visual cortex. Understanding the neural substrate and the underlying neural mechanisms for encoding perceived spatiotopic location are of broad importance for our understanding of the neural basis of sensory perception.

      We thank the Editor for this positive assessment of our work.

      Reviewer #3 (Public Review):

      The authors studied the neural basis of the double drift illusion, an illusion in which a Gabor drifting both horizontally within an aperture and moving vertically along a path appears to follow a diagonal trajectory, perceptually displaced off its true vertical path in the direction of the horizontal drift. The illusion is strong and its neural basis is intriguing. The authors suggest it can be used to address the locus of spatiotopic processing in the brain. They find that fMRI BOLD activity in hMT+ can be used to decode the illusory drift direction of the stimulus, even under conditions of withdrawn attention. They internally replicate this result and ensure it is not due to local motion. They interpret the finding to indicate that hMT+ contains spatiotopic information. This was a carefully designed and conducted study, and the manuscript writing and figures are clear.

      Despite the care that went into the study design and control experiments, I see some potential interpretational issues, and I am uncertain about the scientific advance. My main questions are about the interpretation of the findings, the possible confound of smooth pursuit eye movements, and the relation to previous studies, including previous fMRI studies of the same illusion. I also would like to see more thorough reporting of behavior.

      Major comments

      1) The authors motivate the study by saying that there have been conflicting results about which brain areas are involved in spatiotopic coding, but they did not give an indication about why there might be conflicting results or why the current study is suitable to address the previous discrepancies. Is this study simply adding another observation to the existing body of literature, or does it go beyond previous studies in a critical theoretical way?

      There have indeed been conflicting results in the literature. One idea that has received some prior support in the literature is that spatiotopic location information can depend on the task. Our experiment tests this idea by measuring cortical responses during an illusion that involves spatiotopic coding. Previous human fMRI studies reporting spatiotopic coding have not really linked cortical activity with the perception of spatiotopic coordinates. Hence, we feel that our results make a unique contribution to the field.

      2) The authors interpret the finding of illusory drift direction encoding in hMT+ to mean that hMT+ is coding the illusory spatial position of the stimulus. But could an alternative explanation be that hMT+ is coding the illusory global motion direction, and not the spatial position per se? If this is a possible account, then the result would still indicate that an illusory motion percept is reflected in hMT+ but it would seem not to answer the question about spatiotopic coding which motivated the paper.

      Here, the Reviewer suggests an interesting alternative explanation: that responses in MT pertain to the direction of global motion rather than to stimulus position. However, this alternative possibility would still involve spatiotopic coding. For the brain to compute the direction of global motion of a stimulus that is at a fixed retinal position, some spatiotopic computation must occur. We therefore do not agree with the Reviewer's suggestion that this alternative explanation undermines the motivation of this study.

      3) It is good that the authors sought to rule out the possibility that smooth pursuit eye movements were driving the decoding results in hMT+, but I'm not sure they have yet convincingly done so. Decoding based on the pursuit selective voxels alone was very nearly significant (p = 0.052), which was not acknowledged in the text of the paper. Furthermore, because voxels that were both pursuit and stimulus selective were excluded from the pursuit selective ROI, decoding performance in that ROI may have been underestimated.

      To clarify, voxels that were identified by both localizers were NOT excluded from either ROI. When we repeated the decoding (from Expt 2, Fig 3B) using disjoint voxel selection (i.e., analyzing voxels that responded only in the stim localizer, or only in the pursuit localizer, and excluding voxels that responded to both), we obtained qualitatively similar results, although the magnitude of the effects was smaller. This is not surprising given the much smaller number of voxels remaining in each ROI; hence, with disjoint ROIs, decoding was only marginally significant, in MT for the stim localizer (p=0.049).
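      The overlapping and disjoint voxel selections described above amount to simple set operations on voxel indices. The sketch below assumes each localizer yields a collection of integer voxel indices. That representation and the key names are hypothetical.

```python
def split_rois(stim_voxels, pursuit_voxels):
    """Return overlapping and disjoint ROI definitions from two localizers."""
    stim, pursuit = set(stim_voxels), set(pursuit_voxels)
    return {
        # Main analysis: voxels selected by both localizers stay in both ROIs.
        "stim_all": stim,
        "pursuit_all": pursuit,
        # Disjoint control: drop voxels that responded in both localizers.
        "stim_only": stim - pursuit,
        "pursuit_only": pursuit - stim,
    }


rois = split_rois([1, 2, 3], [3, 4])
print(sorted(rois["stim_only"]), sorted(rois["pursuit_only"]))  # [1, 2] [4]
```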

      4) A previous fMRI study of the double drift illusion (Liu et al. 2019 Current Biology) also found above chance decoding of illusory drift direction in hMT+. The authors mention this study but do not discuss it, so it was unclear to me what the advance is of the current study over that study. The main differences I see are that in the current study, 1) the observer is also moving their eyes so that the double drift stimulus is theoretically stabilized on the retina, and 2) attention is withdrawn from the stimulus. But in both studies, hMT+ contains information about the illusory drift direction even though retinotopic information is the same, so it's not clear to me that the differences between these studies lead to fundamentally different interpretations.

      The results of Liu et al. are not relevant to the reference frame used to encode the stimulus. Because subjects were fixating in Liu et al., the encoding of the illusion could have been in either retinal or spatiotopic coordinates. In our study, the stimulus must have been encoded in spatiotopic coordinates. One interesting feature of Liu et al. is the issue of cross decoding the illusion and actual percept (training the decoder on veridical motion of different angles, and then testing the decoder on data collected during the illusion). One potentially interesting extension of the cross decoding approach would be to train the decoder on a version of the illusion involving fixation (as in Liu et al), but then testing the decoder on the illusion during pursuit. One would expect cross decoding if spatiotopic coordinates are used in both cases. We now discuss this possibility (Discussion: Relationship to a previous study of the double-drift illusion).
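      The proposed cross-decoding test trains a decoder on one condition and tests it on another. The minimal sketch below is not an analysis from the study. It swaps in a nearest-centroid classifier (our assumption) and uses hypothetical voxel-pattern vectors and labels. Above-chance accuracy when training on the fixation version of the illusion and testing on the pursuit version would be consistent with a shared spatiotopic code.

```python
import math


def train_centroids(samples, labels):
    """Mean pattern per label (e.g., illusory drift direction)."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(vs) for col in zip(*vs)] for y, vs in groups.items()}


def cross_decode(train_x, train_y, test_x, test_y):
    """Train on one condition (e.g., illusion during fixation), test on another
    (e.g., illusion during pursuit); return accuracy on the test condition."""
    cents = train_centroids(train_x, train_y)
    hits = sum(
        min(cents, key=lambda y: math.dist(x, cents[y])) == t
        for x, t in zip(test_x, test_y)
    )
    return hits / len(test_x)
```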

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study addresses the fundamentally unresolved question of why many thousands of small-effect loci contribute more to the heritability of a trait than the large-effect lead variants. The authors explore resource competition within the transcriptional machinery as one possible explanation with a simple theoretical model, concluding that the effects of resource competition would be too small to explain the heritability effects. The topic and approximation of the problem are very timely and offer an intuitive way to think about polygenic variation, but the analysis of the simple model appears to be incomplete, leaving the main claims only partially supported.

      We thank eLife for recognizing the importance of our work. We hope the revised manuscript addresses the reviewers’ reservations.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study explores whether the extreme polygenicity of common traits can be explained in part by competition among genes for limiting molecular resources (such as RNA polymerases) involved in gene regulation. The authors hypothesise that such competition would cause the expression levels of all genes that utilise the same molecular resource to be correlated and could thus, in principle, partly explain weak trans-regulatory effects and the observation of highly polygenic architectures of gene expression. They study this hypothesis under a very simple model where the same molecule binds to regulatory elements of a large number m of genes, and conclude that this gives rise to trans-regulatory effects that scale as 1/m, and which may thus be negligible for large m.

      We thank the reviewer for their thorough and thoughtful review of our manuscript.

      The main limitation of this study lies in the details of the mathematical analysis, which does not adequately account for various small effects, whose magnitude scales inversely with the number m of genes that compete for the limiting molecular resource. In particular, the fraction of "free" molecule (which is unbound to any of the genes) also scales as 1/m, but is not accounted for in the analysis, making it difficult to assess whether the quantitative conclusions are indeed correct.

      It is explicitly accounted for in the supplement.

      Second, the questions raised in this study are better analysed in the framework of a sensitivity or perturbation analysis, i.e., by asking how changes in expression level or binding affinity at one gene (rather than the total expression level or total binding affinity) affect expression level at other genes.

      In the context of complex traits, where an increase in gene expression can either increase or decrease the trait, we believe the most important quantity of interest is variation in expression and, therefore, trait variation. Nevertheless, our results do show that the relative change in expression due to competition is also small.

      Thus, while the qualitative conclusion that resource competition in itself is unlikely to mediate trans-regulatory effects and explain highly polygenic architectures of gene expression traits probably holds, the mathematical reasoning used to arrive at this conclusion requires more care.

      In my opinion, the potential impact of this kind of analysis rests at least partly on the plausibility of the initial hypothesis, namely whether most molecular resources involved in gene regulation are indeed "limiting resources". This is not obvious, and may require a careful assessment of existing evidence, e.g., what is the concentration of bound vs. unbound molecular species (such as RNA polymerases) in various cell types?

      We intentionally looked at the most extreme case of resource limitation. Because even extreme resource limitation produces only a small effect, we conclude that the same would be true of weak resource limitation, in which unbound molecules play an important role. We put more emphasis on this point in our revised text.

      Reviewer #1 (Recommendations For The Authors):

      While the main conclusion that resource competition in itself is unlikely to mediate trans effects and explain high levels of polygenicity may well be correct, I am not convinced that the mathematical reasoning presented in support of this conclusion is entirely correct. I will attempt to outline my concerns mainly in the context of section 2, since the arguments in sections 3 and 4 build upon this.

      (a) The key assumption underlying the approximations in equations 3, 4, and 5 is that there is very little free polymerase; in other words, [P]/[P]_0 is a small quantity. However, the second and third terms that emerge in equation 7 are also small quantities and (as far as I can see) of the same order as [P]/[P]_0. Thus, one cannot simply use equation 4 or 5 as a starting point to derive eq. 7 and should instead use the exact x_i = (g_i [G])/(1+g_tot [G]), in order to make sure that all (and not just some) terms of similar order of magnitude are accounted for in the analysis.

      The concentration of free polymerase is denoted [P], and we explicitly assume (just before eq. 2) that [P] << [P]_0, with [P]_0 being the overall concentration of polymerase. This is a conservative assumption: we consider extreme resource competition with little free polymerase, and since we see only a small effect in this extreme scenario, we assume the effect would also be small in less extreme scenarios. We put more emphasis on this point in our revised text.

      More concretely, the difference between the exact x_i = (g_i [G])/(1+g_tot [G]) and the approximate x_i = g_i/g_tot is precisely 1/m (for large m) in the example considered from line 246 onwards. Thus, I suspect that the conclusion that Var[x_i] = (1-1/m)Var[g_i] in that example is just an artefact of starting with eqs. 4 and 5. As a sanity check, it may be useful to actually simulate resource competition explicitly (maybe using a deterministic simulation) under the explicit model [PG_i] = g_i [G] [P] and [P]_0 = [P] + Sum[[PG]_i, i=1,m], without making any further approximations, to see whether perturbations in g_i actually produce Order[1/m] effects in the variance of x_i for the example considered from line 246 onwards (this would require simulating with a few different m and plotting Var[x_i] vs. m, for example).

      The exact equation the reviewer is alluding to describes a scenario of non-extreme resource competition. If g_tot [G] >> 1, i.e., if most polymerase is bound to genes, then x_i ≈ g_i/g_tot; this is the extreme-competition scenario we consider. If g_tot [G] << 1, then x_i ≈ g_i [G] and competition has no effect. While the intermediate case is interesting, we see no reason for the effects to be larger than in the extreme-competition case. We have added the results of simulations to the supplement to validate our arguments.
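      The reviewer's suggested check can be sketched as a small simulation of the deterministic model. This is a minimal illustration, not the simulation reported in the supplement. It assumes binding [PG_i] = g_i [G] [P] with conservation [P]_0 = [P] + Sum[[PG]_i], which gives the exact x_i = g_i [G]/(1 + g_tot [G]). It further assumes hypothetical i.i.d. lognormal g_i with spread sigma and a large [G] (the extreme-competition regime). The function estimates what share of Var[x_0] comes from variation at the other m - 1 genes. Under these assumptions, a first-order expansion suggests this share falls roughly as 1/m.

```python
import math
import random
import statistics


def occupancy(g_i: float, g_tot: float, G: float) -> float:
    """Exact fraction of polymerase bound at gene i: g_i*G / (1 + g_tot*G)."""
    return g_i * G / (1.0 + g_tot * G)


def trans_share(m: int, G: float = 1e3, sigma: float = 0.1,
                reps: int = 2000, seed: int = 0) -> float:
    """Share of Var[x_0] due to variation in g at the other m - 1 genes.

    Hypothetical setup: every g_j is lognormal(0, sigma), i.i.d. across genes
    and replicates; G * g_tot >> 1 puts the model in the extreme-competition regime.
    """
    rng = random.Random(seed)
    mean_g = math.exp(sigma ** 2 / 2)  # mean of a lognormal(0, sigma) variable
    full, trans = [], []
    for _ in range(reps):
        g0 = rng.lognormvariate(0.0, sigma)
        others = sum(rng.lognormvariate(0.0, sigma) for _ in range(m - 1))
        x_full = occupancy(g0, g0 + others, G)
        # Same g0, but the other genes held at their mean: the cis-only value.
        x_cis = occupancy(g0, g0 + (m - 1) * mean_g, G)
        full.append(x_full)
        trans.append(x_full - x_cis)
    return statistics.pvariance(trans) / statistics.pvariance(full)


for m in (10, 100, 1000):
    print(m, trans_share(m, reps=500))
```

      Holding gene 0 fixed while resampling the other genes isolates the trans contribution directly. Sweeping m then shows whether that contribution shrinks as the number of competing genes grows.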

      Lines 231-239: Because of the concerns highlighted above and questions about the validity of equation 7, I am not convinced that the interpretations given here and also in section 4 are correct.

      (b) Lines 219-230 (including equations 6 and 7): I think to address the question of whether genetic changes in cis-regulatory elements for a given gene have an effect on other genes (under this model of resource competition), it is better to spell out the argument in terms of Var[ dx_i ] rather than Var[x_i], where dx_i is the change in expression level at gene i due to changes at all m genes, dg_i is the change in gene activity due to (genetic) changes in the relevant regulatory elements associated with gene i etc. Var[ dx_i ] can then be expressed as a sum of Var[dg_i], Var[dg_tot] and Cov[d g_i, dg_tot]. However, I suspect that to do this correctly, one should not start with the approximate x_i=g_i/g_tot : see previous comment.

      The variance of the deviation from the mean is mathematically identical to the overall variance, Var[ dx_i ]= Var[ x_i ]. Our analysis is therefore equivalent to the suggested analysis.
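      To make the equivalence explicit, the following shows how the reviewer's suggested decomposition looks under the extreme-competition approximation x_i ≈ g_i/g_tot. This first-order sketch linearizes about mean values \bar g_i and \bar g_tot, which is our assumption rather than a derivation from the manuscript:

```latex
dx_i \equiv x_i - \mathbb{E}[x_i]
\;\Rightarrow\;
\mathrm{Var}[dx_i] = \mathrm{Var}\big[x_i - \mathbb{E}[x_i]\big] = \mathrm{Var}[x_i],

dx_i \approx \frac{dg_i}{\bar g_{tot}} - \frac{\bar g_i \, dg_{tot}}{\bar g_{tot}^{2}},

\mathrm{Var}[dx_i] \approx \frac{\mathrm{Var}[dg_i]}{\bar g_{tot}^{2}}
 + \frac{\bar g_i^{2}\,\mathrm{Var}[dg_{tot}]}{\bar g_{tot}^{4}}
 - \frac{2\,\bar g_i\,\mathrm{Cov}[dg_i, dg_{tot}]}{\bar g_{tot}^{3}}.
```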

      Somewhere in all of this, there is also an implicit assumption that E[dg_i] is zero, i.e, mutations are as likely to increase as to decrease binding affinities so that one needs to only consider Var[dx_i] and not E[dx_i]; this assumption should be spelled out.

      Our results concern the variation around trait means and therefore we have not included a possible mean effect of mutation, which would not affect the results but just shift the mean.

      Some minor comments (mostly related to the introduction and general context):

      • I think it would be worth connecting more with the literature on molecular competition and gene regulation (see e.g., How Molecular Competition Influences Fluxes in Gene Expression Networks, De Vos et al, Plos One 2011). Even though this literature does not frame questions in terms of "polygenicity of traits", these analyses address the same basic questions: to what extent do perturbations in gene expression at one gene affect other genes, or to what extent is there crosstalk between different genes or pathways?

      We have expanded our introduction to refer to De Vos et al, as well as a few other papers we have recently become aware of. (e.g., Jie Lin & Ariel Amir Nature Communications volume 9, Article number: 4496 (2018))

      • Lines 88-89: "supports the network component of the model" is a vague phrase that does not convey much. It would be useful to clarify and make this more precise.

      We have clarified this phrasing in the text.

      • Lines 113-114: In the context of "selective constraint", it may also be worth discussing previous work by one of the authors: "A population genetic interpretation of GWAS findings for human quantitative traits". What implications would stabilizing selection on multiple traits (as opposed to simple purifying selection) have for the distribution of variances across trait loci and the extent to which trait architectures appear to be polygenic?

      While most definitely of great interest to some of the authors, the distribution of variance across loci does not affect our results.

      References: Barton and Etheridge 2018 in line 54 is not the correct reference; it should be Barton et al 2017 (paper with Amandine Veber). Fisher 1919 in line 52 is actually Fisher 1918. The formatting of references in the next paragraph (and in various other places in the paper) is also a bit unusual, with some authors referred to by their full names and others only by their last. I believe that it may be useful to crosscheck references throughout the paper.

      We have crosschecked the references in the paper.

      Line 164: Some word appears to be missing here. Maybe bound -> bound to ?

      Fixed

      Reviewer #2 (Public Review):

      The question the authors pose is very simple and yet very important. Does the fact that many genes compete for Pol II to be transcribed explain why so many trans-eQTL contribute to the heritability of complex traits? That is, if a gene uses up a proportion of Pol II, does that in turn affect the transcriptional output of other genes relevant or even irrelevant for the trait in a way that their effect will be captured in a genome-wide association study? If yes, then the large number of genetic effects associated with variation in complex traits could be explained by such trans-propagating effects on the transcriptional output of many genes.

      This is a very timely question given that we still don't understand how, mechanistically, so many genes can be involved in complex traits variation. Their approach to this question is very simple and it is framed in classic enzyme-substrate equations. The authors show that the trans-propagating effect is too small to explain the ~70% of heritability of complex traits that are associated with trans-effects. Their conclusion relies on the comparison of the order of magnitude of a) the quantifiable transcriptional effects due to Pol II competition, and b) the observed percentage of variance explained by trans effects (data coming from Liu et al 2019, from the same lab).
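      The dilution intuition behind this question can be illustrated with a toy calculation. Assume, purely for illustration, that each of m genes has equal activity and that expression is x_i = g_i / g_tot, with g_tot the sum over all genes. A relative change delta in one gene's activity then changes that gene's expression by delta(m - 1)/(m + delta), and every other gene's expression by -delta/(m + delta). The function name and the equal-activity assumption are hypothetical simplifications, not the authors' model.

```python
from typing import Tuple


def relative_expression_changes(m: int, delta: float) -> Tuple[float, float]:
    """Toy resource-competition model with m genes of equal activity (g = 1).

    Expression of gene i is x_i = g_i / g_tot, where g_tot is the sum of all g_j.
    Gene 0's activity is scaled by (1 + delta); returns the relative change in
    expression of gene 0 (cis) and of any other gene (trans).
    """
    if m < 2:
        raise ValueError("need at least two genes to have a trans effect")
    activities = [1.0] * m
    x_before = activities[0] / sum(activities)  # = 1 / m for every gene
    activities[0] *= 1.0 + delta                # perturb one gene's activity
    g_tot_after = sum(activities)
    cis = (activities[0] / g_tot_after) / x_before - 1.0
    trans = (activities[1] / g_tot_after) / x_before - 1.0
    return cis, trans


if __name__ == "__main__":
    for m in (2, 100, 20_000):
        cis, trans = relative_expression_changes(m, delta=0.1)
        print(f"m={m:>6}: cis {cis:+.2e}, trans {trans:+.2e}")
```

      Consider a 10% cis change. With m = 2 it produces a trans change of about -4.8%, whereas with m = 20,000 the trans change is only about -5e-6. This is why a single global resource shared by many genes yields weak trans effects, while competition among a handful of genes can be strong.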

      The results shown in this manuscript rule out that competition for limited resources in the cell (not restricted to Pol II, but applicable to any other cellular resource like ribosomes, etc) could explain the heritability of complex traits.

      We thank the Reviewer for his resounding support of our paper!

      Reviewer #2 (Recommendations For The Authors):

      The authors rely on simulated data, and although the conclusions hold in a biologically-realistic scenario given the big difference in effect sizes, I wonder if the authors could provide data from the literature (if available) that give the reader a point of reference for the steady state of cells in terms of free/occupied Pol II molecules and/or free/occupied transcription binding sites. This information won't change the conclusion of the manuscript, but it will put it in the context of real biological data.

      We have scoured the literature, but have not found readily available data with which to validate our results (beyond that which is already referenced).

      Reviewer #3 (Public Review):

      Human complex traits including common diseases are highly polygenic (influenced by thousands of loci). This observation is in need of an explanation. The authors of this manuscript propose a model that competition for a single global resource (such as RNA polymerase II) may lead to a highly polygenic architecture of traits. Following an analytical examination, the authors reject their hypothesis. This work is of clear interest to the field. It remains to be seen if the model covers the variety of possible competition models.

      We thank the Reviewer for his assessment, support and comments.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript provides a straightforward and elegant quantitative argument that the competition for the RNA polymerase is not a significant source of trans-eQTLs and, more generally, of genetic variance of complex polygenic phenotypes. This is an unusual manuscript because the authors propose a hypothesis that they confidently reject based on a calculation. This negative result is intuitive. Still, the manuscript is of interest. Progress in understanding the highly polygenic architecture of complex traits is welcome, and the resource competition hypothesis is quite natural. I have three specific comments/concerns listed below.

      (1) The manuscript states that V(x_i)=V(g_i/g_tot). Unless I am missing something, this seems to result from a very strong implicit assumption that all genetic variance is due to variation in the binding of RNA polymerase, while x_i_max is a constant. I would expect that x_i_max may also be genetically variable due to many effects unrelated to the Pol II binding (e.g. transcription rate, bursting, presence of R-loops etc.). I guess that the assumption made by the authors is conservative.

      Indeed. We made conservative assumptions throughout, aiming to consider the most extreme scenario in which resource competition may affect trait variation. Our logic is that if resource competition has a small effect even under the most extreme scenario, then it has a small effect in all scenarios. We put more emphasis on this point in our revised text.

      (2) The manuscript focuses on the competition for RNA polymerase but suggests that the lesson learned is highly generalizable. However, it is an example of a single global limiting resource resulting in first-order kinetics. What happens in a realistic scenario of competition for multiple resources associated with transcription and with downstream processes (free ribonucleotides, spliceosome, polyadenylation machinery, ribosome, post-translational modifications)? It is possible that in most cases a single resource is a limiting factor, but an investigation (or even a brief discussion) of this question would support the claim that the results are generalizable.

      We expect competition for multiple resources to result in similarly weak effects. Since the number of such resources is not large, we do not expect it to change our qualitative result. We have added language to that effect in the main text.

      (3) Alternatively, what happens in a scenario of competition for multiple local resources shared by a few genes (co-factors, substrates, chaperones, micro-RNAs, post-translational modification factors such as kinases, degradation factors, scaffolding proteins)? In this case, each gene would compete for resources with a few other genes increasing polygenicity without a global competition with all other genes. Intuitively, a large set of such local competitions may lead to a highly polygenic architecture.

      This is indeed a scenario in which competition may have a large effect, as we mention in our discussion: “the conclusions may differ in contexts where a very small number of genes compete for a highly limited resource, such as access to a particular molecular transporter”.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their reading of the manuscript and their suggestions. We have extensively addressed all these concerns in the text, and have also included several new datasets and figures in the revised version of the manuscript. We hope that our response and the new experimental data fully address the concerns raised by the reviewers. We include a detailed, point-by-point response to each of the reviewer concerns, pointing to new data and specific changes made in the main manuscript.

      Note: these new data have resulted in a new figure (Figure 6), a new supplementary figure (Figure 2-figure supplement 2), and an increase in the number of panels in each figure, as well as in the supplementary figures.

      General response comments, highlighting a few aspects missed by the reviewers

      This manuscript contains an enormous amount of data. This is understandable, since in part we are proposing an entirely new hypothesis, and a new way to think about mitochondrial repression, built around substantial circumstantial evidence from diverse literature sources. But to keep the narrative readable and the main idea understandable, much of this information could only be mentioned briefly in the text, and is therefore included as supplemental information. As a result, it may not always be apparent that this study has set several technical benchmarks. These experiments are extremely challenging to perform, took many iterations to standardize, and are in themselves a first in the field. Yeast cells have the highest known rate of glycolytic flux of any organism. Measuring this glycolytic rate from the formation of intermediates is hard, and all current estimates have been made in vitro, using a stopped-flow type set-up. In this study, we optimized and directly measured the glycolytic flux using isotope-labelled glucose (13C-glucose), which has never been reported before in highly glycolytic cells such as yeast. This has been difficult because of the very rapid label saturation (within seconds) after a 13C-glucose pulse (now shown in Figure 2-figure supplement 1). For brevity, the method is summarized in this study with sufficient information to reproduce it, but we will publish a more detailed, associated methodology paper describing the challenges, infrastructure requirements, and resources needed to carry out these types of experiments in yeast. An added highlight of these experiments with WT and Ubp3 deletion strains is that they provide the most direct experimental demonstration to date that glycolytic flux in yeast in high glucose follows zero-order kinetics, and depends entirely on the amounts of the glycolytic enzymes (presumably operating at maximal activity). This nicely complements the recent study by Grigatis 2022 (cited in the discussion), which suggests this possibility.
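      The zero-order argument can be illustrated with the Michaelis-Menten rate law, v = kcat [E] [S] / (Km + [S]). When [S] is far above Km, the rate approaches Vmax = kcat [E]. It then barely responds to substrate but scales directly with enzyme amount. The sketch below is a generic textbook illustration, not the authors' analysis. The kcat, Km and concentration values are hypothetical and are not measured glycolytic parameters.

```python
def michaelis_menten_rate(enzyme: float, substrate: float, kcat: float, km: float) -> float:
    """Reaction rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


if __name__ == "__main__":
    # Hypothetical parameters: kcat = 10 per s, Km = 0.1 mM; [S] = 100 mM is 1000 x Km.
    kcat, km = 10.0, 0.1
    base = michaelis_menten_rate(1.0, 100.0, kcat, km)
    more_substrate = michaelis_menten_rate(1.0, 200.0, kcat, km)
    less_enzyme = michaelis_menten_rate(0.5, 100.0, kcat, km)
    print(f"doubling [S]: rate ratio {more_substrate / base:.4f}")  # close to 1 (near zero order in S)
    print(f"halving [E]:  rate ratio {less_enzyme / base:.4f}")     # 0.5 (first order in E)
```

      In this saturated regime, flux is set by how much enzyme is present rather than by substrate supply. This is the sense in which a reduction in enzyme abundance would reduce glycolytic flux.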

      Separately, this study required the estimation of total inorganic phosphate, as well as mitochondrial pools of phosphate. To date, no studies have estimated mitochondrial pools of phosphate (for a variety of reasons). In this study, we also experimentally determined the changes in mitochondrial phosphate pools. For this, we had to establish and standardize a rapid mitochondrial isolation method in yeast. Thus, this study provides the first quantitative estimates of mitochondrial Pi amounts (in the context of measured mitochondrial outputs), as now shown in Figure 4. This mitochondrial isolation approach for assessing metabolites in yeast may also be described in a future methods paper.

      Specific responses to the Reviews:

      Reviewer #1 (Public Review):

      The study by Vengayil et al. presented a role for Ubp3 in mediating inorganic phosphate (Pi) compartmentalization in the cytosol and mitochondria, which regulates metabolic flux between cytosolic glycolysis and mitochondrial processes. Although the exact function of increased Pi in mitochondria is not investigated, the findings have valuable implications for understanding the metabolic interplay between glycolysis and respiration under glucose-rich conditions. They showed that UBP3 KO cells have decreased glycolytic flux due to reduced abundance of key Pi-dependent glycolytic enzymes, consequently increasing Pi compartmentalization to mitochondria. Increased mitochondrial Pi increases oxygen consumption and mitochondrial membrane potential, indicative of increased oxidative phosphorylation. In conclusion, the authors reported that Pi utilization by cytosolic glycolytic enzymes is a key process for mitochondrial repression under glucose conditions.

      (1) However, the main claims are only partially supported by the low number of repeats and utilizing only one strain background, which decreased the overall rigor of the study. The full-power yeast model could be utilized with testing findings in different backgrounds with increased biological repeats in many assays described in this study. In the yeast model, it has been well established that many phenotypes are genotype/strain dependent (Liti 2019, Gallone 2016, Boekout 2021, etc.), with some strains utilizing mitochondrial respiration even under high glucose conditions (Kaya 2021). It would be conclusive to test whether wild strains with increased respiration under high glucose conditions would also be characterized by increased mitochondrial Pi.

      “However, the main claims are only partially supported by the low number of repeats and utilizing only one strain background, which decreased the overall rigor of the study. The full-power yeast model could be utilized with testing findings in different backgrounds with increased biological repeats in many assays described in this study.”

      Thank you for the suggestion. We agree that a larger, universal statement cannot be made with data from a single strain, since yeasts do have substantial diversity. In this study, we had originally used a robust, prototrophic industrial strain (CEN.PK background). We have now utilized multiple, diverse strains of S. cerevisiae to test our findings. This includes strains from the common laboratory backgrounds – W303 and BY4742 – which have different auxotrophies, as well as another robust, highly flocculent strain from a prototrophic Σ1278 background. Using all these strains, we now comprehensively find that the role of altered Pi budgeting as a constraint for mitochondrial respiration, and the role of Ubp3 as a regulator of mitochondrial repression is very well conserved. In all tested strains of S. cerevisiae the loss of Ubp3 increases mitochondrial activity (as shown by increased mitochondrial membrane potential and increased Cox2 levels in Figure 6A, B). These data now expand the generality of our findings, and strengthen the manuscript. These results are included in the revised manuscript as a new figure- Figure 6 and the associated text.

      Some of the included data in the revised manuscript are shown below:

      Author response image 1.

      Mitochondrial activity and Cox2 levels in ubp3Δ in different genetic backgrounds

      We also used the W303 strain to assess Pi levels, and its role in increasing mitochondrial respiration. We find that the loss of Ubp3 in this genetic background also increases Pi levels and that the increased Pi is necessary for increasing mitochondrial respiration (Figure 6C, D).

      Author response image 2.

      Basal OCR in WT vs ubp3Δ (W303 strain background) in normal vs low Pi

      These experiments collectively have strengthened our findings on the critical role of intracellular Pi budgeting as a general constraint for mitochondrial respiration in high glucose.

      “It would be conclusive to test whether wild strains with increased respiration under high glucose conditions would also be characterized by increased mitochondrial Pi.”

      Addressed partially above. Currently, the relative basal respiration in glucose across different strains is not well known. We measured mitotracker fluorescence in high glucose in multiple WT strains of S. cerevisiae (W303, Σ1278, S288C and BY4742, compared to the CEN.PK strain). These strains largely had similar mitotracker fluorescence, except for a slight increase in mitochondrial membrane potential in the Σ1278 strain. We further characterized this using Cox2 protein levels as well as basal OCR, and found that these do not increase. These data are shown below, and are not included in the main text since they do not add any new component to the study.

      Author response image 3.

      Mitochondrial respiration in different WT strains

      We did find this suggestion very interesting though, and are exploring directions for future research based on this suggestion. Since we have now identified a role for intracellular Pi allocation in regulating the Crabtree effect, an interesting direction can be to understand the glucose dependent mitochondrial Pi transport in Crabtree negative yeast strains. We will have to bring in a range of new tools and strains for this, so these experiments are beyond the focus of this current study.

      We hope that these new experiments in different genetic backgrounds increase the breadth and generality of our findings, and stimulate new lines of thinking about how important the role of Pi budgeting as a constraint for mitochondrial repression in high glucose might be.

      (2) It is not described whether the drop in glycolytic flux also affects TCA cycle flux. Are there any changes in the pyruvate level? If the TCA cycle is also impaired, what drives increased mitochondrial respiration?

      Thank you for pointing this out, and we agree this should be included. We have addressed these concerns in the revised version of the manuscript.

      Since glucose-derived pyruvate must enter the mitochondrial TCA cycle, one possibility is that a decrease in glycolytic rate could decrease the TCA flux. An alternative possibility is that the cells concomitantly increase pyruvate transport into mitochondria, thereby maintaining TCA cycle flux comparable to that of WT cells. To test both these possibilities, we first measured the steady state levels of pyruvate and TCA cycle intermediates in WT vs ubp3Δ cells. We do not observe any significant change in the levels of pyruvate or TCA cycle intermediates (except malate, which showed a significant decrease in ubp3Δ cells). These data are now included in the revised manuscript as Figure 2 – figure supplement 1D and figure supplement 2 A, along with associated text.

      Author response image 4.

      Pyruvate levels in WT vs ubp3Δ

      Author response image 5.

      Steady state TCA cycle intermediate levels

      Next, to address whether the TCA cycle flux is impaired in ubp3Δ cells, we measured the TCA cycle flux in WT vs ubp3Δ cells by pulsing the cells with 13C glucose and tracking 13C label incorporation from glucose into TCA cycle intermediates. This experiment first required substantial standardization of the time of cell collection and quenching after 13C glucose addition, which we did by measuring the kinetics of 13C incorporation into TCA cycle intermediates at different time points after 13C glucose addition. The standardization of this method is now included in the revised manuscript as Figure 2-figure supplement 2C, along with associated text, and is shown below for reference.

      Author response image 6.

      Kinetics of 13C labelling in TCA cycle intermediates

      Actual TCA cycle flux results: For measuring the TCA cycle flux, cells were treated with 1% 13C glucose, quenched, and collected 7 min after glucose addition, which is within the linear range of 13C label incorporation (Figure 2-figure supplement 2C).
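      A simple tracer model shows why an early sampling time is needed. Assume, as a simplification, a single well-mixed pool whose 13C fraction approaches saturation as f(t) = 1 - exp(-t / tau). Its initial slope 1/tau reflects the flux-to-pool-size ratio only while f(t) is still close to that initial line. The sketch below finds the latest time at which f(t) stays within a chosen relative deviation (10% here) of the line. The time constant tau, the tolerance and the function names are hypothetical. In this study, the sampling time was chosen empirically from the measured labelling time course described above, not from this model.

```python
import math


def fractional_labeling(t: float, tau: float) -> float:
    """Fractional 13C enrichment of a single well-mixed pool: f(t) = 1 - exp(-t / tau)."""
    return -math.expm1(-t / tau)  # expm1 keeps precision when t << tau


def linear_window(tau: float, max_rel_dev: float = 0.10) -> float:
    """Latest time at which f(t) stays within max_rel_dev of its initial-slope line t / tau.

    The relative shortfall 1 - f(t) / (t / tau) grows monotonically with t,
    so the boundary can be found by bisection.
    """
    def rel_dev(t: float) -> float:
        return 1.0 - fractional_labeling(t, tau) / (t / tau)

    lo, hi = 1e-9 * tau, 10.0 * tau
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rel_dev(mid) < max_rel_dev:
            lo = mid  # still effectively linear: move the lower bound up
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    tau = 60.0  # hypothetical time constant (seconds) for the pool's label turnover
    t_max = linear_window(tau)
    print(f"linear window ends at ~{t_max:.1f} s ({t_max / tau:.3f} tau)")
```

      With a 10% tolerance, the window closes at roughly 0.21 tau. Pools that turn over faster (smaller tau) therefore need proportionally earlier sampling.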

      Result: We did not observe any significant changes in the relative 13C label incorporation into TCA cycle intermediates. These data are included in the revised manuscript as Figure 2-figure supplement 2D, along with associated text, and are shown below for reference.

      Author response image 7.

      TCA cycle flux

      What these data show is that the TCA cycle flux itself is not altered in ubp3Δ. A likely interpretation is that this is due to increased pyruvate transport into mitochondria in ubp3Δ cells, as indicated by the ~10-fold increase in Mpc3 (mitochondrial pyruvate transporter) protein levels (shown in Figure 5-figure supplement 5H), which would allow the same net amount of pyruvate into the mitochondria. This increased mitochondrial pyruvate transport could help maintain the TCA flux in ubp3Δ cells and support the increased respiration. Putting a hierarchy together, the increased respiration in ubp3Δ cells could therefore be primarily due to increased Pi transport, followed by a consequent increase in ETC proteins. We leave it to the readers of this study to draw this conclusion.

      We hope that we have addressed all concerns that the reviewer has with respect to TCA cycle flux in ubp3Δ cells.

      (3) In addition, some of the important literature was also missed in citation and discussion. For example, in a recent study (Ouyang et al., 2022), it was reported that phosphate starvation increases mitochondrial membrane potential independent of respiration in yeast and mammalian cells, and some of the conflicting results were presented in this study.

      We are very aware of the recent study by Ouyang et al, which reports that Pi starvation increases mitochondrial membrane potential independent of respiration. However, this study is distinct from the context of our case due to the reasons listed below.

      (a) The reviewer may have misinterpreted our low Pi condition as Pi starvation. There is no Pi ‘starvation’ in this study. Here, we cultured ubp3Δ and tdh2Δtdh3Δ cells in a low Pi medium with 1 mM Pi concentration in order to bring down the intracellular free Pi to that of WT levels. These cells are therefore not Pi-starved, but have been manipulated to have the same intracellular Pi levels as that of WT cells, as shown in Figure 4-figure supplement 1D. The Pi concentration in the medium is still in the millimolar range, and the cells are grown in this medium for a short time (~4 hrs) till they reach OD600 ~ 0.8. This is entirely different from the conditions used in Ouyang et al., 2022, where the cells were grown in a Pi-starvation condition with 1-100 micromolar Pi in the medium for a time duration of 6-8 hrs. Since cells respond differentially to changes in Pi concentrations over time (Vardi et al., 2014), the response to low Pi vs Pi starvation will be completely different.

      (b) In our study, mitochondrial membrane potential is used as only one of the readouts for mitochondrial activity. Our estimates of mitochondrial respiration also include other measurements, such as Cox2 protein levels (as an indicator of the ETC) and basal OCR (measuring respiration), each of which provides distinct information. Because mitochondrial membrane potential can be regulated independently of the mitochondrial respiration state (Liu et al., 2021), using membrane potential alone as a readout of mitochondrial respiration is limited in the information it provides. As indicated earlier, mitochondrial membrane potential can change independently of mitochondrial respiration (Ouyang et al., 2022) and ATP synthesis (Liu et al., 2021). Since the focus of our study is mitochondrial respiration, and not just the change in membrane potential, conclusions based on membrane potential alone would be ambiguous. Most studies in the field have in fact not used the comprehensive array of distinct estimates that we use in this study, and we believe the standards set in this study should become a norm for the field.

      (c) The only mutant that is similar to the Ouyang et al study is the Mir1 deletion mutant, which results in acute Pi starvation in mitochondria. In this strain, we find an increase in mitochondrial membrane potential. The data is not included in the manuscript but is shown below.

      Author response image 8.

      Mitochondrial potential in WT vs mir1Δ

      As is clear from these data, mitochondrial membrane potential is significantly higher in mir1Δ cells. However, the basal OCR and Cox2 protein levels clearly show decreased mitochondrial respiration which is expected in this mutant (Figure 5 A,B). This in fact highlights the limitations of solely relying on mitochondrial membrane potential measurements to draw conclusions, as doing so will lead to a misinterpretation of the actual mitochondrial activity in these cells. We do not wish to highlight limitations in other studies, but hope we make our point clear.

      (4) An additional experiment with strains lacking mitochondrial DNA under phosphate-rich and restricted conditions would further strengthen the result.

      Strains lacking mitochondrial DNA (Rho0 cells) cannot express the mitochondrially encoded ETC subunit proteins. These strains are therefore incapable of performing mitochondrial respiration. Since Rho0 cells are known to utilize alternate mechanisms to maintain their mitochondrial membrane potential (Liu et al., 2021), using mitotracker fluorescence as a readout of mitochondrial respiration in these strains under different Pi conditions is inconclusive and misleading due to the reasons mentioned in point number 3(b and c). However, since this was a concern raised by the reviewer, we now measured basal OCR in WT and Rho0 strains with Ubp3 deletion under normal vs low Pi medium. As expected, Rho0 cells show extremely low basal OCR values, an entire order of magnitude lower than WT cells. At these very low (barely detectable) levels the deletion of Ubp3 or change in Pi concentration in the medium does not change basal OCR, since these strains are not capable of respiration. We have included this data as Figure 4-figure supplement 1G.

      Author response image 9.

      Basal OCR in Rho0 cells

      (5) Western blot control panels should include entire membrane exposure, and non-cut western blots should be submitted as supplementary.

      The non-cut western blot images and the loading controls are now included in the revised manuscript as a supplementary file 2.

      (6) In Figure 4, it is shown that Pi addition decreases basal OCR to the WT level. However, the Cox2 level remains significantly higher. This data is confusing as to whether mitochondrial Pi directly regulates respiration or not.

      As described in the previous point, the Cox2 levels and the OCR provide distinct pieces of information. In figure 4, we show that culturing ubp3Δ in low Pi significantly decreases both Cox2 protein levels and basal OCR. Since Cox2 protein levels and basal OCR are different readouts for mitochondrial activity, there could be differences in the extent by which Pi availability controls each of these factors. Basal OCR is a direct readout for mitochondrial respiration, and is regulated by multiple factors including ETC protein levels, rate of ATP synthesis, rate of Pi transport etc. In figure 4, we find that culturing ubp3Δ in low Pi decreases basal OCR to WT level. This strongly suggests that high Pi levels are necessary to increase basal OCR in ubp3Δ.

      (7) Representative images of Ubp3 KO and wild-type strains stained with CMXRos are missing.

      Thank you for noticing this. These data are now included in the revised manuscript as Figure 1-figure supplement 1C.

      Author response image 10.

      (8) Overall, mitochondrial copy number and mtDNA copy number should be analyzed in WT and Ubp3 KO cells as well as Pi-treated and non-treated cells, and basal OCR data should be normalized accordingly. The reported normalization against OD is not appropriate.

      This is a valid concern raised by the reviewer, and something we had extensively considered during the study. To normalize for total mitochondrial amounts in each strain, we always measure the protein levels of the mitochondrial outer membrane protein Tom70. While we had described this in the methods, it may not have been obvious in the text; this information is included in Figure 1-figure supplement 1G. We did not observe any significant change in Tom70 levels, suggesting that the total mitochondrial amount does not change in ubp3Δ, and we have noted this in the manuscript (results section relevant to Figure 1). As an additional control, to directly measure the mitochondrial amount in these conditions, we have now measured the mitochondrial volume in ubp3Δ cells and in WT cells treated with Pi. For this, we used a strain expressing a mitochondria-targeted mNeonGreen protein (described in Dua et al., JCB, 2023), which allows total mitochondrial amount to be assessed independently. We do not observe any changes in mitochondrial volume or amounts in ubp3Δ cells or WT+Pi, compared to WT cells. Therefore, the changes in mitochondrial respiration upon Ubp3 deletion and Pi addition are not due to changes in the total amounts of mitochondria in these conditions. Given all this, normalizing basal OCR to total cell number is the most appropriate approach. This normalization is also conventionally used for basal OCR in multiple studies.

      We have now included these additional data on mitochondrial volumes and amounts in the revised manuscript as Figure 1-figure supplement 1F and Figure 5-figure supplement 1D, with associated text; they are also shown below.

      Author response image 11.

      Mitochondrial volume in WT vs ubp3Δ cells

      Author response image 12.

      Mitochondrial volume in WT and WT+Pi

      These data collectively address the reviewer’s concerns regarding changes in mitochondrial amounts in all the conditions and strains used in this study.

      Reviewer #2 (Public Review):

      Summary:

      Cells cultured in high glucose tend to repress mitochondrial biogenesis and activity, a prevailing phenotype called the Crabtree effect that is observed in different cell types and cancer. Many signaling pathways have been put forward to explain this effect. Vengayil et al. propose a new mechanism involving Ubp3/Ubp10 and phosphate that controls the glucose repression of mitochondria. The central hypothesis is that ∆ubp3 shifts glycolysis to trehalose synthesis, thereby increasing Pi availability in the cytosol; mitochondria then receive more Pi, and the glucose repression is therefore reduced.

      Strengths:

      The strength is that the authors used an array of different assays to test their hypothesis. Most assays were well-designed and controlled.

      Weaknesses:

      I think the main conclusions are not strongly supported by the current dataset.

      (1) Although the authors discovered ∆ubp3 cells have higher Pi and mitochondrial activity than WT in high glucose, it is not known if WT cultured in different glucose concentration also change Pi that correlate with the mitochondrial activity. The focus of the research on ∆ubp3 is somewhat artificial because ∆ubp3 not only affects glycolysis and mitochondria, but many other cellular pathways are also changed. There is no idea whether culturing cells in low glucose, which derepress the mitochondrial activity, involves Ubp3 or not. Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to the WT cells cultured in a low-glucose situation.

      “The focus of the research on ∆ubp3 is somewhat artificial because ∆ubp3 not only affects glycolysis and mitochondria, but many other cellular pathways are also changed. There is no idea whether culturing cells in low glucose, which de-repress the mitochondrial activity, involves Ubp3 or not.”

      We would like to clarify that the focus of this research is not on Ubp3, or to address mechanistic aspects of how Ubp3 regulates mitochondrial activity, or to identify the targets of Ubp3. That would be an entirely distinct study, with a very different approach.

      In this study, while carrying out a screen, we serendipitously found that ubp3Δ cells showed an increase in mitochondrial activity in high glucose. Subsequently, we used this observation, bolstered by diverse orthogonal approaches, to identify a general, systems-level principle that governs mitochondrial repression in high glucose. Through this, we identify a role of phosphate budgeting as a controller of mitochondrial repression in high glucose. In this study, our entire focus has been to use orthogonal approaches, as well as parsimonious interpretations, to establish this new hypothesis as a possibility. We hope this idea, supported by these data, will now enable researchers to pursue other experiments to establish the generality of this phenomenon.

      We have not focused our effort on identifying the role of Ubp3, or its regulation upon changes in glucose concentration, in this context. That is a very specific and separate effort, and misses the general point we address here. It is entirely possible that Ubp3 might also regulate mitochondrial activity by mechanisms other than mitochondrial Pi availability (such as via the reduction of key glycolytic enzymes at nodes of glycolysis, resulting in reduced glycolytic flux and rerouted glucose metabolism). Had the goal been to identify Ubp3 substrates, it is very likely that we would not have found the role of Pi homeostasis in controlling mitochondrial respiration. This is particularly because the loss of Ubp3 does not result in an acute disruption of glycolysis, unlike, say, a glycolytic enzyme mutant, which would have resulted in severe effects on growth and overall metabolic state. This would have made it difficult to dissect finer details of the metabolic principles that regulate mitochondrial respiration.

      To further corroborate our findings, we used the glycolysis-defective mutant tdh2Δtdh3Δ, in which we find a similar change in Pi balance. This complements the key observations made using ubp3Δ cells. Separately, we used the glycolytic inhibitor 2DG to independently assess the role of mitochondrial Pi transport in regulating respiration. Together, in this study we do not rely solely on a single genetic mutant, but combine the Ubp3 deletion strain with a strain with reduced GAPDH activity, and with pharmacological inhibition of glycolysis. In addition, we find that mitochondrial Pi transporter levels are repressed under high glucose (Figure 5C, Figure 5-figure supplement 1B). Further, we find that mitochondrial Pi transport is important for increasing mitochondrial respiration upon a shift to low glucose and upon glycolytic inhibition by 2DG. Therefore, we collectively unravel a systems-level principle that regulates glucose-mediated mitochondrial repression, as opposed to conducting a mechanistic study of Ubp3 targets.

      Of course, given the conservation of Ubp3, we are very excited to pursue a mechanistic study of Ubp3 targets in the future. This is a general challenge for deubiquitinase enzymes, and to date very few bona fide substrates are known for any deubiquitinase enzyme, from any cellular system (due to challenges in the field that we discuss separately, and have included in the discussion section of this text).

      “Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to the WT cells cultured in a low-glucose situation”

      The reviewer is correct in pointing out that in low glucose, the shift to trehalose synthesis might not be as relevant. We observe that the glycolysis-defective tdh2Δtdh3Δ cells do not show an increase in trehalose synthesis (Figure 3-figure supplement 1E). However, in this context, the decrease in the rate of the GAPDH-catalysed reaction alone appears to be sufficient to increase Pi levels (Figure 3F), even without an increase in trehalose. Therefore, the relative contributions of these two arms towards Pi balance might differ depending on whether the trigger is low glucose in the environment or a mutant such as ubp3Δ that modulates glycolytic flux. In ubp3Δ cells, a low rate of the GAPDH-catalysed reaction is combined with high trehalose synthesis (based on how glycolytic flux is modulated), whereas tdh2Δtdh3Δ cells show only the low rate of the GAPDH-catalysed reaction. In both cases the end point is an increase in Pi, although the outcomes differ slightly. It should also be noted that, in terms of free Pi sources, a low-glucose condition (with a low glycolytic rate) is very different from a no-glucose, respiratory condition (where cells perform very high gluconeogenesis). In highly respiratory conditions such as ethanol, cells switch to high gluconeogenesis, with a large increase in trehalose synthesis as a default (e.g., see Varahan et al., 2019). In this condition, trehalose synthesis could be a major source of Pi (e.g., see Gupta 2021) and could support increased mitochondrial respiration. In an ethanol medium, the direction of the GAPDH reaction is reversed, so this reaction becomes an additional source of Pi instead of a consumer of Pi (see illustration in Figure 3G). Therefore, a reasonable interpretation is that a combination of increased trehalose synthesis and increased conversion of 1,3-BPG to G3P can be a major Pi source for increasing mitochondrial respiration in a non-glucose, respiratory medium.
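      The argument above is essentially bookkeeping over Pi sources and sinks. The Python sketch below is a toy illustration only, assuming that trehalose synthesis acts as a net Pi source, that the net GAPDH flux takes up one Pi per turnover in the glycolytic direction (and releases one in the gluconeogenic direction), and that all other Pi fluxes can be ignored. The class `PiFluxes`, the function `net_pi_change`, and every number are hypothetical and are not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class PiFluxes:
    """Hypothetical relative rates (arbitrary units) for two arms of a toy Pi budget."""
    trehalose_release: float  # Pi released by trehalose synthesis (treated as a source)
    gapdh_net: float          # net GAPDH flux: > 0 glycolytic (consumes Pi), < 0 gluconeogenic (releases Pi)


def net_pi_change(f: PiFluxes) -> float:
    """Net contribution of these two arms to the free Pi pool (sources minus sinks)."""
    # One Pi is taken up per glycolytic GAPDH turnover, so subtracting the net
    # flux turns a negative (gluconeogenic) flux into a Pi source.
    return f.trehalose_release - f.gapdh_net


# Hypothetical scenarios, chosen only to mirror the qualitative argument in the text.
scenarios = {
    "WT, high glucose": PiFluxes(trehalose_release=1.0, gapdh_net=10.0),
    "reduced GAPDH (tdh2Δtdh3Δ-like)": PiFluxes(trehalose_release=1.0, gapdh_net=6.0),
    "reduced GAPDH + high trehalose (ubp3Δ-like)": PiFluxes(trehalose_release=3.0, gapdh_net=6.0),
    "ethanol (gluconeogenic GAPDH)": PiFluxes(trehalose_release=5.0, gapdh_net=-2.0),
}

for name, fluxes in scenarios.items():
    print(f"{name}: {net_pi_change(fluxes):+.1f}")
```

      In this toy accounting, lower GAPDH consumption alone makes the net term less negative, adding trehalose synthesis shifts it further, and reversing GAPDH makes both arms Pi sources. A real Pi budget includes many other sources and sinks, so only the direction of change is meaningful here.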

      “it is not known if WT cultured in different glucose concentration also change Pi that correlate with the mitochondrial activity”

      This is a valid point raised by the reviewer. We have already found that the protein levels of the mitochondrial Pi transporter are increased in a non-glucose respiratory (ethanol) medium and in a low (0.1%) glucose medium (see Figure 5C, Figure 5-figure supplement 1B). In addition, we have tried measuring mitochondrial Pi levels in cells grown in a high-glucose medium vs a respiratory, ethanol medium. The results are shown below for the reviewer’s reference.

      Author response image 13.

      Mitochondrial Pi levels in ethanol vs glucose

      We observe a clear trend in which mitochondrial Pi levels are higher in cells grown in ethanol medium than in cells grown in high glucose. However, estimating Pi, and normalising Pi levels in isolated mitochondria, is extremely difficult in this condition (note that this has never been done before). This is likely due to the rapid conversion of ADP and Pi to ATP in ethanol, which increases the variation in estimates of steady-state Pi levels, and to the high amounts of mitochondria in ethanol-grown cells. Since the data show high variation, we have not included them in the manuscript, but we are happy to include them here in the response.

      Indeed, this study opens up the exciting question of addressing how intracellular Pi allocation is regulated in different conditions of glucose. This can be further extended to Crabtree negative strains such as K. lactis which do not show mitochondrial repression in high glucose. All of these are rich future research programs.

      (2) The central hypothesis that Pi is the key constraint behind the glucose repression of mitochondrial biogenesis/activity is supported by the data that limiting Pi will suppress mitochondrial activity increase in these conditions (e.g., ∆ubp3). However, increasing the Pi supply failed to increase mitochondrial activity. The explanation put forward by the authors is that increased Pi supply will increase glycolysis activity, and somehow even reduce the mitochondrial Pi. I cannot understand why only the increased Pi supply in ∆ubp3, but not the increased Pi by medium supplement, can increase mitochondrial activity. The authors said "...that ubp3Δ do not increase mitochondrial Pi by merely increasing the Pi transporters, but rather by increasing available Pi pools". They showed that ∆ubp3 mitochondria had higher Pi but WT cells with medium Pi supplement showed lower Pi, it is hard to understand why the same Pi increase in the cytosol had a different outcome in mitochondrial Pi. Later on, they showed that the isolated mito exposed to higher Pi showed increased activity, so why can't increased Pi in intact cells increase mito activity? Moreover, they first showed that ∆ubp3 had a Mir1 increase in Fig3A, then showed no changes in FigS4G. It is very confusing.

      “I cannot understand why only the increased Pi supply in ∆ubp3, but not the increased Pi by medium supplement, can increase mitochondrial activity.”

      This is an interesting point, that requires a nuanced explanation, which we try to provide below.

      For mitochondrial respiration to increase in the presence of high Pi, the cytosolic Pi has to be transported to the mitochondria sufficiently. In ubp3Δ the increased free Pi (as a consequence of rewired glycolysis) is transported to the mitochondria (Figure 4). This increased mitochondrial Pi can therefore increase mitochondrial respiration in ubp3Δ.

      In case of WT+Pi, the externally supplemented Pi cannot further enter mitochondria (as shown in Figure 5-Figure supplement 1C) and is most likely restricted to the cytosol. Because of this inability of the Pi to access mitochondria, the mitochondrial respiration does not increase in WT+Pi (Figure 5-Figure supplement 1E).

      The likely reason for this difference in mitochondrial Pi transport in ubp3Δ vs WT+Pi is the relative difference in their glycolytic rate. The glycolytic rate is inherently decreased in ubp3Δ, but not in WT+Pi. To dissect this possibility of glycolytic rate itself contributing to the Pi availability in the mitochondria, we inhibited glycolysis in WT cells (using 2DG), and then supplemented Pi. Compared to cells in the same glucose condition (with 2DG, but without supplementing excess Pi), now the WT+Pi (+2DG) has higher mitochondrial respiration (Figure 5-Figure supplement 1F). This suggests that a combination of low glycolysis and high Pi is required for increasing mitochondrial respiration (as elaborated in the discussion section of the manuscript).

      An obvious question that arises out of this observation is how does the change in glycolytic rate regulate mitochondrial Pi transport. One consequence of altering the glycolytic rate is a change in cytosolic pH. This itself will bear on the extent of Pi transport into mitochondria, as discussed in detail below.

      In mitochondria, Pi is co-transported along with protons. Therefore, changes in cytosolic pH (which change the proton gradient) can control mitochondrial Pi transport (Hamel et al., 2004). Glycolytic rate is a major factor that controls cytosolic pH. The cytosolic pH in highly glycolytic cells is ~7, and decreasing glycolysis results in cytosolic acidification (Orij et al., 2011). Therefore, under conditions of decreased glycolysis (such as loss of Ubp3), cytosolic pH becomes acidic. Since mitochondrial Pi transport is dependent on the proton gradient, a low cytosolic pH would favour mitochondrial Pi transport. Therefore, under conditions of decreased glycolysis (2DG treatment, or loss of Ubp3), where cytosolic pH would be acidic, increasing cytosolic Pi might indirectly increase mitochondrial Pi transport, thereby leading to increased respiration.
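      The thermodynamic intuition can be made concrete with an idealized calculation. Assuming an electroneutral 1:1 H+/Pi symport that reaches equilibrium, [Pi]matrix x [H+]matrix = [Pi]cytosol x [H+]cytosol, so the attainable matrix-to-cytosol Pi ratio is 10^(pH_matrix - pH_cytosol). This sketch ignores Pi speciation, membrane potential, transporter kinetics and Pi consumption by ATP synthesis; the function name and the pH values (matrix pH 7.8; cytosolic pH 7.0 versus a hypothetical acidified 6.5) are illustrative assumptions, not measurements.

```python
def equilibrium_pi_ratio(ph_matrix: float, ph_cytosol: float) -> float:
    """Idealized [Pi]matrix / [Pi]cytosol at equilibrium for an electroneutral 1:1 H+/Pi symport.

    Equilibrium requires [Pi]_in * [H+]_in == [Pi]_out * [H+]_out, so the ratio equals
    [H+]_cytosol / [H+]_matrix = 10 ** (pH_matrix - pH_cytosol).
    """
    return 10 ** (ph_matrix - ph_cytosol)


# Hypothetical pH values for illustration only.
PH_MATRIX = 7.8
for ph_cyt in (7.0, 6.5):
    ratio = equilibrium_pi_ratio(PH_MATRIX, ph_cyt)
    print(f"cytosolic pH {ph_cyt}: max matrix/cytosol Pi ratio ~ {ratio:.1f}")
```

      Under these assumptions, a 0.5-unit cytosolic acidification raises the attainable ratio by a factor of 10^0.5, roughly 3.2-fold, which is the direction of effect the argument relies on.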

      To explain this and integrate all these points, we have extended the discussion section of this manuscript. We include this section below:

      “Supplementing Pi under conditions of low glycolysis (where mitochondrial Pi transport is enhanced), as well as directly supplementing Pi to isolated mitochondria, increases respiration (Figure 5, Figure 5-figure supplement 1). Therefore, in order to derepress mitochondria, a combination of increased Pi along with decreased glycolysis is required. An additional systems-level phenomenon that might regulate Pi transport to the mitochondria is the decrease in cytosolic pH upon decreased glycolysis (60, 61). The cytosolic pH in highly glycolytic cells is ~7, and decreasing glycolysis results in cytosolic acidification (60, 61). Therefore, under conditions of decreased glycolysis (2DG treatment, deletion of Ubp3, and decreased GAPDH activity), cytosolic pH becomes acidic. Since mitochondrial Pi transport itself is dependent on the proton gradient, a low cytosolic pH would favour mitochondrial Pi transport (62). Therefore, under conditions of decreased glycolysis (2DG treatment, or loss of Ubp3, or decreased GAPDH activity), where cytosolic pH would be acidic, increasing cytosolic Pi might indirectly increase mitochondrial Pi transport, thereby leading to increased respiration. Alternately, increasing mitochondrial Pi transporter amounts can achieve the same result, as seen by overexpressing Mir1 (Figure 5).”

      This possibility, that changes in cytosolic pH regulate mitochondrial Pi transport and thereby respiration, is a really interesting future research question, and an idea that has not been explored to date. This can stimulate new lines of thinking towards finding conserved biochemical principles that control mitochondrial repression in high glucose.

      “Moreover, they first showed that ∆ubp3 had a Mir1 increase in Fig3A, then showed no changes in FigS4G. It is very confusing”

      The increase in Mir1 in ubp3Δ shown in Figure 3A comes from the analysis of the proteomics dataset from a previous study (Isasa et al., 2015). Subsequently, we assessed Mir1 levels directly and more systematically, and did not observe an increase in Mir1 (Figure 4-figure supplement 1H in the revised manuscript). It is entirely possible that in a large-scale study (as in Isasa et al., 2015), some specific proteomic targets might not fully reproduce when tested individually (as described in Handler et al., 2018 and Mehta et al., 2022). We do clearly indicate this in the text, but given the density of information in this study, it is understandable that this point was missed by the reviewer.

      (3) Given that there is no degradation difference for these glycolytic enzymes in ∆ubp3, and the authors found transcriptional level changes, suggests an alternative possibility where ∆ubp3 may signal through unknown mechanisms to parallelly regulate both mitochondrial biogenesis and glycolytic enzyme expression. The increase of trehalose synthesis usually happens in cells under proteostasis stress, so it is important to rule out whether ∆ubp3 signals these metabolic changes via proteostasis dysregulation. This echoes my first point that it is unknown whether wild-type cells use a similar mechanism as ∆ubp3 cells to regulate the glucose repression of mitochondria.

      We appreciate this point raised by the reviewer, but this again requires some clarification (as noted earlier). The goal of this study was to identify systems-level principles that explain mitochondrial repression in high glucose. Although we started by performing a screen to identify proteostatic regulators of mitochondrial activity in high glucose, and identified Ubp3 as a mediator of mitochondrial activity, our approach was to use ubp3Δ cells as a model to understand the metabolic principles that regulate mitochondrial repression. This has been reiterated repeatedly in the manuscript – for example lines 123-124 “We therefore decided to use ubp3Δ cells to start delineating requirements for glucose-mediated mitochondrial repression.” and again in the discussion section – lines 442-460, where we discuss some unique advantages of using ubp3Δ cells to understand a general basis of mitochondrial regulation. To test this hypothesis, we also used orthogonal approaches, as well as other mutants and conditions with defective glycolysis, such as tdh2Δtdh3Δ cells and 2DG treatments. Only with these multiple converging lines of evidence do we infer that there might be a role of the change in Pi balance (due to changes in glycolytic rate) in regulating mitochondrial activity.

      We certainly agree that there is great value in identifying the mechanistic details of how Ubp3 regulates mitochondria. But this requires very distinct approaches not pursued in this study, and it is not the question that we are addressing here. Separately, identifying targets of DUBs is one of the exceptional challenges in biology, since there are currently no straightforward chemical-biology approaches to do so for this class of proteins. Unlike for kinase/phosphatase systems, or even ubiquitin ligases, substrate-trapping mutants and similar tools have proven to be abject failures in identifying direct targets of DUBs. A quantitative proteomics study might suggest some proteins/cellular processes regulated by Ubp3. This has been attempted for several DUBs, but direct substrates of DUBs have rarely ever been identified, in any system. A high-quality, quantitative, descriptive proteome dataset of ubp3Δ cells is already available from a previous study (Isasa et al., 2015), which we cite extensively in this manuscript, and which was indeed invaluable for this study. We cannot improve on the outstanding dataset already available. Interestingly, the findings of that study actually help substantiate our idea of increased mitochondrial activity and changed Pi homeostasis in ubp3Δ cells. The Isasa et al. dataset finds that proteins involved in mitochondrial respiration are high in ubp3Δ cells, while glycolytic enzymes and PHO regulon proteins are reduced. In our study, using these data as a reference, we were able to conceptually piece together how changes in glycolytic flux can alter Pi balance.

      Apart from identifying changes in protein levels, a separate challenge in making sense of this quantitative proteomics data is the difficulty in pinpointing any target of Ubp3 that specifically regulates these processes. A single DUB can have multiple substrates, and this could regulate the cellular metabolic state in a combinatorial manner. This is the essence of all signaling regulators in how they function, and it is therefore important to understand what their systems-level regulation of cell states are (separate from their specific individual substrates). Therefore, identifying the specific target of Ubp3 responsible for this metabolic rewiring can be very challenging. These experiments are well beyond the scope or interest of the current manuscript.

      If we had pursued that road in this study, we would not have made any general findings related to Pi balance, nor would this more general hypothesis have emerged.

      (4) Other major concerns:

      (a) The authors selectively showed a few proteins in their manuscript to support their conclusion. For example, only Cox2 and Tom70 were used to illustrate mitochondrial biogenesis difference in line 97. Later on, they re-analyzed the previous MS dataset from Isasa et al 2015 and showed a few proteins in Fig3A to support their conclusion that ∆ubp3 increases mitochondrial OXPHOS proteins. However, I checked that MS dataset myself and saw that many key OXPHOS proteins do not change, for example, both ATP1 and ATP2 do not change, which encode the alpha and beta subunits of F1 ATPase. They selectively reported the proteins' change in the direction along with their hypothesis.

      To clarify, we observe an increase in Cox2 protein levels but not in Tom70 levels, which suggests that there is no increase in mitochondrial biogenesis. The increase is specific to some respiration-related mitochondrial proteins such as Cox2 (Figure 1E, Figure 3A). We have clearly pointed this out in the manuscript. We used Cox2 protein levels as an additional readout for ETC activity, to validate our observations from the potentiometric MitoTracker readouts and basal oxygen consumption rate (OCR) measurements. This was for three reasons. First, Cox2 is a mitochondrial genome-encoded subunit of complex IV (cytochrome c oxidase) in the ETC, and has a redox centre critical for cytochrome c oxidase activity. Second, the biogenesis and assembly of complex IV subunits have been studied with respect to multiple conditions such as glucose availability and hypoxia, and the expression and stability of the mitochondrially encoded complex IV subunits are exceptionally well correlated with changes in mitochondrial respiration (Fontanesi et al., 2006). Third, Cox2 is very well characterised in S. cerevisiae, and the commercially available Cox2 antibodies are outstanding, which makes estimating Cox2 levels by western blotting unambiguous and reproducible.

      We re-analyzed the proteomic dataset from Isasa et al to find out additional information regarding the key nodes that are differentially regulated in ubp3Δ. We have not claimed at any point in the manuscript that all OXPHOS related proteins are upregulated in ubp3Δ, nor is there any need for that to be so. We identified Ubp3 from our screen, observed an increase in mitochondrial potential, basal OCR, and Cox2 levels. We later found out that the proteomic data set for ubp3Δ also supports our observations that mitochondrial respiration is upregulated in ubp3Δ. The reviewer points out that we “showed a few proteins in Fig3A to support their conclusion that ∆ubp3 increases mitochondrial OXPHOS proteins”. Our conclusion is that the deletion of Ubp3 increases mitochondrial respiration. The combined readouts which we used to reach this conclusion (OCR, mitochondrial potential, mitochondrial ATP production, Cox2 levels) are far more direct, comprehensive and conclusive than showing an increase in a few proteins related to OXPHOS, as also explained earlier toward a distinct reviewer query. Since different mitochondrial proteins are regulated by different mechanisms, we need not see an increase in all the OXPHOS proteins in a mutant like ubp3Δ where mitochondrial respiration is high. An increase in some key proteins would be sufficient to increase the respiration as seen in our case.

      To summarise, the proteomic dataset supports our observation, but our conclusions are not dependent on the increase in OXPHOS proteins observed in the dataset.

      (b) The authors said they deleted ETC component Cox2 in line 111. I checked their method and table S1, I cannot figure out how they selectively deleted COX2 from mtDNA. This must be a mistake.

      Yes, we understand that for mitochondrially encoded proteins, a simple knock-out strategy has limitations. However, we first tried to generate the Cox2 deletion mutant by a standard PCR-mediated gene deletion strategy (Longtine et al., 1998), with the optimistic assumption that even if not all copies of COX2 were lost, a substantial fraction would be lost via recombination. We selected the transformants after strong antibiotic selection, and then measured Cox2 protein levels. Gratifyingly, we found that the mutant strain had substantially decreased Cox2 protein levels (but not a complete loss), and this was retained across generations. The data are shown below.

      Author response image 14.

      Cox2 levels in WT vs Cox2 mutants

      Since the mutants have decreased Cox2 levels, we went ahead and performed growth assays using this strain, in a WT or Ubp3 deletion background. Deletion of Ubp3 in the Cox2 mutant resulted in a more severe growth defect.

      However, we fully agree that this strain is not a complete Cox2 knockout, and it is possible that the decrease in Cox2 is due to modifications in some other, unrelated gene. In the text, we should also not have named this strain cox2Δ. Since we are not sure of the exact genetic modification in this mutant, we have removed these data from the revised manuscript.

      Instead, we have now repeated all experiments using a fully characterised Cox2 mutant, cox262, described in (5), which has defective respiration. In this revised version, we find that deletion of Ubp3 in this strain retains the originally observed severe growth defect in glucose. This is consistent with our conclusion that functional mitochondria are required for proper growth of the ubp3Δ mutant. To separately validate this conclusion, we also used a Rho0 strain, which lacks mitochondrial DNA and therefore cannot perform mitochondrial respiration. We show that deletion of Ubp3 results in a more severe growth defect in a Rho0 strain. These results are included in the revised manuscript as Figure 1-figure supplement 1I.

      Author response image 15.

      Also, using a Seahorse assay, we further confirmed that the Rho0 and Rho0 ubp3Δ strains are incapable of respiration. These data are included in the revised manuscript as Figure 4-figure supplement 1G.

      Author response image 16.

      Basal OCR in Rho0 cells

      We hope that these new data address the reviewer’s concerns about the Cox2 mutant.

      (c) They used sodium azide in a lot of assays to inhibit complex IV. However, this chemical is nonspecific and broadly affects many ATPases as well. Not sure why they do not use more specific inhibitors that are commonly used to assay OCR in seahorse.

      We have now performed growth assays for WT and ubp3Δ cells in the presence of specific mitochondrial OXPHOS inhibitors, oligomycin and FCCP. We observe a more severe growth defect in ubp3Δ cells than in WT cells in the presence of oligomycin and FCCP, similar to the results observed with sodium azide. All these data are now included in the revised manuscript as Figure 1I and Figure 1-figure supplement 1H, along with associated text.
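      Growth-rate comparisons of this kind are commonly quantified from the exponential phase of a growth curve. The following Python sketch, assuming OD600 readings taken during exponential growth, estimates the specific growth rate as the least-squares slope of ln(OD600) against time, and expresses inhibitor sensitivity as the ratio of treated to untreated growth rates. The function names and the example readings are hypothetical; this is not necessarily how the growth rates in the manuscript were computed.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od600: Sequence[float]) -> float:
    """Exponential growth rate (per hour): least-squares slope of ln(OD600) vs time."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired time/OD600 readings")
    log_od = [math.log(v) for v in od600]  # requires OD600 > 0
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(log_od) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, log_od))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx


def doubling_time_h(mu_per_h: float) -> float:
    """Doubling time in hours for a specific growth rate mu (per hour)."""
    return math.log(2) / mu_per_h


def relative_growth(mu_treated: float, mu_untreated: float) -> float:
    """Fraction of the untreated growth rate retained under an inhibitor."""
    return mu_treated / mu_untreated


# Hypothetical readings: untreated culture doubles every 2 h, treated culture every 4 h.
t = [0.0, 2.0, 4.0, 6.0]
mu_ctrl = specific_growth_rate(t, [0.1 * 2 ** (x / 2) for x in t])
mu_drug = specific_growth_rate(t, [0.1 * 2 ** (x / 4) for x in t])
print(round(doubling_time_h(mu_ctrl), 2), round(relative_growth(mu_drug, mu_ctrl), 2))
```

      Using a ratio to each strain's own untreated rate separates inhibitor sensitivity from any baseline growth difference between WT and the mutant.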

      Author response image 17.

      Growth rate in the presence of FCCP

      Author response image 18.

      Figure 1-figure supplement 1H. Growth rate in the presence of oligomycin

      We hope that these new data address the reviewer’s concerns.

      (d) The authors measured cellular Pi level by grinding the entire cells to release Pi. However, this will lead to a mix of cytosolic and vacuolar Pi. Related to this caveat, the cytosol has ~50mM Pi, while only 1-2mM of these glycolysis metabolites, I am not sure why the reduction of several glycolysis enzymes will cause significant changes in cytosolic Pi levels and make Pi the limiting factor for mitochondrial respiration. One possibility is that the observed cytosolic Pi level changes were caused by the measurement issue mentioned above.

      The Pi estimates shown in Figure 3C, E, F and G are measures of total Pi in the cells. The vacuole is a major storehouse of phosphate in cells. However, unlike plant cells, where free phosphate is stored in vacuoles, yeast vacuoles store phosphate only in the form of polyphosphate (Yang et al., 2017, Hürlimann et al., 2007). The free Pi formed from the hydrolysis of polyphosphate is subsequently transported to the cytosol via the exporter Pho91 (Hürlimann et al., 2007). This makes the cytosol and mitochondria the major reservoirs of usable free Pi in yeast. Since the malachite green assay that we use for phosphate estimation is specific to free Pi, and not polyphosphate, the Pi estimates shown in Figure 3 come from a combination of cytosolic and mitochondrial Pi. As explained earlier, to specifically measure mitochondrial Pi, we have established methods to rapidly isolate mitochondria, followed by estimating Pi in these isolated mitochondria (Figure 4B). Here we clearly see a large increase in mitochondrial Pi in the Ubp3 deletion cells. This allows us to estimate the changes in Pi levels that are specific to mitochondria, without relying only on total Pi changes.
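      Free Pi measured by a colorimetric assay such as malachite green is typically quantified against a standard curve and then normalised, for example to the protein content of the isolated mitochondrial fraction. The sketch below assumes a linear standard curve whose intercept absorbs reagent background, and normalisation to milligrams of mitochondrial protein; the function names, standards and sample values are hypothetical and are not the authors' protocol or data.

```python
from typing import Sequence, Tuple


def fit_standard_curve(pi_nmol: Sequence[float], absorbance: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of absorbance = slope * nmol Pi + intercept."""
    n = len(pi_nmol)
    x_mean = sum(pi_nmol) / n
    y_mean = sum(absorbance) / n
    sxx = sum((x - x_mean) ** 2 for x in pi_nmol)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(pi_nmol, absorbance))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def pi_nmol_per_mg(sample_abs: float, slope: float, intercept: float, protein_mg: float) -> float:
    """Invert the standard curve, then normalise to protein in the assayed fraction."""
    pi_nmol = (sample_abs - intercept) / slope
    return pi_nmol / protein_mg


# Hypothetical Pi standards (nmol) and raw absorbance readings including reagent background.
slope, intercept = fit_standard_curve([0.0, 1.0, 2.0, 4.0], [0.05, 0.15, 0.25, 0.45])
print(round(pi_nmol_per_mg(0.35, slope, intercept, protein_mg=0.5), 2))
```

      Because the denominator is the protein content of the isolated fraction, any loss or contamination during mitochondrial isolation feeds directly into the normalised value.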

      “the cytosol has ~50mM Pi, while only 1-2mM of these glycolysis metabolites, I am not sure why the reduction of several glycolysis enzymes will cause significant changes in cytosolic Pi levels and make Pi the limiting factor for mitochondrial respiration”

      The reviewer has completely missed the fact that the glycolytic rate in yeast is the highest known for any cell. While the steady-state levels of glycolytic metabolites might be ~2 mM, glycolysis is not static but rapid and continuous. Glucose is continuously broken down and converted to pyruvate, along with the consumption of Pi and generation of ATP. This is the reason for the rapid 13C label saturation (within seconds of 13C-glucose addition) in yeast cells (Figure 2-figure supplement 1F). This near-instantaneous label saturation makes accurate flux measurements arduous, which is why we had to optimize a method for measuring glycolytic flux in yeast cells (Figure 2D, Figure 2-figure supplement 1F). Indeed, for that reason, our measurements of glycolytic flux in yeast are the first to be reported in the field. This is in itself an enormously challenging experiment, and establishes a new benchmark.
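      The distinction between a small steady-state pool and a large flux through it can be illustrated with simple arithmetic. The sketch below assumes a hypothetical glycolytic intermediate pool of 2 mM and a hypothetical GAPDH flux of 1 mM/s (an illustrative value, not a measurement from this study), with one Pi taken up per GAPDH turnover; the ~50 mM cytosolic Pi figure is the one quoted by the reviewer. The function names are hypothetical.

```python
def turnover_time_s(pool_mM: float, flux_mM_per_s: float) -> float:
    """Time to replace a steady-state pool once at a given flux (pool / flux)."""
    return pool_mM / flux_mM_per_s


def pi_consumed_mM(flux_mM_per_s: float, seconds: float, pi_per_turnover: int = 1) -> float:
    """Cumulative Pi taken up by a reaction running at a constant flux."""
    return flux_mM_per_s * seconds * pi_per_turnover


POOL_mM = 2.0           # hypothetical steady-state intermediate pool
GAPDH_FLUX = 1.0        # hypothetical flux, mM per second
CYTOSOLIC_PI_mM = 50.0  # figure quoted by the reviewer

print(turnover_time_s(POOL_mM, GAPDH_FLUX))              # seconds per pool turnover
print(pi_consumed_mM(GAPDH_FLUX, 60) / CYTOSOLIC_PI_mM)  # Pi taken up in 1 min, relative to the pool
```

      Under these assumptions, the Pi taken up by this single step in one minute already exceeds the whole cytosolic Pi pool, so a modest change in flux (rather than in metabolite concentration) can shift Pi balance whenever Pi release by downstream reactions does not fully keep pace.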

      In highly glycolytic cells, most of the ATP is synthesized via glycolysis, and the rates of glycolysis and ATP synthesis are very high. In the GAPDH-catalysed step, Pi is incorporated into 1,3-bisphosphoglycerate, and this phosphate is then transferred to ADP to form ATP in the subsequent phosphoglycerate kinase step. The ATP formed acts as a phosphate donor for most of the Pi-consuming reactions in the cell. Some of these processes, such as protein translation, use ATP but release Pi and ADP, and this Pi re-enters the cellular Pi pool. Several other reactions, such as nucleotide biosynthesis, polyphosphate biosynthesis and protein phosphorylation, use ATP as a phosphate donor, and the phosphate is fixed in biomolecules. Increasing the rates of these ‘Pi sinks’ can therefore decrease Pi pools. We have earlier tried to clarify this concept more elaborately (Gupta and Laxman, 2021). In fact, increased nucleotide biosynthesis and polyphosphate synthesis have earlier been suggested to decrease available free Pi (Austin and Mayer, 2020, Desfougères et al., 2016). When glycolytic flux is high, this is coupled to correspondingly high Pi consumption, due to increased ATP, nucleotide and polyphosphate synthesis. Pi levels rapidly decrease upon glucose addition, due to continuous Pi consumption during glycolysis (Hohmann et al., 1996, Van Heerden et al., 2014, Koobs et al., 1972). Therefore, changes in glycolytic rate caused by changes in glycolytic enzyme levels can result in significant changes in Pi levels, through changes in the rate of Pi consumption.
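      The coupling between consumption rate and pool size can be captured by a one-compartment toy model, dPi/dt = s - k·Pi, where s lumps all Pi-releasing processes and k lumps consumption that scales with glycolytic activity. Its steady state is Pi* = s/k, so lowering k (for example, through a lower glycolytic rate) raises steady-state Pi even when s is unchanged. The Python sketch below integrates this model with forward Euler; the model, parameter names and values are simplifying assumptions for illustration, not a fitted description of yeast metabolism.

```python
def simulate_free_pi(supply: float, k_consume: float, pi0: float = 0.0,
                     dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration of the toy model dPi/dt = supply - k_consume * Pi."""
    pi = pi0
    for _ in range(steps):
        pi += dt * (supply - k_consume * pi)
    return pi


def steady_state_pi(supply: float, k_consume: float) -> float:
    """Analytical steady state of the same toy model: Pi* = supply / k_consume."""
    return supply / k_consume


# Hypothetical parameters: halving the consumption constant doubles steady-state Pi.
for k in (1.0, 0.5):
    print(k, round(simulate_free_pi(supply=10.0, k_consume=k), 3), steady_state_pi(10.0, k))
```

      The analytical steady state is included to check the numerical integration; with dt·k well below 1, the Euler scheme is stable and converges to the same value.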

      Our results also show that, apart from Pi levels, the glycolytic state can regulate mitochondrial Pi transport as well. This is why mitochondrial Pi levels and basal OCR do not increase merely by adding Pi to cells. We show that basal OCR can be increased by adding Pi in the presence of 2DG. This regulation of mitochondrial Pi transport is a major limiting factor for mitochondrial respiration, and could be mediated partly by the regulation of Mir1 levels and also by changes in cytosolic pH, which regulate the rate of mitochondrial Pi transport. We have discussed these points in the discussion section of our manuscript.

      We hope that this clarifies the reviewer’s concerns regarding how changes in glycolytic rate can regulate changes in cytosolic Pi levels.

      (e) The authors used ∆mir1 and MIR1 OE to show that Pi viability in the mitochondrial matrix is important for mitochondrial activity and biogenesis. This is not surprising as Pi is a key substrate required for OXPHOS activity. I doubt the approach of adding a control to determine whether Pi has a specific regulatory function, while other OXPHOS substrates, like ADP, O2 etc do not have the same effect.

      To clarify, we only used the mir1Δ cells to understand the requirement for Pi transport from the cytosol to mitochondria in controlling respiration. The reviewer is correct in stating that deletion of Mir1 would reduce Pi import into mitochondria and thereby inhibit respiration. This is exactly the conclusion we draw from this experiment, as stated in the manuscript – “These data suggest that mitochondrial Pi transport (via Mir1) is critical for maintaining basal mitochondrial activity even in high glucose”. We have only used these experiments to support the idea that even though glycolysis and mitochondria are in different compartments, a change in Pi balance in one compartment (cytosol) can affect Pi levels in the other (mitochondria), since there is Pi transport between these two compartments. Since mitochondria have their own polyphosphate reserves, in the absence of these experiments with mir1Δ cells it could be imagined that mitochondrial polyP is an additional source of Pi to support respiration, and therefore that changes in cytosolic Pi have only a minor effect on mitochondrial respiration. Our experiments with mir1Δ and Mir1-OE cells indubitably suggest that Pi transport to mitochondria from the cytosol is important for respiration, and therefore that changes in cytosolic Pi levels (or maintaining cytosolic Pi at a lower level due to the rate of glycolysis) will have ripple effects on mitochondrial Pi availability. Further, these data suggest that, for example under glycolytic inhibition (low glucose, or 2DG), while all other factors (signalling, substrate availability etc.) favour respiration (and mitochondrial derepression), cells are unable to achieve this in the absence of ample Pi transport from the cytosol. This therefore places Pi at the centre stage in controlling mitochondrial respiration.

      We conclude that Pi is a major, but not the only, constraint on mitochondrial respiration. There could certainly be a role for ADP, oxygen availability etc. in regulating respiration. However, these are beyond the scope of our study. We discuss the potential role of ADP in regulating mitochondrial repression in the Discussion section: “An additional consideration is the possible contribution of changes in ADP in regulating mitochondrial activity, where the use of ADP in glycolysis might limit mitochondrial ADP. Therefore, when Pi changes as a consequence of glycolysis, it could be imagined that a change in ADP balance can coincidentally occur. However, prior studies show that even though cytosolic ADP decreases in the presence of glucose, this does not limit mitochondrial ADP uptake, or decrease respiration, due to the very high affinity of the mitochondrial ADP transporter.”

      We hope that this clarifies the reviewer’s concerns regarding the use of Mir1 OE and mir1Δ strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Some of the experiments should be repeated in other strain backgrounds for reproducibility and rigor.

      As discussed in the response to point number 1, we have now used multiple strains of S. cerevisiae to test our findings. We find that our discoveries regarding the role of altered Pi budgeting as a constraint on mitochondrial respiration, and the role of Ubp3 as a regulator of mitochondrial repression, are conserved across multiple genetic backgrounds of S. cerevisiae. These results are included in the revised manuscript as a new figure (Figure 6) and associated text. We used the W303, Σ1278 and BY4742 strains of S. cerevisiae to show that deletion of Ubp3 increases mitochondrial activity (as shown by increased mitochondrial membrane potential and increased Cox2 levels). Using the W303 strain, we show that the deletion of Ubp3 increases Pi levels and that the increased Pi is necessary for increasing mitochondrial respiration (Figure 6C, D). These added experiments have substantially broadened the generality of our findings.

      The number of biological repeats needs to be increased in all experiments.

      We have increased the number of biological repeats in the key experiments showing that the increased Pi levels are necessary for the increased mitochondrial respiration in ubp3Δ and tdh2Δtdh3Δ cells (revised Figure 4F). Apart from a few basal OCR measurements and MitoTracker data in the supplementary figures, all our experiments were performed with 3 biological repeats. For basal OCR measurements, yeast cells have to be aliquoted into poly-L-lysine-coated Seahorse plates and centrifuged to ensure that the cells settle properly, because yeast cells are non-adherent. During the centrifugation step, the wells in the two end rows cannot be used, because uneven settling of cells affects the basal OCR readings in these wells. For several experiments involving multiple samples, we were therefore limited to 2 biological replicates (performed independently), so that all samples could be accommodated on the plate.

      Full western blot images should be supplemented along with the other data.

      The complete western blot images are now included in the revised manuscript as supplementary file 2.

      TCA cycle flux should be analyzed and presented in the study to conclude some of the findings.

      As discussed in detail in the response to point number 2, we have performed steady-state and flux measurements for TCA cycle intermediates. These data are now included as a new supplementary figure (Figure 2-figure supplement 2).

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 2A, they should also include the gluconeogenesis enzymes (fructose 1,6-bisphosphatase, PEP carboxykinase, and pyruvate carboxylase) to exclude the possibility that glycolytic intermediates are rerouted to gluconeogenesis.

      We measured the protein levels of Fbp1 (fructose 1,6-bisphosphatase) and Pck1 (PEP carboxykinase). We observed an increase in the protein levels of both enzymes in ubp3Δ. The data are shown below.

      Author response image 19.

      Fbp1 and Pck1 protein levels

      While we agree that this is an interesting observation which might help us in understanding the metabolic rewiring in ubp3Δ, we have not included this data in the current revised version of the manuscript due to two main reasons.

      (1) Since ubp3Δ cells have defective glycolysis and therefore defective glucose repression, the mRNA and protein levels of gluconeogenic enzymes, which are usually glucose-repressed, might increase. This might be a response at the level of transcription and translation of these enzymes and might or might not change the rate of gluconeogenesis in these cells, because multiple other factors, such as allostery and mass action, also regulate gluconeogenic flux. Therefore, to avoid confounding our main points, and since we cannot draw a conclusive inference about gluconeogenic metabolism in these mutants, we have not included these data. The primary focus of our story is the mitochondrial repression component. Understanding the feedback controls that alter gluconeogenesis in these mutants is beyond the scope of this study and could be addressed in a separate future study.

      (2) As we highlight extensively in the response letter and in the manuscript, our aim is not to understand the specific mechanistic role of Ubp3. In this manuscript, we identify the conserved constraints that control mitochondrial repression without focusing just on the role of Ubp3 in regulating this. Whether Ubp3 regulates gluconeogenesis is a question that could be addressed in a future study that focuses on identifying the altered signalling mechanisms in ubp3Δ and the targets of Ubp3.

      (2) In line 292, page 10, there is a typo "dermine".

      We apologize for this mistake. Corrected.

      (3) In Figure 5A, is there a reason why they chose 0.1% glucose condition as a low glucose condition? Also, is there a dose-dependent change in OCR or other mitochondrial functions according to the concentration of glucose?

      The glucose concentration of 0.1% was selected to decrease (but not completely remove) the available glucose. 0.1% glucose is considered a standard low-glucose condition in S. cerevisiae (Yin et al., 2003), and the effect of this glucose concentration on cellular processes has been extensively studied (Yin et al., 2003; Takeda et al., 2015 etc.). A glucose concentration below 0.2% is the critical threshold for activating respiratory metabolism (Takeda et al., 2015), and shifting cells to 0.1% glucose in our experiments activates respiration, as we show in our data. However, this is very different from completely removing glucose or using an alternate carbon source such as ethanol, which would result in full activation of gluconeogenesis. We further find that when cells are grown in ethanol, gluconeogenic activation also changes Pi homeostasis. This is in part a result of the fully reversed direction of the GAPDH-catalysed reaction (Figure 3G). Using such a condition could lead to misinterpretations and confound the conclusions we draw from this set of experiments, in which Pi homeostasis plays a major role. In 0.1% glucose, gluconeogenesis has been shown to remain partly repressed (Yin et al., 2003). The pathways utilizing alternate carbon sources also remain repressed in 0.1% glucose, although to a lower extent than in 2% glucose (Yin et al., 2003). We hope that this clarifies the rationale behind using 0.1% glucose in our experiments.

      The extent of glucose repression depends on the concentration of glucose. Glucose concentrations >1% have been shown to activate the degradation of mRNAs involved in alternate carbon utilization. The different signaling pathways involved in growth on glucose and in glucose repression are regulated by glucose concentration. This is discussed in detail in Yin et al., 2002. We also observe a dose-dependent increase in mitochondrial membrane potential in the presence of 2DG (Figure 5-figure supplement 1A). This also suggests that the rate of glycolysis (which could also be altered by changes in glucose concentration) can regulate the extent of mitochondrial derepression.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We greatly appreciate your positive assessment and the suggestions by Reviewer #2 on the previous version of our manuscript, all of which are very helpful and have greatly improved our manuscript. We have added a description of the biomineralized columnar architecture in the Results section, added a discussion of the family Eoobolidae, provided more details in the Material and Methods section, and revised other parts of the manuscript based on her/his comments. We are grateful that these comments have enhanced the overall quality of our manuscript. In this letter, we take the opportunity to note and discuss the various changes below.

      Reviewer #2:

      (1) Two early Cambrian taxa of linguliform brachiopods are assigned to the family Eoobolidae. The taxa exhibit a columnar shell structure, and the phylogenetic implications of this shell structure in relation to other early Cambrian families are discussed. It is an interesting idea regarding the evolution of shell structure.

      We thank Reviewer 2 very much for her/his very constructive suggestions. All the comments have been thoroughly considered, and introduced into the revised version of the manuscript.

      (2) The early record of shell structures of linguliform brachiopods is incomplete and partly contradictory. The authors maintain silence regarding contradictory information throughout the article to an extent that information is cited wrongly.

      We agree with Reviewer #2 that the early record of shell structure of linguliform brachiopods is incomplete and potentially in some instances contradictory. This situation is well demonstrated in the Introduction and Systematic Palaeontology sections of this paper. This is also the reason why we think the detailed investigation of early linguliform shell architectures is so important, and we hope this work will be useful for further comparative studies on brachiopod biomineralization. We also understand that more detailed studies of the complexity and diversity of linguliform brachiopod architectures (especially their early fossil representatives) require further investigation.

      (3) The article is written under the assumption that all eoobolids have a columnar shell structure. Thus, the previously claimed columnar structure of Eoobolus incipiens which has been re-illustrated in the paper is not convincing and could be interpreted in other ways.

      Yes, the type specimen of Eoobolus is poorly known and its shell structure is unknown, but the ornamentation, pseudointerarea etc. are well preserved and permit a character-based diagnosis. In this paper, we focus on a detailed study of Cambrian eoobolids with exquisitely well-preserved columns from Cambrian Series 2, based on a collection of more than 30,000 early Cambrian brachiopod specimens from China and Australia. Given the widespread preservation of columnar shells in early eoobolid specimens, it is likely that Eoobolus has a columnar shell architecture, although the shell structure has not been documented in every Eoobolus specimen.

      The secondary columns of Eoobolus incipiens are well demonstrated in Fig. 4a. The size of these columns is comparable with that of the columns in other Eoobolus species and in acrotretide brachiopods, and quite different from the criss-cross baculae. As we noted in the manuscript, the columnar structure of Eoobolus incipiens is very simple (short columns and fewer columnar units) and can be readily secondarily phosphatised. This is also why it is hard to find the columnar shell architecture in early eoobolids.

      (4) The article needs a proper results section. The Discussion is mainly a review of published data. Other potential results are hidden in this "discussion".

      I would recommend reorganizing the paper and making it a solid presentation of the new taxa and other new results, i.e., having a solid Results section. The Discussion should discuss relevant points that relate to the new results rather than reviewing shell structure in general but skipping relevant parts such as the tertiary shell layer.

      We have reorganised the manuscript based on these comments. A general description of the biomineralized columnar architecture has been added to the Results section. As the Supplementary section (main results) includes 7 figures and 3 tables, moving them to the main text would considerably increase the length of the paper. We would prefer to keep the main results in the Supplementary section, following the style and format of eLife.

      As the current information on the shell structures of early linguliform brachiopods is unclear, we need to review most of the previous studies on brachiopod shells in the first part of Discussion section. It will help the readers to follow our results and conclusion. So, we think some of the review content is necessary and helps build the Discussion section. The tertiary shell layer, which is not developed in our studied material, is not discussed in the current research.

      (5) In addition, a more elaborate Methods section is needed in which it is explained how the data for shell thicknesses and numbers of laminae was obtained.

      The potential evolutionary patterns that are discussed towards the end (summarized in Fig 6) are interesting but rather unconvincing, as the way the data were obtained has never been clarified. Shell thicknesses and numbers of laminae that built up the shell of several taxa are compared, but at no point is it stated where these measurements were taken. Shell thicknesses vary within a shell, and the presence of the never-mentioned tertiary layer also modifies shell thicknesses. Hence, the presented data appear random and are not comparable. The obtained evolutionary patterns must be considered dubious.

      A proper Methods section would be needed that explains how the data presented in Fig. 6 has been obtained. Plus it needs to be convincingly explained that the obtained data is in fact comparable and represents, e.g., equivalent areas of the shell in all involved taxa.

      All of this information has been added to the Material and Methods section. We are aware of the marginal accretionary secretion of brachiopod shells. It is well known that the posterior part of the shell is thicker (usually the thickest part) than the anterior part, but we did not state this in the previous manuscript. We measured all the shell data (shell thickness and number of columnar units) from the posterior part of the adult shell in all the studied taxa. The measurements of the diameter and height of orthogonal columns were performed on available adult specimens from this study and from previously published literature. Consequently, the obtained data are comparable and represent equivalent areas of the shell in all involved taxa.

      In terms of the tertiary shell layer, we do not find any evidence of it in our studied material. The tertiary shell layer is well developed in some recent and Palaeozoic lingulides (Holmer, 1989), but it has not been recognised in the early eoobolides and acrotretides.

      (6) A critical revision of the family Eoobolidae and Lingulellotretidae including a revision of the type species of Eoobolus and Lingulellotreta is needed.

      Concerning the families Eoobolidae and Lingulellotretidae, we are aware of the current problematic situation of these families, and we have added more remarks regarding the Eoobolidae in the Systematic Palaeontology section of the manuscript. However, the revision of the families Eoobolidae and Lingulellotretidae falls outside the scope of this paper. We prefer to exclude it just now, as it will be part of an upcoming publication based on more material from China, Australia, Sweden and Estonia that we are currently working on.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides important insights into the degradation of a host tRNA modification enzyme TRMT1 by SARS-CoV-2 protease nsp5. The data convincingly support the main conclusions of the paper. These results will be of interest to virologists interested in studying the alterations in tRNA modifications, host methyltransferases, and viral infections.

      Public Reviews:

      Response to Public Reviews

      We appreciate the reviewers’ assessment that our findings are well supported and provide important insight to the field. We also thank the reviewers for their comments and suggestions that have improved the quality of this manuscript. Through the requested edits and experiments, we provide additional results in this revision that further support and extend our original findings.

      We acknowledge the major questions that remain to be addressed, including the biological relevance of TRMT1 cleavage by Nsp5. We note that elucidating the biological role of host protein cleavage by viral proteases has been a long-standing challenge. For example, several endogenous proteins have been identified as cleavage targets of HIV protease, but the functional relevance for many of these cases took decades to resolve or remain unknown to this day. Nonetheless, we have added additional experiments that suggest a possible role for TRMT1 and TRMT1 cleavage in SARS-CoV-2 pathobiology.

      Key additions in the revised manuscript include:

      • Subcellular localization of full-length TRMT1 and TRMT1 fragments (Supplemental Figure 4).

      • Experiments demonstrating that TRMT1 levels are reduced to near background levels in SARS-CoV-2 infected human cells at higher MOI (Figure 6C and D).

      • Results showing that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8).

      • The addition of an “Ideas and Speculation” subsection that is now being offered to authors by eLife.

      Reviewer #1 (Public Review):

      Zhang et al. investigate the hypothesis that tRNA methyl transferase 1 (TRMT1) is cleaved by NSP5 (nonstructural protein 5 or MPro), the SARS-CoV-2 main protease, during SARS-CoV-2 infection. They provide solid evidence that TRMT1 is a substrate of Nsp5, revealing an Nsp5 target consensus sequence and evidence of TRMT1 cleavage in cells. Their conclusions are exceptionally strong given the co-submission by D'Oliveira et al showing cleavage of TRMT1 in vitro by Nsp5. Separately, the authors convincingly demonstrate widespread downregulation of RNA modifications during CoV-2 infection, including a requirement for TRMT1 in efficient viral replication. This finding is congruent with the authors' previous work defining the impact of TRMT1 and m2,2g on global translation, which is most likely necessary to support infection and virion production. What still remains unclear is the functional relevance of TRMT1 cleavage by Nsp5 during infection. Based on the data provided here, TRMT1 cleavage may be an act by CoV2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs. Theoretically, TRMT1 cleavage should inactivate the modification activity of TRMT1, which the authors thoroughly and elegantly investigate with rigorous biochemical assays. However, only a minority of TRMT1 undergoes cleavage during infection in this study and thus whether TRMT1 cleavage serves an important functional role during CoV-2 replication will be an important topic for future work. The authors fairly assess their work in this regard. This study pushes forward the idea that control of tRNA expression and functionality is an important and understudied area of host-pathogen interaction.

      We thank the reviewer for the thoughtful assessment of our study.

      We acknowledge that only a minority of TRMT1 undergoes cleavage during infection at the originally tested MOI. However, the ~40% reduction in TRMT1 levels after infection with SARS-CoV-2 is quite substantial considering that the TRMT1 in the nucleus and mitochondria are likely to be inaccessible to Nsp5. Moreover, we detected a reduction in m2,2G modification in the infected human cells, providing evidence for a functional impact on TRMT1 activity (Figure 1C).

      To further test the effects of SARS-CoV-2 infection on endogenous TRMT1, we infected 293T cells at a higher MOI and measured TRMT1 levels. At MOI=5, we found that SARS-CoV-2 infection led to near complete depletion of TRMT1 in human cells. This result suggests that SARS-CoV-2 infection could have a profound impact on TRMT1 levels during pathogenesis. We have added this new experiment as Figures 6C and D.

      Weaknesses noted:

      The detection of the N-terminal TRMT1 fragment by western blot is not robust. The polyclonal antibody used to detect TRMT1 in this work cross-reacts with a non-specific protein product. Unfortunately, this obstructs the visualization of the predicted N-terminal TRMT1 fragment. It is unclear how the authors were able to perform densitometry, given the interference of the nonspecific band. Additionally, the replicates in the source data make it clear that the appearance of the N-terminal fragment "wisp" under the non-specific band is not seen in every replicate. Though the disappearance of this wisp with mutant Nsp5 and uncleavable TRMT1 is reassuring, the detection of the N-terminal fragment with the TRMT1 antibody should be assessed critically. Considering this group has strong research interests in TRMT1, I assume that attempts to make other antibodies have proved unfruitful. Additionally, N-terminal tagging of TRMT1 is predicted to disrupt the mitochondrial targeting signal, eliminating the potential for using alternative antibodies to see the N-terminal fragment.

      We agree that the anti-TRMT1 antibody used here is sub-optimal for detection of the N-terminal TRMT1 fragment. However, as noted by the Reviewer, we provided multiple ways of corroborating that the lower-molecular-weight band detected in human cells expressing Nsp5 corresponds to the N-terminal TRMT1 fragment. We have shown that the TRMT1 cleavage band is not detectable in human cells expressing GFP or inactive Nsp5. This indicates that the lower-molecular-weight TRMT1 band arises only when active Nsp5 protease is expressed. Moreover, the TRMT1 cleavage band is not detectable in TRMT1-KO cell lines, demonstrating that the band arises from TRMT1 cleavage rather than from a non-specific protein. We have also detected the C-terminal fragment when TRMT1 is overexpressed with Nsp5. In addition, we have shown that mutation of the predicted Nsp5 cleavage site in TRMT1 abolishes the appearance of the N- and C-terminal cleavage fragments.

      Despite the drawbacks of this antibody, we identified gel running conditions that resolve the non-specific band from the N-terminal TRMT1 cleavage fragment. Thus, for quantification, we measured the total signal of both the cleavage band and the non-specific band in all lanes (Figure 3). After normalization to actin, the total signal from the cleavage band and the non-specific band in the control lane from cells expressing GFP was subtracted from the lanes with cells expressing Nsp5 to calculate the signal arising from the cleavage band. We have updated our Materials and Methods to provide details on how we quantified the TRMT1 cleavage band.
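      The background-subtraction step described above can be sketched as follows. This is a minimal illustration only, assuming densitometry values have already been measured for each lane; the Lane class, the cleavage_signal function and all numbers are hypothetical and are not the software or data used in the study.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Hypothetical densitometry values for one western blot lane (arbitrary units)."""
    label: str
    combined_band: float  # cleavage band + overlapping non-specific band
    actin: float          # loading control


def cleavage_signal(lane: Lane, control: Lane) -> float:
    """Actin-normalised cleavage-band signal after subtracting the control lane.

    The control (e.g. GFP-expressing) lane is assumed to contain only the
    non-specific band, so its actin-normalised signal is treated as background.
    """
    if lane.actin <= 0 or control.actin <= 0:
        raise ValueError("actin signal must be positive")
    normalised = lane.combined_band / lane.actin
    background = control.combined_band / control.actin
    return normalised - background


# Hypothetical numbers for illustration only (not data from the study).
gfp = Lane("GFP", combined_band=800.0, actin=1000.0)
nsp5 = Lane("Nsp5", combined_band=1300.0, actin=1000.0)
print(round(cleavage_signal(nsp5, gfp), 3))  # 0.5
```

      Normalising each lane to actin before subtracting keeps differences in loading from being counted as cleavage signal.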

      While we did test other antibodies against TRMT1, none of them were sensitive enough to detect TRMT1 cleavage fragments at endogenous levels. For example, we included results with an antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels (Supplemental Figure 3). However, the antibody could detect the C-terminal TRMT1 fragments if TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      These technical issues reiterate the fact that the functional significance of TRMT1 cleavage during CoV-2 infection remains unclear. However, this study demonstrates an important finding that the tRNA modification landscape is altered during CoV-2 infection and that TRMT1 is an important host factor supporting CoV-2 replication.

      We agree that the functional relevance of TRMT1 cleavage by Nsp5 remains an open question. Thus, we have added an experiment to test the functional impact of TRMT1 on virion particle production and infectivity (Figure 8). We find that TRMT1 expression is required for optimal virus production, consistent with our observation that TRMT1-deficient cells exhibit reduced viral RNA replication. In addition, we find that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8, TRMT1-Q530N). These results are consistent with the Reviewer’s conclusion that “TRMT1 cleavage may be an act by CoV-2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs”. We discuss the potential implications of this result and their functional relevance in the “Ideas and Speculation” subsection.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript titled 'Proteolytic cleavage and inactivation of the TRMT1 tRNA modification enzyme by SARS-CoV-2 main protease' from K. Zhang et al. demonstrates that several RNA modifications are downregulated during SARS-CoV-2 infection including the widespread m2,2G methylation, which potentially contributes to changes in host translation. To understand the molecular basis behind this global hypomodification of RNA during infection, the authors focused on the human methyltransferase TRMT1 that catalyzes the m2,2G modification. They reveal that TRMT1 not only interacts with the main SARS-CoV-2 protease (Nsp5) in human cells but is also cleaved by Nsp5. To establish if TRMT1 cleavage by Nsp5 contributes to the reduction in m2,2G levels, the authors show compelling evidence that the TRMT1 fragments are incapable of methylating the RNA substrates due to loss of RNA binding by the catalytic domain. They further determine that expression of full-length TRMT1 is required for optimal SARS-CoV-2 replication in 293T cells. Nevertheless, the cleavage of TRMT1 was dispensable for SARS-CoV-2 replication hinting at the possibility that TRMT1 could be an off-target or fortuitous substrate of Nsp5. Overall, this study will be of interest to virologists and biologists studying the role of RNA modification and RNA modifying enzymes in viral infection.

      We thank the reviewer for the thoughtful assessment of our study.

      We agree with the possibility that TRMT1 could be a fortuitous substrate of Nsp5 due to the coincidental presence of a Nsp5 cleavage site in TRMT1. As considered in our Discussion section, TRMT1 cleavage could be a collateral effect of SARS-CoV-2 infection. While TRMT1 could be an off-target substrate during viral infection, the subsequent effect on tRNA modification levels could have physiological consequences on downstream processes that affect cellular health. This information could still be useful for understanding the pathophysiological consequences of SARS-CoV-2 infection in tissues.

      Strengths:

      • The authors use a state-of-the-art mass spectrometry approach to quantify RNA modifications in human cells infected with SARS-CoV-2.

      • The authors go to great lengths to demonstrate that the SARS-CoV-2 main protease, Nsp5, interacts with and cleaves TRMT1 in cells, and they perform important controls when needed. They use a series of overexpression constructs with strategically placed tags on both TRMT1 and Nsp5 to strengthen their observations.

      • The use of an inactive Nsp5 mutant (C145A) strongly supports the claim of the authors that Nsp5 is solely responsible for TRMT1 cleavage in cells.

      • Although the direct cleavage was not experimentally determined, the authors convincingly show that TRMT1 Q530N is not cleaved by Nsp5 suggesting that the predicted cleavage site at this position is most likely the bona fide region processed by Nsp5 in cells.

      • To understand the impact of TRMT1 cleavage on its RNA methylation activity, the authors rigorously test four protein constructs for their capacity not only to bind RNA but also to introduce the m2,2G modification. They demonstrate that the fragments resulting from TRMT1 cleavage are inactive and cannot methylate RNA. They further establish that the C-terminal region of TRMT1 (containing a zinc-finger domain) is the main binding site for RNA.

      • While 293T cells are unlikely an ideal model system to study SARS-CoV-2 infection, the authors use two cell lines and well-designed rescue experiments to uncover that TRMT1 is required for optimal SARS-CoV-2 replication.

      Weaknesses:

      • Immunoblotting is extensively used to probe for TRMT1 degradation by Nsp5 in this study. Regretfully, the polyclonal antibody used by the authors shows strong non-specific binding to other epitopes. This complicates the data interpretation and quantification since the cleaved TRMT1 band migrates very closely to a main non-specific band detected by the antibody (for instance Fig 3A). While this reviewer is concerned about the cross-contamination during quantification of the N-TRMT1, the loss of this faint cleaved band with the TRMT1 Q530N mutant is reassuring. Nevertheless, the poor behavior of this antibody for TRMT1 detection was already reported and the authors should have taken better precautions or designed a different strategy to circumvent the limitation of this antibody by relying on additional tags.

      We acknowledge the sub-optimal performance of the commercial anti-TRMT1 antibody used in our study. Nevertheless, we have provided multiple lines of evidence indicating that the lower molecular weight band detected using this antibody corresponds to the N-terminal TRMT1 fragment. As noted by the reviewer, we have shown that the lower molecular weight band disappears using the TRMT1-Q530N non-cleavable mutant. The lower molecular weight signal is also absent in TRMT1-KO cell lines expressing Nsp5. Moreover, we have shown that the TRMT1 cleavage band is undetectable in human cells expressing GFP or inactive Nsp5. We have also detected the C-terminal fragment when TRMT1 is over-expressed with Nsp5.

      As discussed in the response to Reviewer 1, we did consider alternative approaches for detecting the N-terminal fragment. We thought about tagging TRMT1 at the N-terminus so that we could detect the cleavage band using a different antibody. However, as noted by Reviewer 1, the tagging of TRMT1 at the N-terminus is likely to disrupt the mitochondrial targeting signal and alter the localization of TRMT1. In addition, we spent considerable time and effort testing alternative antibodies against TRMT1. However, none of them were effective at detecting the N- or C-terminal TRMT1 fragments. For example, we included results with a different antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels but could detect them when TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      • While 293T cells are convenient to use, they are not a well-suited model system to study SARS-CoV-2 infection and replication. Therefore, some of the conclusions from this study might not apply to better-suited cell systems such as Vero E6 cells or might not be observed in patient-infected cells.

      We acknowledge the potential caveats associated with using 293T human embryonic cells as a system for testing SARS-CoV-2 replication. However, we note that 293T cells have been used as a physiological model for discovering and characterizing key aspects of SARS-CoV-2 biology, including viral replication. For example, SARS-CoV-2 has been shown to exhibit significant replication and virion production in 293T cells expressing ACE2, which can be inhibited by known SARS-CoV-2 antiviral compounds:

      https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(20)300045/fulltext

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9444585/

      https://www.science.org/doi/10.1126/sciadv.add3867

      https://www.pnas.org/doi/full/10.1073/pnas.2025866118

      293T cells have also been demonstrated to exhibit cytopathic effects upon SARS-CoV-2 infection that are dependent upon the ACE2 receptor and mirror that of infected lung cells in culture and in patient tissues:

      https://www.embopress.org/doi/full/10.15252/embj.2020106267

      https://journals.asm.org/doi/full/10.1128/jvi.00002-22

      https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1009715

      https://www.nature.com/articles/s41559-021-01407-1

      In addition to 293T cells, we have demonstrated that infection of MRC5 human pulmonary fibroblast cells with SARS-CoV-2 results in a decrease in TRMT1 levels and m2,2G modification (Figure 1). The reduction in TRMT1 levels in MRC5 cells after SARS-CoV-2 infection is similar to that observed in 293T cells.

      • The reduction of bulk TRMT1 levels is minor during infection of MRC5 cells with SARS-CoV-2 (Fig 1). This does not seem to agree with the more dramatic reduction in m2,2G modification levels. Cellular localization experiments of TRMT1 would help clarify this. While TRMT1 is found in the cytoplasm and nucleus, it is possible that TRMT1 is more dramatically degraded in the cytoplasm due to easier access by Nsp5.

      We agree that the processing of newly synthesized TRMT1 in the cytoplasm is likely to be the main cause of the reduction in TRMT1 levels in the infected MRC5 cells. Thus, we followed the Reviewer’s suggestion to conduct cellular localization experiments of TRMT1 (Supplemental Figure 4). Through these experiments, we show that full-length TRMT1 localizes to the cytoplasm, mitochondria, and nucleus, consistent with prior findings from our group and others. This result supports the conclusion that cytoplasmic TRMT1 is the likely target of Nsp5 cleavage, while TRMT1 in the nucleus and mitochondria is inaccessible to Nsp5. We also note that the decrease in cytoplasmic TRMT1 could account for the reduction in m2,2G modifications if the cytoplasmic pool of TRMT1 is responsible for modifying any exported tRNAs that were not modified in the nucleus.

      • In Fig 6, the authors show that TRMT1 is required for optimal SARS-CoV-2 replication. This can be rescued by expressing TRMT1 (Fig 7). Nevertheless, it is unknown if the methylation activity of TRMT1 is required. The authors could have expressed an inactive TRMT1 mutant (by disrupting the SAM binding site) to establish if the RNA modification by TRMT1 is important for SARS-CoV-2 replication or if it is the protein backbone that might contribute to other processes.

      We agree that it would be interesting to test if the methylation activity of TRMT1 is important for optimal SARS-CoV-2 replication. However, the present study focuses on the cleavage of TRMT1 by Nsp5 and the biological effects of this cleavage. Thus, we feel that generating another human cell line lies outside the scope of this paper and would be an excellent idea for future studies. We thank the reviewer for the proposed experiment.

      • Fig 7, the authors used the Q530N variant to rescue SARS-CoV-2 replication in TRMT1 KO cells. This is an important experiment and unexpectedly reveals that TRMT1 cleavage by Nsp5 is not required for viral replication. To strengthen the claim of the authors that TRMT1 is required to promote viral replication and that its cleavage inhibits RNA methylation, the authors could express the TRMT1 N-terminal construct in the TRMT1 KO cells to assess if viral replication is restored or not to similar levels as WT TRMT1. This will further validate the potential biological importance of TRMT1 cleavage by Nsp5.

      Indeed, we did not expect to find that human cells expressing the TRMT1-Q530N variant exhibit higher levels of viral replication. This suggests that cleavage of TRMT1 is inhibitory for viral replication. To provide further support for this observation, we analyzed the viral titer and infectivity of supernatants derived from human cells expressing wildtype TRMT1 or TRMT1-Q530N. Consistent with our finding that TRMT1-Q530N cells contain more viral RNA, the media supernatants from TRMT1-Q530N-expressing cells exhibit higher viral titer and infectivity compared to supernatants from TRMT1-KO cells expressing wildtype TRMT1. These results provide additional evidence that TRMT1 is required to promote viral replication. Moreover, these findings suggest that TRMT1 cleavage and reduced protein synthesis could self-limit viral replication. The additional results have been added as Figure 8.

      • Fig 7 shows that the TRMT1 Q530N variant rescues SARS-CoV-2 replication to greater levels than WT TRMT1. The authors should discuss this in greater detail, along with its possible implications for their proposed model. For instance, are m2,2G levels higher with Q530N than with WT? Does Q530N co-elute with Nsp5, or is the interaction disrupted in cells?

      These are excellent points brought up by the Reviewer. As noted above, we have added an additional experiment that tests the functional relevance of TRMT1 expression and cleavage on virion production and infectivity (Figure 8). Moreover, we have followed the Reviewer’s suggestion and discussed the potential implications of these findings in the “Ideas and Speculation” subsection.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors have used biochemical approaches to provide compelling evidence for the cleavage of TRMT1 by SARS-CoV-2 Nsp5 protease. This work is of wide interest to biochemists, cell biologists, and structural biologists in the coronavirus (CoV) field. Furthermore, it substantially advances the understanding of how CoVs interact with host factors during infection and modify cellular metabolism.

      We thank the reviewer for the thoughtful assessment of our study.

      Strengths:

      The authors provide multiple lines of biochemical evidence to report a TRMT1-Nsp5 interaction during SARS-CoV-2 infection. They show that the host enzyme TRMT1 is cleaved at a specific site and that it generates fragments that are incapable of functioning properly. This is an important result because TRMT1 is a critical player in host protein synthesis. This also advances our understanding of virus-host interactions during SARS-CoV-2 infections.

      Weaknesses:

      The major weakness is the lack of mechanistic insights into TRMT1-Nsp5 interactions. The authors have provided commendable biochemical data on proving the TRMT1-Nsp5 interaction but without clear mechanistic insights into when this interaction takes place in the context of SARS-CoV-2 propagation, what are the functional consequences of this interaction on host biology, and does this somehow benefit the infecting virus? I feel that the authors played it a bit safe despite having access to several reagents and an extremely promising research direction.

      We agree that our findings have prompted questions on the mechanistic and functional relevance of TRMT1 cleavage by Nsp5. To begin addressing the latter point, we have included a new experiment testing the impact of TRMT1 expression and cleavage on SARS-CoV-2 virus production and infectivity (Figure 8). We find that TRMT1-deficient cells infected with SARS-CoV-2 exhibit less virion production and the viruses produced are less infectious. Intriguingly, we find that expression of the non-cleavable TRMT1-Q530N variant in TRMT1-KO cells promotes an increase of viral titer as well as infectivity compared to expression of wildtype TRMT1. These results provide evidence for an unexpected role for TRMT1 expression in virus production and the generation of optimally infectious SARS-CoV-2 particles. We discuss the potential implications of this finding in the “Ideas and Speculation” subsection.

      We agree that understanding the timing and effects of the Nsp5-TRMT1 interaction will be an important area of investigation moving forward. We would like to include additional time points beyond 24 and 48 hours post-infection. However, we have found that the MRC5-ACE2 cells exhibit increased levels of cell death at 72 and 96 hours post-infection that could confound results (Raymonda et al 2022). Moreover, we would like to know how the reduction in m2,2G modifications affects host tRNA biology and translation. However, these experiments involve large-scale methods such as tRNA sequencing and ribosome profiling, which are outside the scope of our current studies and will be the subject of future efforts.

      We acknowledge the Reviewer’s assessment that we “played it a bit safe” in discussing the functional consequences of Nsp5-TRMT1 interaction. We aimed for a circumspect interpretation of our results and their biological implications, but might have been too cautious in our conclusions. Thus, we have added an “Ideas and Speculation” subsection that discusses possible reasons for how TRMT1 cleavage and interaction with Nsp5 could benefit the virus. We thank the Reviewer for pointing out this issue in our initial manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Having reviewed an earlier version of this manuscript, I appreciated the recent progress made by the authors. I felt the entire body of work is quite solid and the interpretations are clear and not overstated. One piece of data I thought deserved a sentence or two of discussion was the complementation assay with Q530N TRMT1. This experiment suggests the possibility that cleavage of TRMT1 by Nsp5 may be an act to self-limit replication, although this result could also be due to the elevated levels of Q530N TRMT1 expression compared to WT. I still think it is worthy of discussion. Another thing I would recommend is to include the length of infection by SARS-CoV-2 in the figure legends.

      We thank the reviewer for their positive response and constructive comments.

      We have followed the Reviewer’s suggestion to further discuss how cleavage of TRMT1 may act to self-limit replication in the “Ideas and Speculation” subsection. We have also included the length of infection by SARS-CoV-2 in the figure legends.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the comments mentioned in the public review, this reviewer encourages the authors to address the following points:

      • Please clarify the rationale behind choosing 24 and 48 hours post-infection as time points for the analyses (Fig 1). One would expect even lower levels of TRMT1 and RNA modification after 72 and 96 hours post-infection.

      We chose the 24 and 48-hour time points since we have shown that MRC5 cells exhibit elevated accumulation of viral RNA at these time points (Raymonda et al 2022). However, at 72 and 96-hours post-infection, we have found that the MRC5-ACE2 cells exhibited cytopathic effects indicative of cell death that could confound results. We have included the rationale for these time points in our revised manuscript.

      • In Supplementary Figure 3, please add in the legend the meaning of the asterisk symbol.

      The asterisks denote non-specific bands that are still detectable in the TRMT1-KO cell line. We have updated the Figure Legend and thank the Reviewer for catching this omission.

      • In Supplementary Figure 3B, there is an intermediate band in lane 3 with C145A when using the antibody 609-659. The authors should clarify what that band is.

      The intermediate band in lane 3 (and in lane 6) of Supplemental Figure 3B represents non-specific detection of the Nsp5-C145A variant that exhibits extremely high levels of expression since it cannot self-cleave. We have clarified the identity of the band in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      I have only minor comments:

      Although the authors have done a commendable job of providing compelling biochemical evidence of TRMT1 cleavage by Nsp5, it is not clear how this enhances viral infection. The discussion presents the experimental findings and prior publications as a series of correlated observations without clearly specifying the mechanistic benefits of TRMT1 hijacking towards CoV propagation, or even proposing a mechanistic hypothesis to this end.

      We agree with the Reviewer that providing a mechanistic hypothesis on how TRMT1 cleavage impacts virus biology will help inform future studies. We have followed the Reviewer’s suggestion and discuss potential mechanisms in the “Ideas and Speculation” subsection.

      How do these experiments inform us about the cell biology of SARS-CoV-2 infections? Does Nsp5-mediated degradation start early in infection? Is the loss of TRMT1 sustained over the course of the infection? Do Nsp5 concentrations or relative amounts correlate with TRMT1 loss during this period? For instance, is there only a modest increase in Nsp5 levels from 24h to 48h? I would suggest adding a few more time points than just 24h and 48h in the cell culture experiments. In its present form, it will be difficult for readers to appreciate the relevance of this study.

      These are excellent questions raised by the Reviewer. The temporal effects of SARS-CoV-2 infection on TRMT1 levels will be an important area to dissect moving forward.

      As mentioned above, we would like to include additional time points beyond 24- and 48-hours post-infection. However, at 72 and 96-hours post-infection, we have found that the MRC5-ACE2 cells exhibited increased levels of cell death that could confound results.

      However, we do observe a correlation between the level of infection and the amount of TRMT1 depletion. In our newly added Figure 6C and 6D, we show that increasing the MOI leads to a concomitant increase in N-protein production that correlates with the amount of TRMT1 depletion. Moreover, we have added additional experiments to explore the biological relevance of our findings in terms of virion particle production and infectivity. We thank the reviewer for these insightful questions that have improved our manuscript and provide a foundation for future studies.

      Related to this previous comment: how do the authors rationalize their inference that TRMT1 is essential for SARS-CoV-2 infection, yet it is cleaved during the infection? What seems to be the advantage of this seemingly contradictory but possibly quite intriguing inference?

      We acknowledge the paradox that TRMT1 seems to be essential for SARS-CoV-2 replication but is cleaved during the infection. We propose several hypotheses to explain these findings:

      Hypothesis 1: TRMT1 could be a bystander target. The loss of TRMT1 expression leads to a decrease in modifications that impacts translation. This decrease in translation capacity of the infected cells would lead to decreased production of viral proteins and reduced viral replication. This could explain why TRMT1-deficient cells exhibit less virus production. This could also account for why the TRMT1-Q530N mutant might produce more virus. In this case, the cleavage of TRMT1 and biological effects on viral replication and virion production are coincidental. However, even if TRMT1 cleavage and inactivation does not impact viral replication or production, it would still be important to know the cellular impacts that contribute to disease pathogenesis.

      Hypothesis 2: The slight diminishment of viral replication due to host translation inhibition could outweigh the benefits of shutting down host responses dependent upon protein synthesis. The decrease in TRMT1-catalyzed tRNA modification caused by Nsp5 cleavage could severely inhibit host translation while viral translation can still be maintained through a tRNA pool optimized for viral translation, albeit at a slightly lower rate than if TRMT1 is not cleaved.

      Hypothesis 3: The Nsp5-TRMT1 interaction could allow the virus to bind tRNAs that are packaged in viral particles, as suggested previously (Pena et al., 2022). The finding that expression of the non-cleavable TRMT1-Q530N variant enhances viral replication and infectivity supports the hypothesis that TRMT1 could facilitate tRNA uptake into viral particles. The packaging of specific tRNAs in viral particles could enhance viral translation in the subsequent round of infection, thereby enhancing infectivity and perhaps facilitating the species jump of SARS-CoV-2 towards hosts with incompatible codon bias.

      We have included these hypotheses in the new “Ideas and Speculation” subsection.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      After revision, I only have a few remaining remarks:

      l. 180 The authors write: We were able to process all 4 datasets with minimal adjustments to the default parameter values (Methods).

      But they still don't indicate how they vary parameters and how important this is for success or how this affects absolute measurements such as average cell length. Could they give a table of parameter values and some sense of sensitivity for any future user?

      We thank the reviewer for the suggestion. We see how this info is valuable for the user. We’ve added a table with the parameter values used for processing each dataset in the supplemental information, along with the default parameters for reference (lines 476 - 496). In that section we also discuss which parameters may affect the output measurements of cell size, etc.

      l. 192-193 They write 'The software performed well on BACMMAN, molyso and MoMa datasets.' Naming the datasets after the analysis methods used in the original papers could be confusing, as they analyse data with MM3. Not sure how best to resolve this, maybe using first author names instead.

      We thank the reviewer for pointing this out. We now refer to them with the first author names.

      Related to the request of ref. #1 for a video tutorial, the video currently displayed under the github readme.md section 'Usage guide' is not functional. And the video at the top of the same page is very short with minimal information.

      We thank the reviewer for letting us know the tutorial video was not functional. We’ve tested it on Linux, Mac and Windows machines on both Firefox and Chrome. We were not able to reproduce any problems for the video - could they let us know what browser / OS was used and any other specifics? If it’s easier, we can be reached through the Github page as well.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers:

      We would like to thank all the reviewers and the editors for their thorough and helpful feedback on our work. Before addressing specific questions and points, we would like to make a general comment on a mechanistic aspect of this study. The reviewers correctly pointed out that our study does not reveal the molecular mechanism that leads to centromeric histone depletion specifically from meiotic chromosomes. Identifying this mechanism requires a deep and thorough understanding of how centromeric histones are loaded and centromeres are established each cell cycle, and how they are maintained over time in different cell types. To our knowledge, these mechanisms have not been described in plants. To add a further layer of complexity, it appears that the mechanisms governing CENH3 maintenance may be (partially) different in plant mitotic and meiotic cells, and the mechanistic basis of this difference is unknown. Obviously, these are interesting but also complex questions and their resolution will require considerable resources and effort, which we believe is beyond the scope of this manuscript. Nevertheless, our finding that CENH3 maintenance and centromere function in meiotic cells are sensitive to heat stress is an unexpected discovery with profound implications for plant adaptation, which provides a strong incentive for further exploration of centromere maintenance mechanisms in plants.

      Furthermore, we would like to apologize to the reviewers for the poor quality of the figures in the original submission. Their quality was degraded by conversion to PDF format during submission.

      eLife assessment

      This important study reports how heat stress affects centromere integrity by compromising the loading of the centromere protein CENH3 and by prolonging the spindle assembly checkpoint during male meiosis in Arabidopsis thaliana. The evidence supporting the claims by live cell imaging is convincing, although deeper mechanistic insight is lacking, making the study overall somewhat preliminary in nature. This work will be of interest to a broad audience of biologists working on how chromatin states are affected by stress conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khaitova and co-workers present here an analysis of centromere composition and function during elevated temperatures in the plant Arabidopsis. The work relates to the ongoing climate change during which spikes in high temperatures will be found. Hence, the paper addresses a timely subject.

      The authors start by confirming earlier studies showing that high temperatures reduce the fertility of Arabidopsis plants. Interestingly, a hypomorphic mutant of the centromeric histone variant CENH3 (CENP-A), which was previously described by the authors, sensitizes plants to heat and results in a drop in viable pollen and silique length. The drop in fertility coincides with the formation of micronuclei in meiosis and an extension of meiotic progression, as revealed by live cell imaging. Based on this finding, the authors then show that at high temperatures, the fluorescence intensity of YFP:CENH3 declines in meiosis but, remarkably, not in the surrounding cells (tapetum cells). In addition, the amount of BMF1 (a Bub1 homolog and part of the spindle assembly checkpoint) also appears to decline on the kinetochores of meiocytes, as judged by a BMF1 reporter line. However, whether this depends on the decline of CENH3 or represents a separate pathway is not clear.

      We provide new data in Figure S6 showing that BMF1 loading on centromeres is substantially reduced in cenh3-4 mutants. Thus, efficient tethering of BMF1 to centromeres depends on CENH3.

      Finally, the authors measure the duration of the spindle checkpoint and find that it is extended under high temperatures from which they conclude that the attachment of spindle fibers to kinetochores is compromised under heat.

      Strengths:

      This is an interesting and important paper as it links centromere organization/function to heat stress in plants. A major conclusion of the authors is that weakened centromeres, presumably by heat, may be less effective in establishing productive interactions with spindle microtubules.

      Weaknesses:

      The paper does not explain the molecular reason why CENH3 levels in meiocytes are reduced or why the attachment of spindle fibers to kinetochores is less efficient at high versus low temperatures.

      While we cannot explain the molecular mechanism underlying temperature-dependent depletion of CENH3 in meiocytes, the less efficient attachment of microtubules to the kinetochores at higher temperatures is likely caused by reduced levels of CENH3, which result in smaller centromeres that are less effective in establishing productive microtubule-kinetochore attachments. Here (new Figure S6) and in our previous study (Capitao et al. 2021), we have shown that the amount of centromere/kinetochore proteins at centromeres is reduced in cenh3-4 mutants, and that these plants exhibit a prolonged SAC and slower chromosome biorientation.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates how increased temperature affects pollen production and fertility of Arabidopsis thaliana plants grown at selected temperatures ranging from 16 °C to 30 °C. They report that pollen production and fertility decline with increasing temperature. To identify the cause of reduced pollen production and fertility, they resort to live cell imaging of male meiotic cells and find that the duration of meiosis increases with temperature. They also show that pollen sterility is associated with an increased presence of micronuclei, likely originating from heat stress-induced impairment of meiotic chromosome segregation. They correlate abnormal meiosis with weakened centromeres caused by meiosis-specific defective loading of the centromere-specific histone H3 variant (CENH3) onto meiotic centromeres. The same holds for the kinetochore-associated spindle assembly checkpoint (SAC) protein BMF1. Intriguingly, they observe a reverse trend in the somatic cells of the tapetum, where CENH3 is strongly present, in contrast to the reduced loading of CENH3 in male meiocytes with increasing temperature. In contrast to CENH3 and BMF1, the SAC protein BMF3 persists for longer periods than in the WT control, based on which the authors conclude that heat stress prolongs the duration of the SAC at metaphase I, which in turn extends the time of chromosome biorientation during meiosis I. The study provides preliminary insights into the processes that affect plant reproduction at increasing temperatures, which may be relevant to developing climate-resilient cultivars.

      Strengths:

      The authors have mastered live cell imaging of male meiocytes, a technically demanding exercise, and have successfully employed it to examine the time course of meiosis in Arabidopsis thaliana plants exposed to different temperature conditions. In addition, they monitor the loading dynamics and residence time of fluorescently tagged centromere/kinetochore proteins and spindle assembly checkpoint proteins to precisely measure how long the respective proteins are present, in order to study their dynamics and function in male meiosis.

      Weaknesses:

      Here the authors use only one representative centromere protein CENH3, one kinetochore-associated SAC protein BMF1, and the SAC protein BMF3 to conclude that heat stress impairs centromere function and prolongs SAC with increased temperatures. Centromere and its associated protein complex the kinetochores and the SAC contain a multitude of proteins, some of which are well characterized in Arabidopsis thaliana. Hence the authors could have used additional such tagged proteins to further strengthen their claim.

      Indeed, several other proteins have recently been characterized as centromere/kinetochore components and could have been included in the study to further validate the results presented. To strengthen our argument, we have added new experimental data (Figure S4) showing temperature-induced depletion of CENH3 in wild-type plants by immunocytology. Thus, we convincingly show that temperature stress reduces the amount of CENH3. This is likely to affect the loading of most kinetochore and centromeric proteins. Here (new Figure S6) and in our previous study (Capitao et al., 2021), we have shown that genetic depletion of CENH3 in cenh3-4 mutants results in reduced loading of CENPC, MIS12 and BMF1 at mitotic centromeres and reduced loading of BMF3 and BMF1 at meiotic centromeres. We also attempted to assess the levels of CENPC and MIS12 on meiotic chromosomes by immunocytology, but our antibodies, which work on mitotic spreads, did not stain meiotic chromosomes.

      Though the results presented here are interesting and solid, the study lacks a deeper mechanistic understanding of what causes the defective loading of CenH3 to the centromeres, and why the SAC protein BMF3 persists only at meiotic centromeres to prolong the spindle assembly checkpoint. Also, this observation should be interpreted in light of the fact that SAC is not that robust in plants as several null mutants of plant SAC components are known to grow as healthy as wild-type plants at normal growth conditions without any vegetative and reproductive defects.

      Thank you for raising this point. We are of the opinion that the SAC operates and is important in plants. We have added a citation to a preprint from the Schnittger lab (Lampou et al., 2023, BioRxiv) that was published while this manuscript was under review. We think this is the most comprehensive analysis of the plant SAC to date, clearly showing that the SAC delays progression to anaphase in the presence of spindle inhibitors, although adaptation eventually occurs and the cell cycle progresses. This is very similar to the situation in animals, which also undergo spindle adaptation in similar situations. The difference between plants and animals may be due to subsequent events, as plants are better able to tolerate genome instability and resume cell division in the presence of abnormal chromosome numbers. Robustness and redundancy may be another reason why plant mutants deficient in the SAC do not show obvious growth retardation.

      One of the immediate responses to heat stress is the production of heat shock proteins (Hsps), which act as molecular chaperones to safeguard the proteome. It will be interesting to see if the expression levels of known Hsps can be correlated with a role in stabilizing the structure of SAC proteins like BMF1 to prolong their presence at meiotic kinetochores.

      Indeed, the heat stress response is likely to be involved in this process. We sought to investigate the role of this pathway by analyzing Arabidopsis mutants deficient in HEAT-SHOCK FACTOR BINDING PROTEIN (HSBP), which acts as a negative regulator of the heat shock response. This experiment was prompted by the observation that hsbp mutants have reduced fertility. We expected that an unrestricted heat stress response might affect meiosis and pollen formation. However, our initial experiments did not show altered pollen viability in response to heat stress in hsbp plants and we did not pursue this line of research further.

      Reviewer #3 (Public Review):

      Summary:

      Khaitova et al. report the formation of micronuclei during Arabidopsis meiosis under elevated temperatures. Micronuclei form when chromosomes are not correctly collected to the cellular poles in dividing cells. This happens when whole chromosomes or fragments are not properly attached to the kinetochore microtubules. The incidence of micronuclei formation is shown to increase at elevated temperatures in wild-type and more so in the weak centromere histone mutant cenH3-4. The number of micronuclei formed at high temperatures in the recombination mutant spo11 is like that in wild-type, indicating that the increased sensitivity of cenh3-4 is not related to the putative role of cenh3 in recombination. The abundance of CENH3-GFP at the centromere declines with higher temperature and correlates with a decline in spindle assembly checkpoint factor BMF1-GFP at the centromeres. The reduction in CENH3-GFP under heat is observed in meiocytes whereas CENH3-GFP abundance increases in the tapetum, suggesting there is a differential regulation of centromere loading in these two cell types. These observations are in line with previous reports on haploidization mutants and their hypersensitivity to heat stress.

      Strengths:

      This paper is an important contribution to our insights into the impact of heat stress on sexual reproduction in plants.

      Weaknesses:

      While it is highly significant, I struggled to interpret the results because of the poor quality of the figures and the videos.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced during the conversion of the manuscript to PDF on the publisher's website.

      Reviewer #1 (Recommendations For The Authors):

      To complete the presented analysis, it would be great to analyze the signal strength of the here-presented BMF3 reporter at high temps, see below for further reasoning.

      Quantification of the BMF3 signal is difficult - it is only transiently associated with kinetochores and its level changes over time. Nevertheless, analysis of our movies taken under the same microscope settings indicates that the amount of BMF3 decreases with increasing temperature. This is illustrated in the new Figure S6C.

      Conversely, how is the BMF1 and BMF3 signal strength in cenh3-4 mutants?

      We performed an analysis of BMF1 and BMF3 signal in cenh3-4 mutants and observed a reduced level of signal from both proteins (Figure S6). In the case of BMF1, no signal was detectable in either somatic or meiotic cells.

      How do the authors explain the reduction in BMF1 signal at 26 and 30 °C versus the extension of the duration of the SAC as measured by the persistence of a BMF3 signal (line 192: "...reduces the amount of CENH3 and the kinetochore protein BMF1 on meiotic centromeres, potentially affecting their functionality..." versus line 213: "...We observed that while the BMF3:GFP signal persisted, on average, for about 22.7 min at 21 and 26 °C, its appearance was prolonged to 40.5 min at 30 °C...")? Is the BMF3 signal also reduced at high temperatures (see question above)?

      This is a very interesting point. While we see reduced levels of both proteins under heat stress or in cenh3-4 plants, the effect on BMF1 is much more pronounced, and BMF1 becomes undetectable under these conditions. This contrasts with BMF3, which appears to be reduced but is still clearly visible. These data suggest that BMF1 is more sensitive to reduced levels of CENH3, and they further corroborate the findings from the Schnittger lab that BMF1 is not a core component of the SAC.

      Lines 18-20: The observation that heat stress reduces fertility has been made by several research teams before this study. I propose to write "confirm"/"support" etc. instead of "reveal" to avoid a (presumably not intended) false priority claim in the abstract.

      We apologize, this was unintentional and we cite the relevant literature in the article. We have rewritten the abstract to avoid this impression.

      Figure 2: The panel/legend appears to be a bit mixed up. Panel C is described in legend under A. In addition, I cannot find any blue arrows in panel A (which is described as panel B). Correspondingly, the references to the panels in this figure (lines 134/135 and following) need to be updated. I am also not sure how the meiocytes in this figure were stained. The dots look like centromeres but then their intensity rather increases with increasing temperature. If correct, how can this be reconciled with the authors' statement that centromeres decrease in size at higher temps?

      We apologize for the mix-up. An early version of the figure was accidentally submitted, and we have now corrected it. Panel B shows DAPI-stained meiocytes at the tetrad stage, and examples of micronuclei are indicated by arrowheads.

      Line 520: Should read "genotype" not "phenotype".

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) It is intriguing that heat stress impairs only the centromeres and segregation of meiotic chromosomes but not the mitotic chromosomes. No analysis of mitotic divisions is provided in the manuscript. As they have generated marker lines, it is reasonable to examine the mitotic time course as well by live monitoring of root tissues exposed to similar temperature conditions as done for meiotic analysis. This will help to address the effect of heat stress on mitotic centromeres and its comparison with meiosis will provide a better picture. There are two likely outcomes during mitosis:

      (a) It is possible that heat stress also slows down mitotic progression, as is the case in meiosis as shown in this paper, and hence it is important to examine mitosis as well to compare and contrast the CENH3/BMF1 dynamics in mitosis and meiosis.

      (b) The second scenario is that there is no effect of heat stress on the centromere integrity of mitotic chromosomes. In fact, the authors show indirect evidence in support of this, wherein eYFP:CENH3 showed a strong signal in the tapetal cells (somatic origin) surrounding the male meiocytes (generative origin). It is interesting that somatic cells of the tapetum show a strong signal whereas the meiocytes lack this. The authors should elaborate on this contrasting result.

      The effect we observed seems to be specific to meiosis. We analyzed the progression of mitosis in root cells and we see a negligible effect of temperature on mitotic progression and no micronuclei formation. Interestingly, in terms of CENH3 loading, root cells show a slight decrease in CENH3 at 30°C, in contrast to the situation in tapetum cells. These and other data suggest a tissue/cell specific behavior of centromere maintenance and deserve further analysis. We plan to publish data on mitosis and tissue-specific aspects of CENH3 loading in a separate manuscript.

      (2) Spindle assembly checkpoint (SAC) comprises several core proteins that are recruited to the kinetochores to correct the errors during the defective cell cycle. Here the authors demonstrate the prolonged presence of BMF3 as the only proof to claim that heat stress prolongs the spindle assembly checkpoint during metaphase I. Have the authors observed the dynamics of any other SAC core components such as MAD1, MAD2, MPS1, BUB3, and the like during heat stress?

      No, we did not. We provide several independent lines of evidence that centromere structure and functionality are affected, and spindle checkpoint analysis is only one of them. At the time we designed these experiments, the only experimentally validated and well-characterized component of the SAC was BMF3, and we used only this protein as a SAC reporter because a general analysis of the SAC was not the primary goal of our study. While this paper was under review, a preprint from the Schnittger lab focusing on the plant SAC was published that comprehensively analyzed these SAC components in Arabidopsis and provided a solid foundation and resources for further research in this direction. This study also uses BMF3 as a reporter for the SAC in meiotic cells. It is noteworthy that, despite using different microscopic methods and different plant reporter lines, our labs independently arrived at exactly the same duration of BMF3 association with the kinetochore (i.e. 22 min).

      (3) Is BMF1 a component of the SAC or of the kinetochore? I understand that BMF1 is a part of the core SAC (Komaki and Schnittger, 2017), although it localizes to the kinetochore. There are well-characterized kinetochore proteins in Arabidopsis, such as MIS12, NUF2, NNF1, and SPC24 (MUN1), which the authors could have used as kinetochore markers. Regardless, here the authors used BMF1 as a kinetochore marker. Being a part of the SAC, one would expect a prolonged presence of BMF1 similar to BMF3 at the meiotic kinetochores, but it is the other way around. How can these contrasting results be explained?

      As discussed in the public section of the review, BMF1 does not seem to be a core component of the SAC. Furthermore, this protein localizes to centromeres/kinetochores throughout the cell cycle and therefore cannot be used as a SAC reporter.

      (4) Micronuclei can form as a result of chromosome missegregation as shown for spo11-1 and also due to segregation error caused by DNA repair defects. Here it is not clear what is the origin of micronuclei. It is very hard to decipher from live cell imaging. A simple meiotic spread of anthers of different treatments would address the origin of micronuclei.

      Cytology cannot easily determine the origin of micronuclei in meiotic cells. Acentric fragments produced by aberrant DNA repair become cytologically detectable only after metaphase I, because until then they remain tethered to the rest of the chromatin via cohesion. Therefore, we took advantage of spo11 mutants, which do not form any meiotic breaks and hence cannot generate acentric fragments by aberrant repair, to discriminate the origin of micronuclei. We reason that all micronuclei produced in spo11 plants originate from chromosome mis-segregation, and their increase at elevated temperature supports the notion that heat stress further impairs chromosome segregation.

      (5) Fig. 1B: The microspores are not clearly visible in the Alexander-stained anthers. It is not clear which are fertile and which are sterile. A better-quality picture would be ideal to appreciate this.

      Again, we apologize for the poor quality of the pictures due to manuscript conversion.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 2, it should be pointed out where the micronuclei are. I see here and there a single bright spot. In Arabidopsis, we have noticed bright spots under stress conditions that are autofluorescent signals. It needs to be shown that these spots are not observed in non-GFP lines. Better image quality may help too.

      The micronuclei in Figure 2 are visualized by DAPI staining, not with GFP. The micronuclei are now indicated by arrowheads.

      (2) It was not possible to see the centromeres in Figure 3, hence I could not verify the fluorescence intensities of CENH3 and BMF1. There is also something wrong with the blue and red color codes in Fig. 3B, C, and D.

      Again, we apologize for the poor quality of the pictures due to manuscript conversion.

      (3) Also in the videos it would help to point out where the micronuclei are seen. At what stage were these nuclei quantified? Given that meiosis progression in the cenh3-4 mutant is slower, it may be necessary to wait long enough to see established micronuclei. This information is supposed to be presented in Figure 2C. However, the X-axis shows time, not number. So I presume Fig 2C shows the duration of meiosis stages in the mutant. In Fig 2B, it shows the number of micronuclei per lobe. However, to correlate the incidence of micronuclei formation and the frequency of polyad formation (inviable microspores), one needs the quantification of the numbers of meiocytes carrying micronuclei. Then one can correlate the number of pollen per anther (shown in Fig 1c) with the incidence of micronuclei formation. The question of whether the degree of fertility reduction is due to micronuclei formation is a major issue that should be clarified.

      The micronuclei were not quantified from the movies, but from DAPI-stained whole anthers at the tetrad stage, as indicated in the main text. We also apologize for the confusion regarding Figure 2, as we mixed up the panels in the original submission. This has been corrected in the new submission.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study nicely integrates a breadth of experimental and computational data to address fundamental aspects of RNA methylation by an RNA methyltransferase (MTase) that is important for biology and health.

      Strengths:

      The authors offer compelling and strong evidence, based on carefully performed work with appropriate and well-established techniques to shed light on aspects of the methyl transfer mechanism of the methyltransferase-like protein 3 (METTL3), which is part of the methyltransferase-like proteins 3 & 14 (METTL3-14) complex.

      Weaknesses:

      The significance of this foundational work is somewhat diminished, mostly due to inefficient communication of certain aspects of this work. Parts of the manuscript are somewhat uneven and don't quite mesh well with one another. The manuscript could be enhanced by careful revision and significant textual and figure edits. Examples of recommended edits that would improve clarity and allow accessibility to a broader audience are highlighted in some detail below.

      We thank the reviewer for the positive evaluation of our work. We have followed the suggestions and modified the text and figures as detailed further in our answers to the specific recommendations.

      Reviewer #2 (Public Review):

      Summary:

      Caflisch and coworkers investigate the methyltransferase activity of the complex of methyltransferase-like proteins 3 and 14 (METTL3-14). To obtain a high-resolution description of the complete catalytic cycle they have carefully designed a combination of experiments and simulations. Starting from the identification of bisubstrate analogues (BAs) as binders to stabilise a putative transition state of the reaction, they have determined multiple crystal structures and validated relevant interactions by mutagenesis and enzymatic assays.

      Using the resolved structure and classical MD simulations they obtained a kinetic picture of the binding and release of the substrates. Of note, they accumulated very good statistics on these processes using 16 simulation replicates over a time scale of 500 ns. To compare the time scale of the release of the products with that of the catalytic step they performed state-of-the-art QM/MM free energy calculations (testing multiple levels of theory) and obtained a free energy barrier indicating that the release of the product is slower than the catalytic step.

      Strengths:

      All the work proceeds through clear hypothesis testing based on a combination of literature and new results. Eventually, this allows them to present in Figure 10 a detailed step-by-step description of the catalytic cycle. The work is very well crafted and executed.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      To fulfill its potential of guiding similar studies for other systems as well as to allow researchers to dig into their vast work, the authors should share the results of their simulations (trajectories, key structures, input files, protocols, and analysis) using repositories like Zenodo, the plumed-nest, figshare or alike.

      The reviewer is right. We have uploaded the simulation materials to Zenodo: the MD simulation data (trajectories, pdb files, parameter files), and the PLUMED file that was used for the DFTB3/MM metadynamics simulations. We provide the link in the “Data availability” section.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Corbeski et al. describes a combined experimental and computational study aimed at shedding light on the catalytic mechanism of a methyltransferase that transfers a methyl group from S-adenosylmethionine (SAM) to a substrate adenosine to form N6-methyladenosine (m6A).

      Strengths:

      The authors determine crystal structures in complex with so-called bi-substrate analogs that can bridge across the SAM and adenosine binding sites and mimic a transition state or intermediate of the methyltransfer reaction. The crystal structures suggest dynamical motions of the substrate(s) that are examined further using classical MD simulations. The authors then use QM/MM calculations to study the methyl-transfer process. Together with biochemical assays of ligand/substrate binding and enzyme turnover, the authors use this information to suggest what the key steps are in the catalytic cycle. The manuscript is in most places easy to read.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      My main suggestion for the authors is that they show better how their conclusions are supported by the data. This includes how the electron density maps for example support the key interactions and water molecules in the active site and a better error analysis of the computational analyses.

      We thank the reviewer for the comments and suggestions. We have followed the suggestions and added error analysis of the computational results as well as additional figures (in the supplementary information) that illustrate key interactions and water molecules in the active site supported by the electron density.

      Reviewer #1 (Recommendations For The Authors):

      • The phrasing of the second sentence in the introduction is difficult to read. I am not sure it is necessary to define the DRACH motif if you are also giving the exact consensus sequence, unless providing more context for other instances of the DRACH motif. Referring to this motif instead as "consensus sequence GGACU" may be more effective.

      The reviewer is right. We corrected the sentence accordingly.

      • In the second paragraph of the introduction, a further short description of how METTL3-14 is "involved" in diseases would be appreciated.

      We thank the reviewer for the comment. We made that clearer by including “by promoting the translation of genes involved in cell growth, differentiation, and apoptosis” together with a reference.

      • Is there any evidence that inhibiting METTL3-14 doesn't negatively impact healthy cells?

      We thank the reviewer for the question. Yes, there is such evidence and we added to the sentence “but not in normal non-leukaemic haemopoietic cells” together with a reference to make this point clearer.

      • Bringing up the MACOM complex in the third paragraph of the introduction is perhaps not necessary unless further discussing the MACOM complex later.

      The reviewer is right. We removed the mention of the MACOM complex.

      • Figure 1B: Color coding is difficult to distinguish on a screen and print out. More contrasting colors would be helpful.

      We thank the reviewer for the suggestion. We removed the transparency from the protein cartoon representation that was the reason for the low contrast.

      • The level of detail in the "MD simulations for mechanistic studies of RNA MTases" section is not advised. I would strongly encourage condensing this section to improve clarity and accessibility to a larger audience.

      The reviewer is right. We removed non-essential parts of this paragraph.

      • Confirming the role of the hydroxyl in Y406 would be better supported by a Y406 -> F406 mutant because the A406 mutant could bind differently due to a loss of pi-stacking interactions.

      The purpose of the Y406A mutant was to eliminate the interaction of the aromatic sidechain with adenosine as seen from the structure with BA4. Since there is no involvement of the Y406-OH group with adenosine, mutating to F did not seem sufficient. Furthermore, by mutating Y406 to alanine, we also eliminate the possibility for a water-mediated hydrogen bond to the W398 backbone. Hence, with the alanine mutant we achieve the strongest possible effect on the enzymatic activity while the integrity of the active site is maintained as seen from the thermal shift assay.

      • For Figure 4D, can the authors justify why SAH was used as a metric for SAM binding instead of using SAM directly? Additionally, referring to the RNA as "ligand" instead of "RNA" in the Figure caption is more confusing than simply calling it RNA.

      We thank the reviewer for the comment. With the TSA, we wanted to show that with the adenosine binding mutants, the integrity of the METTL3 active site is still intact. It has been shown that METTL3 binds SAH with higher affinity than SAM (DOI: 10.1016/j.celrep.2019.02.100). Since the magnitude of the thermal shift also depends on the affinity, we chose the higher-affinity binder SAH. There is no RNA per se shown in this figure. "Ligands" in the figure caption (A) refers to the three bound molecules that are shown and mentioned in the previous sentence: SAM, BA2, and BA4. "Ligand" in the figure caption (D) referred to "SAH", which was used in the experiment described and mentioned just after; this word has now been removed.

      • Figure 5D is very difficult to interpret. Removing the ribbons representing Y406 movement may make it easier to see. Color coding the Supplementary Movie 1 to match would be also helpful.

      The reviewer is right. We have changed the figure to make the different conformations of METTL3 and its Y406 sidechain clearer. However, we left the coloring of the different conformations as the colors are connected to different time points of the simulation. Following the suggestion of the reviewer we changed the coloring of SAM and AMP to match that of the supplementary movie.

      • Figure 10 is overwhelming as is. Removing the grey area around the binding sites and toning down the color of the substrate binding sites would help with visibility. The size of the chemical structures and illustrations is currently too small to easily be made out. A full page-sized figure may be beneficial for this figure.

      We agree with the reviewer and have changed the figure to make each reaction step clearer and better recognizable.

      Minor edits

      • Change "Despite the growing knowledge on the diverse pathways" to "Despite growing knowledge of the diverse pathways involving METTL3-14".

      We corrected the sentence.

      • Perhaps use "redundant active site" instead of "degenerate active site".

      We changed the word as suggested.

      • Consider moving "The METTL3 MTase domain has the catalytically active SAM binding site and adopts a Rossmann fold that is characteristic of Class I SAM-dependent MTases" to before "METTL14 also has an MTase domain, however, with a degenerate active site of hitherto unknown function, and so-called RGG repeats at its C-terminus essential for RNA binding" to keep information about METTL3 together.

      We shifted the part of the text as suggested.

      • "Molecular dynamics studies have mainly focused on protein and bacterial MTases"? Does this mean bacterial MTases that methylate proteins?

      We thank the reviewer for the comment. This means bacterial MTases in general. The example that we mention is of a bacterial MTase that methylates a chemical precursor. We changed the sentence slightly to make that clearer.

      • In "Bisubstrate analogues bind in the METTL3 active site", please consider the following:

      • Change "and to investigate" to "and investigated".

      • Briefly describe the enzymatic assay in the main text.

      • Either more clearly defining "least potent" or change to "have the highest IC50 values".

      We made all the suggested changes to improve the description of the assay and its outcomes.

      • In Figure 3, remove some of the amino acid labels from panels A, C, and E for clarity, especially since panels B, D, and F more clearly demonstrate the interactions.

      We removed amino acids that were not involved in polar contacts and adapted the figure caption accordingly.

      • In panels 3D, 3F, and 4B, the lightning bolts are too small to make out as lightning bolts. An asterisk or other symbol may be easier to distinguish.

      We more than doubled the size of the lightning bolts to make them easier to recognize.

      • In Figure 4C, no units are provided on the y-axis. Additionally, I do not believe the arrows indicating "Loss of activity" are necessary.

      These are arbitrary units because the quantity is a ratio, as explained in the Materials and Methods section. We removed the arrows following the suggestion of the reviewer.

      • While demonstrating mutants with no activity still retain SAM binding is suggestive of the mutant impacting RNA binding, this would still be better supported with RNA binding studies. Electrophoretic mobility shift assays would be sufficient if Tm studies are time-consuming. While these experiments could be informative, we also acknowledge that they may be outside the scope of this current report.

      We thank the reviewer for suggesting these experiments and for acknowledging that they would be outside the scope of the current study. Such RNA binding experiments can turn out to be very time-consuming, in both TSA and EMSA. The main reason is this: the RNA substrate must be chosen such that it binds sufficiently strongly to the WT to cause an effect (thermal shift or electrophoretic mobility shift), but also such that a clear difference in binding between WT and mutant proteins can be observed. Since many more residues of METTL3 and METTL14 contribute to RNA binding, the effects of individual mutants on affinity might be too small to be confidently detected in TSA or EMSA. In particular, we only identified the substrate adenosine binding residues, and mutating them, and hence preventing adenosine binding alone, might not have a big effect on overall RNA binding affinity. The enzymatic assay that we used, on the other hand, is more sensitive, since the detection is fluorescence-based and quantifies the conversion of A to m6A in an RNA substrate, and more factors than just affinity play a role in enzymatic activity, such as the correct orientation and stability of the adenosine in the active site and stabilization of the transition state.

      • A written narrative to accompany Supplementary Movie 1 would make it much more accessible to those unfamiliar with modeling and simulations.

      We thank the reviewer for the comment. We expanded the caption to the movie with a narrative describing different events at different time points in the movie.

      • Table 3 could be made clearer to those without MD experience by defining/indicating the top row as different computational models.

      The reviewer is right. We have added a footnote to Table 3 to clearly indicate the density functional theory and semi-empirical density functional tight binding (DFTB) methods used in this study. We also added another line to the table.

      • In the conclusion, the authors state "the height of the QM/MM free-energy barrier indicates that the methyl transfer step is not rate-determining." How does this compare to experimental data? Additional kinetic assays to demonstrate this experimentally would go a long way in convincing the reader of this conclusion.

      We thank the reviewer for the question. Kinetic assays have been performed for METTL3-14, and we mention and reference them in the text. We believe that further kinetic experiments would be outside the scope of this study. Furthermore, the METTL3 mutants that we made show no activity in our enzymatic assay, and hence kinetic studies would probably be impossible to do with them.

      As we show by QM/MM and describe in the text, the methyl cation of the SAM cofactor is transferred directly to the N6 position of the adenosine substrate. DFTB3/MM free energy simulations show that this mechanism has an energy barrier of 15-16 kcal/mol. The published turnover based on an enzymatic assay is 0.2-0.6 min⁻¹ at ambient temperature, which implies a barrier of ~20 kcal/mol. This value is higher than the barrier computed by QM/MM for the methyl transfer alone. Hence, there must be a step in the overall mechanism that is slower than the methyl transfer, and we conclude that the methyl transfer is not the rate-limiting step.
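      The barrier-versus-turnover comparison above rests on the Eyring form of transition state theory, k = (k_B T/h) exp(-ΔG‡/RT), with the kT/h pre-factor discussed in the response to Reviewer #3. The following is a minimal Python sketch of that arithmetic, not code from the study; it assumes "ambient temperature" means 298.15 K and a transmission coefficient of 1, and the function names are illustrative.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 1.987204e-3      # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal: float, temp_k: float = 298.15) -> float:
    """Rate constant (1/s) from a free-energy barrier: k = (kB*T/h) * exp(-dG/RT).

    Assumes a transmission coefficient of 1 (plain transition state theory).
    """
    return (KB * temp_k / H) * math.exp(-barrier_kcal / (R * temp_k))


def eyring_barrier(rate_per_s: float, temp_k: float = 298.15) -> float:
    """Free-energy barrier (kcal/mol) implied by an observed rate constant."""
    return R * temp_k * math.log((KB * temp_k / H) / rate_per_s)


# Published turnover of 0.2-0.6 per minute, converted to per second.
for kcat_per_min in (0.2, 0.6):
    dg = eyring_barrier(kcat_per_min / 60.0)
    print(f"kcat = {kcat_per_min}/min -> apparent barrier = {dg:.1f} kcal/mol")

# QM/MM methyl-transfer barrier of 15-16 kcal/mol, converted to a rate.
for dg in (15.0, 16.0):
    print(f"dG = {dg} kcal/mol -> k = {eyring_rate(dg):.0f} 1/s")
```

      Under these assumptions the turnover corresponds to roughly 20.2-20.8 kcal/mol, while a 15-16 kcal/mol barrier corresponds to a rate on the order of 10-60 s⁻¹, several orders of magnitude faster than the observed turnover.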

      Reviewer #3 (Recommendations For The Authors):

      I only have a few comments about the work.

      (1) It would be good if the authors could show more of the data that is used as the basis for their conclusions. For example, IC50 values are presented (Table 1) without error estimates or an indication of the quality of the data that is used to estimate the data.

      We thank the reviewer for the suggestion. We have included the errors of the IC50 values and show the dose-response curves from the enzymatic assay with the BAs as inhibitors in a new Supplementary Figure S1.

      (2) More substantially, it would be good to have a more detailed analysis of the crystal structures in terms of the properties that are mentioned/analysed. While the structures are relatively good (2.1-2.5 Å), it is not clear to the reader how this data supports the interactions that are proposed. For example, the authors pinpoint a number of hydrogen bonding interactions and water molecules in the complexes. They might consider showing support for some of these in the electron density maps. Similarly, it would be good to show the densities that support the substantial differences of the Ade in the BA2 and BA4 complexes. These might be supplementary files. I note also that the structures are not yet released or available for analysis [which of course is a valid choice but also means that I cannot inspect the maps myself].

      We have added supplementary figures supporting the conformations of the BAs and their interactions with METTL3 with electron density, for BA1 and BA6 in Supplementary Figure S2, and for BA2 and BA4 in a new Supplementary Figure S3.

      (3) It would be useful to have an error analysis of the off-rates estimated from the MD simulations and a discussion of the accuracy of these estimates. Even the slower dissociation events seem quite fast. What are the rough affinities of these molecules, and how fast would binding need to be to be compatible with the affinity and the estimated off-rates?

      We expanded upon this in the results paragraph concerning the MD simulations. The affinities of METTL3-14 binding to AMP or m6AMP can be expected to be very low, with Kd values in the millimolar range. We have not measured these Kd values, nor have we found any published data, but we have conducted thermal shift assays with A and m6A and did not observe any significant thermal shifts in the melting temperature of METTL3-14 at high micromolar concentrations of these compounds, indicative of a very low binding affinity. This is to be expected because METTL3-14 should not methylate adenosines unspecifically but rather in the GGACU motif of substrate mRNA.
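      The reviewer's question about how fast binding would need to be follows from the relation K_d = k_off/k_on, so the required association rate is k_on = k_off/K_d, with k_off approximated as the inverse of the mean residence time. The Python sketch below only illustrates this relation; the function names are hypothetical, and the input values (a ~100 ns mean residence time and a K_d of 1 mM) are placeholders chosen for illustration, not measurements or simulation results from the study.

```python
def koff_from_residence_time(mean_residence_s: float) -> float:
    """First-order off-rate (1/s), taken as the inverse of the mean residence time."""
    return 1.0 / mean_residence_s


def required_kon(koff_per_s: float, kd_molar: float) -> float:
    """Association rate constant (1/(M*s)) implied by Kd = koff / kon."""
    return koff_per_s / kd_molar


# Placeholder inputs for illustration only (hypothetical, not study data).
koff = koff_from_residence_time(100e-9)   # ~100 ns mean residence time
kon = required_kon(koff, 1e-3)            # Kd of ~1 mM
print(f"koff = {koff:.1e} 1/s, required kon = {kon:.1e} 1/(M*s)")
```

      Substituting the off-rates estimated from the MD simulations and a measured or bounded K_d would indicate whether the implied association rate is physically plausible.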

      (4) The authors use QM/MM simulations with metadynamics to estimate the energy profile of the methyl transfer reaction. They find a barrier of ca. 15 kcal/mol and suggest this to be compatible with the enzymatic turnover rate of ca. 0.3/min. Here it would be good to have a clearer description of the possible sources of error and assumptions in making these statements. First, what is the error on the estimated energy profile from the metadynamics? The authors mention the analysis of the progression of the PMF as a function of time, but that is in itself not a strong test for convergence (the PMF may stay constant if there is little sampling). What does the time series of the CV look like? Second, it seems as if the authors are assuming a large pre-exponential factor (10^9/s ?). Is that correct, and how sure are they of this value? Finally, when linking the barrier of the methyl-transfer reaction to the overall turnover rate, it sounds like they assume that other parts of the reaction do not affect the turnover rate. Is that correctly understood, and what is the evidence for that? It sounds like the authors are saying that step 5 in the cycle (Figure 10) is limiting.

      We thank the reviewer for the questions. Accordingly, we have carried out additional simulations and statistical error analyses.

      (i) We have carried out two additional sets of multi-walker metadynamics simulations with the same setup as the original calculation, except for using different initial random seeds. Using the three independent sets of metadynamics simulations, we can better estimate the statistical uncertainty for the computed potential of mean force (PMF). We have updated the PMF in Fig. 8b, in which the solid curve represents the result averaged over three independent runs, and the shaded area represents the standard error of the mean of the three replicas. The figure caption of Fig. 8b is revised accordingly.
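      Averaging the three independent PMFs and reporting the standard error of the mean amounts to, for each bin of the collective variable, SEM = s/√n, where s is the sample standard deviation across the n = 3 replicas. The Python sketch below illustrates this; it is not the authors' analysis script. It assumes the PMFs share one CV grid and, as one common convention (an assumption here), shifts each replica so that its minimum is zero before averaging.

```python
import math
import statistics


def average_pmfs(pmfs):
    """Per-bin mean and standard error of the mean over replica PMFs.

    Each PMF is a list of free energies (kcal/mol) on the same CV grid.
    Assumption: each replica is shifted so that its own minimum is zero.
    """
    n = len(pmfs)
    if n < 2:
        raise ValueError("need at least two replicas for a standard error")
    if len({len(pmf) for pmf in pmfs}) != 1:
        raise ValueError("all PMFs must be defined on the same grid")
    shifted = [[g - min(pmf) for g in pmf] for pmf in pmfs]
    mean, sem = [], []
    for column in zip(*shifted):  # one column = one CV bin across replicas
        mean.append(statistics.fmean(column))
        sem.append(statistics.stdev(column) / math.sqrt(n))
    return mean, sem
```

      The per-bin mean corresponds to a solid curve and the mean ± SEM to a shaded band of the kind described for Fig. 8b.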

      (ii) To further illustrate the convergence behavior of the metadynamics simulations, we have included the following supplementary files: (1) a comparison of potentials of mean force computed with different numbers of deposited Gaussians; (2) as suggested by the reviewer, the time series of the collective variable (CV) sampled by the 24 independent walkers during one set of metadynamics simulations. These results clearly indicate that the CV exhibits diffusive behavior between the reactant and product regions, further supporting adequate sampling and convergence of our metadynamics simulations.
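      One simple way to quantify the diffusive behavior described above is to count committed crossings between the reactant and product regions in each walker's CV time series. The sketch below is a hypothetical helper, not part of the authors' analysis. The basin boundaries `reactant_max` and `product_min` are placeholders that would have to be chosen from the actual PMF.

```python
def count_transitions(cv_series, reactant_max, product_min):
    """Count committed reactant <-> product crossings in a CV time series.

    A frame is 'reactant' if cv <= reactant_max and 'product' if
    cv >= product_min. Frames in between do not change the current basin,
    so brief excursions into the barrier region are not counted (hysteresis).
    """
    state = None
    transitions = 0
    for cv in cv_series:
        if cv <= reactant_max:
            new_state = "R"
        elif cv >= product_min:
            new_state = "P"
        else:
            continue  # barrier region: keep the previous basin assignment
        if state is not None and new_state != state:
            transitions += 1
        state = new_state
    return transitions
```

Applied to each of the 24 walker trajectories, this gives a per-walker transition count. The hysteresis avoids counting rapid recrossings near the dividing surface as separate events.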

      (iii) Regarding the pre-factor used in the rate estimate, we have indeed used the common approximation k_BT/h of conventional transition state theory. Many studies in the literature support the use of this expression for very localized chemical reactions in enzymes. We have included several representative references along this line: (1) M. Garcia-Viloca, J. Gao, M. Karplus, D. G. Truhlar, How enzymes work: Analysis by modern rate theory and computer simulations, Science, 303, 186-195 (2004); (2) D. R. Glowacki, J. N. Harvey, A. J. Mulholland, Taking Ockham’s razor to enzyme dynamics and catalysis, Nat. Chem. 4, 169-176 (2012).
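      The rate estimate referred to above uses the Eyring form of transition state theory, k = (k_BT/h) exp(-ΔG‡/RT). Below is a minimal sketch for converting between a free-energy barrier and a first-order rate constant. The temperature of 300 K is an assumption. As the reviewer notes, equating this rate with the measured turnover presumes that the chemical step is rate-limiting, and the sketch ignores any transmission coefficient.

```python
import math

KB = 1.380649e-23         # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
R_KCAL = 1.987204259e-3   # gas constant, kcal/(mol K)

def eyring_rate(barrier_kcal, temperature=300.0):
    """First-order rate constant (1/s) from TST with the k_B*T/h prefactor."""
    prefactor = KB * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))

def apparent_barrier(rate_per_s, temperature=300.0):
    """Invert the Eyring expression: barrier (kcal/mol) for a given rate (1/s)."""
    prefactor = KB * temperature / H
    return R_KCAL * temperature * math.log(prefactor / rate_per_s)
```

For example, `eyring_rate(15.0)` evaluates the rate for the reported 15 kcal/mol barrier. Similarly, `apparent_barrier(0.3 / 60)` gives the barrier that would correspond to a turnover of 0.3/min under the same assumptions.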

      (iv) Regarding the nature of the rate-limiting event, please see our response to reviewer 1.

      (5) The authors should ideally make the input files for their simulations available and deposit the PLUMED files in, for example, PLUMED-NEST (as indicated in their reference 100).

      We agree with the reviewer. Accordingly, we have uploaded the PLUMED file that we have used for the DFTB3/MM metadynamics simulations (plumed.dat) together with the MD simulation trajectories to Zenodo.

      Minor

      (1) Many of the details in Figure 10 are very small and difficult to read without zooming in. Consider whether some parts could be made larger.

      The reviewer is right. We have changed the figure to make each reaction step clearer and easier to recognize.

    1. Author Response:

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving, complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage more consensus in the field and help to understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress that the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude that the membrane dye ThT is faulty, when in fact the problem lies with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. Most gene databases for E. coli annotate a number of ion channels as voltage sensitive on the basis of comparative genetics studies, e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes; similarly you could check www.uniprot.org or www.biocyc.org), and see M.M.Kuo, Y.Saimi, C.Kung, ‘Gain-of-function mutations indicate that E. coli Kch forms a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels, e.g. S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in Microbiology, 2021, 29, 10, 942-950. More emphatic experimental evidence comes from the spiking potentials that many groups have observed in E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores: ‘Electrical spiking in bacterial biofilms’, E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102; ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj et al, Science, 2011, 333, 6040, 345; and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is positive feedback from voltage-gated ion channels (a mechanism is needed to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli, and a role is emerging for calcium in addition to potassium, e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder et al, J. Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments on E. coli indicate that there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, Nano Letters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores, i.e. they cannot be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT, e.g. the potassium-sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes, and it implies a simple physical mechanism, i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined, and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, unpersuasively represented in multiple publications by this one group in the literature (see below for critical synopses), and demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria, and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel prize winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009; ‘Cellular biophysics and modeling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP; ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT; and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that were quickly shown to be inadequate to describe eukaryotic cellular electrophysiology, and the same is true for bacterial electrophysiology (see the ground-breaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we give a critical synopsis of the articles cited by referee 2, and we then directly answer the specific points all the referees raise.
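      The distinction drawn above between a linear battery-type model and a non-linear oscillator can be illustrated with the FitzHugh-Nagumo equations, a standard two-variable reduction of the Hodgkin-Huxley model. This is a textbook sketch, not the authors' bacterial model. The parameters (a = 0.7, b = 0.8, eps = 0.08) are conventional illustrative values. `linear_membrane` is a hypothetical stand-in for a battery-plus-leak circuit.

```python
def fitzhugh_nagumo(i_ext, t_end=300.0, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of the FitzHugh-Nagumo model; returns v(t)."""
    v, w = -1.2, -0.625  # close to the resting state for i_ext = 0
    trace = []
    for _ in range(int(t_end / dt)):
        dv = v - v**3 / 3.0 - w + i_ext  # fast variable with cubic non-linearity
        dw = eps * (v + a - b * w)       # slow linear recovery variable
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace

def linear_membrane(e_rest, t_end=300.0, dt=0.01, tau=1.0, v0=0.0):
    """Linear battery-plus-leak relaxation: dV/dt = -(V - E) / tau."""
    v = v0
    trace = []
    for _ in range(int(t_end / dt)):
        v += dt * (-(v - e_rest) / tau)
        trace.append(v)
    return trace

def count_spikes(trace, threshold=1.0):
    """Count upward crossings of the threshold."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev < threshold <= cur)
```

With a constant drive `i_ext = 0.5`, the cubic non-linearity produces repetitive spikes, whereas `i_ext = 0` leaves the system at rest. The linear model only relaxes monotonically towards `e_rest` and never oscillates.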

      Critical synopsis of the articles cited by referee 2:

      1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model, and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell, e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+, and thus for the electrophysiology! Equation (1) in the manuscript assumes that no ions other than H+ are exchanged during the experiments. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around! In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched, i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees:

      Reviewer #1:

      Summary: Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:
      - The authors report original data.
      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.
      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.
      - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electric signal.
      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ channel Kch: enhancing survival under phototoxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:
      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and redox potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate or accept electrons. The electrical response to stress appears to be caused by ROS, since the electrical response is removed when ROS scavengers are added, i.e. pH plays at most a minor role.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising: ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, and it is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
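      For readers unfamiliar with the fire-diffuse-fire idea, a deliberately simplified one-dimensional sketch follows. It is not the agent-based three-dimensional model of Martorelli et al. The lattice, threshold, release rate and duration, and reflecting boundaries are all illustrative assumptions. In this sketch, each cell fires once when the local extracellular K+ concentration exceeds a threshold, then releases K+ for a fixed time. That K+ diffuses to neighbours, so the signal propagates as a saltatory wave.

```python
import numpy as np

def fire_diffuse_fire(n_cells=40, diff_coeff=1.0, dx=1.0, dt=0.1,
                      threshold=0.5, release_rate=2.0, release_time=1.0,
                      t_max=200.0):
    """Return the firing time of each cell in a toy 1D fire-diffuse-fire chain."""
    alpha = diff_coeff * dt / dx**2
    if alpha > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx**2 <= 0.5")
    conc = np.zeros(n_cells)              # extracellular K+ (arbitrary units)
    fire_time = np.full(n_cells, np.inf)  # inf means 'not yet fired'
    fire_time[0] = 0.0                    # stimulate the first cell
    for step in range(int(round(t_max / dt))):
        t = step * dt
        # fired cells release K+ at a constant rate for release_time
        releasing = (t >= fire_time) & (t < fire_time + release_time)
        # explicit diffusion; edge padding gives zero-flux (reflecting) ends
        padded = np.pad(conc, 1, mode="edge")
        laplacian = padded[2:] - 2.0 * conc + padded[:-2]
        conc = conc + alpha * laplacian + dt * release_rate * releasing
        # any unfired cell whose local concentration reaches threshold fires
        newly_fired = np.isinf(fire_time) & (conc >= threshold)
        fire_time[newly_fired] = t + dt
        if np.all(np.isfinite(fire_time)):
            break
    return fire_time
```

With the default values, the wave reaches every cell with non-decreasing firing times. A release too small to reach threshold at the neighbour (e.g. `release_rate=0.1`) leaves every cell except the stimulated one silent.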

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, Nano Letters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2:

      Summary of what the authors were trying to achieve: The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report membrane potential in E. coli has previously been shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results: The strength of this work is that all the data are presented clearly and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The authors appear to be unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication presents a protocol for carefully calibrating any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the results section is correct, for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work on light-induced damage to the electrochemical gradient of protons: https://www.sciencedirect.com/science/article/pii/S0006349519303923; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and we often used EDTA to permeabilise the membrane). It is not fully clear (to me) whether TMRM was prepared in rich media here (the Methods say so explicitly for ThT but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage done to the membrane by light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction: the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, depending on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons while the membrane potential is equal and opposite in sign to the ΔpH term. So using CCCP does not automatically mean that the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration the authors use (100 uM) that should not affect the results. However, the authors then state that, because they observed in Figure S1E a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it shows that after 50 min of treatment with 100 uM CCCP, the ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; it needs to be affected in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.
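      The reviewer's point can be made explicit with the standard decomposition of the proton motive force. It is written here with the convention Δψ = ψ_in - ψ_out and ΔpH = pH_in - pH_out; this convention is an assumption, since the original comment does not state one.

```latex
\Delta p = \Delta\psi - \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
\qquad\Longrightarrow\qquad
\Delta p = 0 \;\text{ whenever }\; \Delta\psi = \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
```

At 25 °C, 2.303RT/F is approximately 59 mV. A protonophore that abolishes Δp therefore need not abolish Δψ if a compensating ΔpH persists.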

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and this is the dominant contributor to the membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower, because intuitively cells in these clusters could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or that the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that, according to the Methods, these cells were left in a microfluidic chamber for 2 hours to attach in growth media, there is sufficient time for that to happen. In Figure S12 C and D of the same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and peak timing on the permeability of the membrane. Therefore I do not think distance is the explanation for what the authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have a different membrane permeability from those at the periphery, and, probably even more importantly, no light profile is shown anywhere in the SI/Methods. I could be wrong, but the SI3 A profile is consistent with a Gaussian beam profile visible in the field of view. In the Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across the field of view. This is critical to assess what they are observing in this results section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C and F that at the point of the peak the electrochemical gradient of protons has already collapsed, and that at the point of the peak the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time when ThT does not stain the cells, and then staining starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could simply be staining of cytoplasmic positively charged content, and its timing depends on the dynamics of cytoplasmic content leakage (we observed this to happen over 2 h in individual cells). ThT is also a non-specific amyloid dye, and formation of protein clusters has been observed in starving E. coli cells (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see whether the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for reasons highlighted before. Differential membrane permeability is a speculative phenomenon that is not well supported by any evidence and has no clear molecular mechanism.
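      The check described above can be sketched as a comparison of mean background intensity in the centre and at the edges of the field. This is a hypothetical helper, not the authors' procedure. It assumes a single 2D image and a boolean mask of cell-free pixels (for example from thresholding). The size of the central region is an arbitrary choice.

```python
import numpy as np

def illumination_flatness(image, background_mask, centre_fraction=0.5):
    """Edge-to-centre ratio of mean background intensity.

    image: 2D array of pixel intensities.
    background_mask: boolean array, True where no cells are present.
    centre_fraction: side of the central square as a fraction of the frame.
    """
    img = np.asarray(image, dtype=float)
    mask = np.asarray(background_mask, dtype=bool)
    ny, nx = img.shape
    hy, hx = int(ny * centre_fraction / 2), int(nx * centre_fraction / 2)
    centre = np.zeros_like(mask)
    centre[ny // 2 - hy: ny // 2 + hy, nx // 2 - hx: nx // 2 + hx] = True
    centre_mean = img[mask & centre].mean()   # background pixels in the centre
    edge_mean = img[mask & ~centre].mean()    # background pixels near the edges
    return edge_mean / centre_mean
```

A ratio near 1 is consistent with flat illumination. A Gaussian beam profile gives a markedly lower edge-to-centre ratio.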

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work suggesting that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.
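      For reference, the equilibrium distribution expected of an ideal Nernstian cationic dye follows from the Nernst equation, [dye]_in/[dye]_out = exp(-zFV/RT) with V = ψ_in - ψ_out. The sketch below assumes a single permeant species at equilibrium. It ignores binding, active efflux (e.g. via TolC) and photochemistry, which are precisely the complications debated here. The function name is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_ratio(membrane_potential_mv, charge=1, temperature=298.15):
    """Equilibrium [dye]_in / [dye]_out for an ideal Nernstian ion.

    Convention: membrane potential = psi_in - psi_out (negative when the
    cytoplasm is negative). From V = (RT / zF) ln(c_out / c_in), it follows
    that c_in / c_out = exp(-zFV / RT).
    """
    v = membrane_potential_mv * 1e-3  # mV -> V
    return math.exp(-charge * F * v / (R * temperature))
```

At 25 °C the ratio changes roughly tenfold per 59 mV, so a more negative (hyperpolarized) potential increases the equilibrium accumulation of a cationic dye.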

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this section the authors vary the light intensity and stain the cells with PI (this dye enters cells when the membrane becomes very permeable) and the extracellular environment with a K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially SI12), but it does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel-mediated membrane potential dynamics constitute a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light damages the membrane and thus simultaneously collapses the electrochemical gradient of protons; this has been modelled before. They then observe ThT staining that is independent of membrane potential.

      This is an erroneous niche opinion. Protons contribute little to the membrane potential because there are so few of them; the membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles, as the authors propose. But also, the HH model was developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, multiplied by the difference between the membrane potential and the Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons, and n, m and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44.)

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) and similar terms in the authors' equation in Figure 5B are valid), and that potassium and other channels in a given bacterium have gating properties similar to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think that the pump-leak model of bacterial electrophysiology needs to start with more general fluxes (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer, or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)
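      The (V-VQ)-type terms discussed above are ohmic, i.e. linear in voltage. One widely used alternative that does not assume a linear current-voltage relation is the Goldman-Hodgkin-Katz (GHK) current equation. A minimal sketch comparing the two for a single monovalent ion follows; the permeability, conductance and concentrations are hypothetical illustrative values, and GHK is named here only as one example of a more general flux, not necessarily the form used in the references cited above.

```python
import math

FARADAY = 96485.332      # C/mol
GAS_CONSTANT = 8.314462  # J/(mol K)
TEMP_K = 298.15


def reversal_potential(c_in, c_out, z=1):
    """Nernst (reversal) potential of the ion, in volts."""
    return GAS_CONSTANT * TEMP_K / (z * FARADAY) * math.log(c_out / c_in)


def ohmic_current(v, g, c_in, c_out, z=1):
    """Linear current density g * (V - E); g in S/m^2, result in A/m^2 (positive = outward)."""
    return g * (v - reversal_potential(c_in, c_out, z))


def ghk_current(v, perm, c_in, c_out, z=1):
    """GHK current density in A/m^2 (positive = outward for cations).

    perm in m/s; concentrations in mol/m^3 (numerically equal to mM).
    """
    u = z * FARADAY * v / (GAS_CONSTANT * TEMP_K)  # dimensionless voltage
    if abs(u) < 1e-9:
        # Limit as V -> 0: z F P (c_in - c_out).
        return z * FARADAY * perm * (c_in - c_out)
    return z * FARADAY * perm * u * (c_in - c_out * math.exp(-u)) / (1.0 - math.exp(-u))


# Hypothetical K+-like ion: 200 mM inside, 5 mM outside.
c_in, c_out = 200.0, 5.0
e_rev = reversal_potential(c_in, c_out)
for v in (-0.15, e_rev, -0.05, 0.0):
    print(f"V = {v * 1e3:7.1f} mV  ohmic = {ohmic_current(v, 1.0, c_in, c_out):+.3e}  "
          f"GHK = {ghk_current(v, 1e-8, c_in, c_out):+.3e}")
```

Both currents vanish at the Nernst potential and have the same sign on either side of it, but the GHK current is a nonlinear (rectifying) function of V whenever the inside and outside concentrations differ.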

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results showing that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. To understand this, a model is needed that also takes into account the link between the maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note: the authors state that the Mscs respond to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating, but that the channels sometimes gate spontaneously in the presence of certain ions. Other publications cited do not really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this results section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature, as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers on other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper, https://journals.asm.org/doi/10.1128/mbio.02220-23, found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but these results should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle: ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401; ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, J.A.Blee et al, Physical Biology, 2020, 17, 2, 036001; ‘Ion channels enable electrical communication in bacterial communities’, A.Prindle et al, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3:

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      1. An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      2. The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.
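      The reviewer's suggestion of varying exposure can be made quantitative. A minimal sketch, assuming first-order bleaching in which each exposure destroys a constant fraction p of the remaining fluorophore, and a control sample whose dye signal has no other source of change: a least-squares fit of ln F against exposure number gives p, which can then be compared between exposure settings. The trace below is synthetic, not data from the manuscript.

```python
import math


def bleach_fraction_per_exposure(intensities):
    """Least-squares fit of ln F_k = ln F_0 + k ln(1 - p); returns p.

    `intensities` are background-subtracted values from successive exposures
    of a sample whose dye signal should otherwise be constant.
    """
    ks = range(len(intensities))
    ys = [math.log(f) for f in intensities]
    k_mean = sum(ks) / len(intensities)
    y_mean = sum(ys) / len(ys)
    slope = (sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
             / sum((k - k_mean) ** 2 for k in ks))
    return 1.0 - math.exp(slope)


# Synthetic control trace with 0.5 % loss per exposure.
trace = [1000.0 * (1 - 0.005) ** k for k in range(60)]
print(f"estimated p = {bleach_fraction_per_exposure(trace):.4f}")
```

If 1 - (1 - p)^N, the fraction lost over the N exposures of an experiment, is small compared with the intensity changes of interest, bleaching alone cannot account for them.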

      3. It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      4. The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      5. Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that showed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      6. Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae that had been starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics but reflects PI staining damaged/dead cells without bias. We think that propidium iodide is appropriate for this experiment. Propidium iodide is a dye that is used extensively in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.
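      One way to put a number on "similar shape" while ignoring absolute intensity, as the reply proposes, is to min-max normalise each trace and then report the Pearson correlation and the root-mean-square (RMS) difference. A sketch under those assumptions, for two traces sampled at the same time points; the traces in the example are synthetic, and these two metrics are one possible choice rather than a standard from the manuscript.

```python
import math


def normalise(trace):
    """Rescale a trace linearly to the range [0, 1]."""
    lo, hi = min(trace), max(trace)
    return [(x - lo) / (hi - lo) for x in trace]


def shape_similarity(a, b):
    """Pearson r and RMS difference of two min-max normalised traces of equal length."""
    if len(a) != len(b):
        raise ValueError("traces must be sampled at the same time points")
    a, b = normalise(a), normalise(b)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)
    return cov / (sa * sb), rmsd


# Synthetic example: the same two-peak shape at a different brightness and offset.
t = [0.5 * i for i in range(200)]
base = [math.exp(-((x - 10) / 3) ** 2) + 0.6 * math.exp(-((x - 70) / 15) ** 2) for x in t]
brighter = [3.0 * y + 50.0 for y in base]
print(shape_similarity(base, brighter))  # r = 1, RMS difference = 0 up to rounding
```

Reporting these values, together with the standard deviation across cells at each time point, would give the reader a common standard for calling two traces similar or dissimilar.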

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The durations of these three phases were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
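      The reviewer's suggestion can be sketched as follows, assuming a list of per-cell time-to-first-peak values in minutes. The median, and a mean recomputed after excluding points more than three scaled median absolute deviations (MAD) from the median, are both far less sensitive to a few late cells than the plain mean. The threshold of 3 and the example numbers are hypothetical choices, not values from the manuscript; a rank-based test such as Mann-Whitney U would then be one option for comparing the sparse and dense conditions.

```python
import statistics


def first_peak_summary(times, k=3.0):
    """Mean, median and outlier-filtered mean of time-to-first-peak values.

    Points further than k scaled MADs from the median are treated as outliers;
    1.4826 scales the MAD to the standard deviation for normally distributed
    data. If the MAD is zero, no points are excluded.
    """
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times)
    cutoff = k * 1.4826 * mad
    kept = [t for t in times if abs(t - med) <= cutoff] if mad > 0 else list(times)
    return {
        "mean": statistics.mean(times),
        "median": med,
        "filtered_mean": statistics.mean(kept),
        "n_outliers": len(times) - len(kept),
    }


# Hypothetical values (minutes): most cells peak near 5 min, two peak late.
example = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9, 18.0, 21.0]
print(first_peak_summary(example))
```

In this made-up example the two late cells pull the plain mean above 8 min, while the median and the filtered mean both stay near 5 min.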

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We appreciate the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics was not influenced by cell density (see Fig 2C). 

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Supplementary Figure 4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
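      For the radial analysis requested by the reviewer, one simple approach is to regress each cell's distance from the biofilm centre against its time of first peak. A clearly non-zero slope indicates a front moving outward (positive) or inward (negative), and its magnitude is the front speed. A sketch, assuming cell positions and peak times have already been extracted (for example from segmented confocal data); the input format and the synthetic example are hypothetical, and this is not the analysis used in the cited fire-diffuse-fire article.

```python
import math


def radial_front_speed(cells, centre):
    """Least-squares slope of radial distance versus time of first peak.

    `cells` is a list of (x, y, t_peak) tuples (e.g. micrometres and minutes);
    the result is in distance per time unit, positive for an outward-moving front.
    """
    r = [math.hypot(x - centre[0], y - centre[1]) for x, y, _ in cells]
    t = [tp for _, _, tp in cells]
    r_mean, t_mean = sum(r) / len(r), sum(t) / len(t)
    var_t = sum((ti - t_mean) ** 2 for ti in t)
    if var_t == 0:
        raise ValueError("all cells peak simultaneously: no propagating front")
    cov = sum((ti - t_mean) * (ri - r_mean) for ti, ri in zip(t, r))
    return cov / var_t


# Synthetic cells on a ray from the centre, peaking later further out (2 um/min).
cells = [(float(d), 0.0, 1.0 + d / 2.0) for d in range(0, 60, 5)]
print(radial_front_speed(cells, centre=(0.0, 0.0)))
```

Binning the cells into radial shells and plotting the median peak time per shell gives the corresponding figure-style view of the same data.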

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      1. In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      2. In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      3. The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B: the first peak, the quiescence and the slow rise to the second peak are evident.

    2. Author Response

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage some more consensus in the field and help understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion channels annotated as voltage sensitive due to comparative genetics studies, e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes; similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain-of-function mutations indicate that E. coli Kch forms a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels (S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in Microbiology, 2021, 29, 10, 942-950). More emphatic experimental data are seen in the spiking potentials that many groups have observed in E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores (‘Electrical spiking in bacterial biofilms’, E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102; ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj et al, Science, 2011, 333, 6040, 345; and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120). The only mechanism currently known to cause spiking potentials in cells is positive feedback from voltage-gated ion channels (a mechanism is needed to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli, and a role is emerging for calcium in addition to potassium, e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments from E. coli indicate there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores i.e. they can’t be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT e.g. the potassium sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes and it implies a simple physical mechanism i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al and represented, unpersuasively, in multiple publications by this one group (see below for critical synopses), and demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria, and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel Prize-winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009, ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP, ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that were quickly shown to be inadequate to describe eukaryotic cellular electrophysiology, and the same is true for bacterial electrophysiology (see the groundbreaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we present a critical synopsis of the articles cited by referee 2 and we then directly answer the specific points all the referees raise.
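      The point from Strogatz can be illustrated with a toy sketch (hypothetical code, not the SNB model of Pilizota et al and not our bacterial HH model): a single-variable voltage equation, even a nonlinear one, can only relax monotonically to a fixed point, whereas adding a second, slow recovery variable, as in the FitzHugh-Nagumo reduction of the Hodgkin-Huxley equations, permits a sustained limit-cycle oscillation. The cubic nonlinearity, the parameter values (a = 0.7, b = 0.8, eps = 0.08, I = 0.5) and the forward-Euler step are assumptions chosen only for illustration.

```python
"""Toy illustration (not the SNB model, not our bacterial HH model):
a one-variable voltage equation relaxes monotonically, while adding a
slow recovery variable (FitzHugh-Nagumo) permits sustained oscillation."""


def simulate_1d(v0=0.0, current=0.5, dt=0.01, t_end=400.0):
    """Euler-integrate dv/dt = v - v**3/3 + I, with no recovery variable."""
    v, trace = v0, [v0]
    for _ in range(int(t_end / dt)):
        v += dt * (v - v**3 / 3.0 + current)
        trace.append(v)
    return trace


def simulate_fhn(v0=0.0, w0=0.0, current=0.5, a=0.7, b=0.8, eps=0.08,
                 dt=0.01, t_end=400.0):
    """Euler-integrate the FitzHugh-Nagumo equations
    dv/dt = v - v**3/3 - w + I,  dw/dt = eps * (v + a - b*w)."""
    v, w, trace = v0, w0, [v0]
    for _ in range(int(t_end / dt)):
        dv = v - v**3 / 3.0 - w + current
        dw = eps * (v + a - b * w)
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace


def upward_crossings(trace, level):
    """Count how often the trace crosses `level` from below."""
    return sum(1 for x, y in zip(trace, trace[1:]) if x < level <= y)


if __name__ == "__main__":
    one_d = simulate_1d()
    fhn = simulate_fhn()
    print("1D trace monotone:", all(y >= x for x, y in zip(one_d, one_d[1:])))
    print("1D crossings of v=1:", upward_crossings(one_d, 1.0))
    print("FHN crossings of v=1:", upward_crossings(fhn, 1.0))
```

      Counting upward crossings of v = 1 separates the two cases: the one-variable trace crosses once on its way to its fixed point, while the FitzHugh-Nagumo trace crosses once per oscillation cycle.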

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed for pumping in the SNB model with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ to power its flagellar motor. Its relevance to wild-type E. coli is unclear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!
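      The concentration ratio behind this argument, using the values quoted above (0.0000001 M = 10⁻⁷ M, i.e. pH 7 for free H+), is:

```latex
\frac{[\mathrm{K}^{+}]_{\mathrm{in}}}{[\mathrm{H}^{+}]_{\mathrm{in}}}
  = \frac{0.2\ \mathrm{M}}{10^{-7}\ \mathrm{M}}
  = 2 \times 10^{6}
```

      that is, of order 10⁶.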

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary: Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      • The authors report original data.

      • For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      • The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      • The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      • Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ channel Kch: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      • Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since when ROS scavengers are added the electrical response is removed, i.e. pH plays at most a minor role.

      • Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 120, 3, e2208348120.

      • Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work and it is currently under review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      • Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, Nano Letters, 2024, in press).

      • The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      • The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      • Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report membrane potential in E. coli has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct, for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work on light-induced damage to the electrochemical gradient of protons: https://www.sciencedirect.com/science/article/pii/S0006349519303923; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect, not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B, as well as in video S1, is identical to that in Figure 3 of the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793) and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating; see Figure S12 of that previous work. There, it is also demonstrated, by monitoring the speed of the bacterial flagellar motor, that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But in Figure S1 it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10⁻⁷ M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage done to the membrane by light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, depending on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons while the membrane potential is equal and opposite in sign to the ΔpH term. So using CCCP does not automatically mean that the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration at which the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state that because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, the ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way for it to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.
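      The referee's point can be written with the standard chemiosmotic relation between the proton motive force Δp, the membrane potential Δψ and the transmembrane pH difference. This sketch assumes the convention ΔpH = pH_in − pH_out and 2.303RT/F ≈ 59 mV at 25 °C; sign conventions differ between texts:

```latex
\Delta p = \Delta\psi - \frac{2.303\,RT}{F}\,\Delta\mathrm{pH},
\qquad
\Delta p = 0 \;\Longrightarrow\; \Delta\psi = \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
```

      Collapsing Δp with a protonophore therefore leaves Δψ nonzero whenever ΔpH ≠ 0.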

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and this is the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower, because intuitively cells in these clusters could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that, according to Methods, these cells were left in a microfluidic chamber for 2 hours to attach in growth media, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what the authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seems tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and, probably even more importantly, there is no light profile shown anywhere in the SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that at the point of the peak the electrochemical gradient of protons has already collapsed, and that at that point the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time where ThT does not stain the cells, and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could simply be staining of positively charged cytoplasmic content, and its timing depends on the dynamics of cytoplasmic content leakage (we observed this happening over 2 h in individual cells). ThT is also a non-specific amyloid dye, and formation of protein clusters has been observed in starving E. coli cells (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for the reasons highlighted before. Differential membrane permeability is a speculative phenomenon, not well supported by any evidence and with no clear molecular mechanism.
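      One way to perform the background comparison described above is sketched below. It is a minimal illustration, not our analysis pipeline: it assumes that background pixels can be separated from cells by a simple intensity threshold, and the function names, tile count, threshold and synthetic Gaussian beam profile are all hypothetical. It reports the ratio of the brightest to the dimmest tile-averaged background, which is 1.0 for a perfectly flat field.

```python
"""Minimal sketch of a flat-field check: compare mean background intensity
in tiles across the field of view.  Threshold, tile count and synthetic
images are illustrative assumptions, not our actual pipeline."""
import math


def tile_background_means(image, threshold, tiles=4):
    """Mean of pixels below `threshold` (treated as background) in each of
    tiles x tiles blocks of a 2D list `image`; cells above it are ignored."""
    rows, cols = len(image), len(image[0])
    means = []
    for ti in range(tiles):
        for tj in range(tiles):
            r0, r1 = ti * rows // tiles, (ti + 1) * rows // tiles
            c0, c1 = tj * cols // tiles, (tj + 1) * cols // tiles
            bg = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)
                  if image[r][c] < threshold]
            if bg:
                means.append(sum(bg) / len(bg))
    return means


def flatness_ratio(image, threshold, tiles=4):
    """Max/min ratio of tile background means; 1.0 means perfectly flat."""
    means = tile_background_means(image, threshold, tiles)
    return max(means) / min(means)


def synthetic_background(size, sigma=None, level=100.0):
    """Uniform background, optionally multiplied by a Gaussian beam profile."""
    centre = (size - 1) / 2.0
    img = []
    for r in range(size):
        row = []
        for c in range(size):
            d2 = (r - centre) ** 2 + (c - centre) ** 2
            gain = 1.0 if sigma is None else math.exp(-d2 / (2 * sigma**2))
            row.append(level * gain)
        img.append(row)
    return img
```

      With real data, a ratio close to 1 across the field of view indicates flat illumination, while a centre-bright Gaussian beam gives a ratio above 1.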

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on the ThT profile in E. coli cells, and they see a difference. It is worth noting that the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from one where the TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects membrane permeability. I do note that in video S4 I seem to see more of what appear to be plasmolysed cells. With this mutant, the authors do not see the ThT intensity that, in WT, appears long after the initial peak has disappeared. It is not clear how long they waited for this, as from Figure S3C it could simply be that these dynamics are a lot slower, e.g. because Kch deletion changes membrane permeability.

      The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, Nano Letters, 2024, in press).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with a K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially Figure S12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel-mediated membrane potential dynamics are a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin-Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as the authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and the Nernst potential for the given ion. The conductivity in the model is given as gK·n⁴ for potassium, gNa·m³·h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44.)
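      The conductance-times-driving-force structure described above can be made concrete with a short sketch of the classic squid-axon equations. The parameter values are the commonly used textbook ones, shifted so that rest lies near −65 mV (an assumption of this sketch, not values from either manuscript), integrated with forward Euler; they are not a model of E. coli, and the function names are hypothetical.

```python
"""Sketch of the classic squid-axon Hodgkin-Huxley equations, written so
that each ionic current is a conductance times (V - E).  Parameters are
textbook values with rest near -65 mV; they are NOT fitted to bacteria."""
import math

C_M = 1.0                                # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3        # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387    # reversal potentials, mV


def vtrap(x, y):
    """x / (exp(x/y) - 1), using its limit y as x -> 0 to avoid 0/0."""
    return y if abs(x / y) < 1e-6 else x / (math.exp(x / y) - 1.0)


def rates(v):
    """Opening (alpha) and closing (beta) rates, 1/ms, of gates n, m, h."""
    a_n = 0.01 * vtrap(-(v + 55.0), 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * vtrap(-(v + 40.0), 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return (a_n, b_n), (a_m, b_m), (a_h, b_h)


def simulate(i_ext, t_end=100.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration from steady-state gates; returns V in mV."""
    (a_n, b_n), (a_m, b_m), (a_h, b_h) = rates(v0)
    n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)
    v, trace = v0, [v0]
    for _ in range(int(t_end / dt)):
        i_na = G_NA * m**3 * h * (v - E_NA)   # gNa m^3 h (V - E_Na)
        i_k = G_K * n**4 * (v - E_K)          # gK n^4 (V - E_K)
        i_l = G_L * (v - E_L)                 # gL (V - E_L)
        (a_n, b_n), (a_m, b_m), (a_h, b_h) = rates(v)
        v += dt * (i_ext - i_na - i_k - i_l) / C_M
        n += dt * (a_n * (1.0 - n) - b_n * n)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        trace.append(v)
    return trace


def count_spikes(trace, threshold=0.0):
    """Count upward crossings of `threshold` (mV)."""
    return sum(1 for x, y in zip(trace, trace[1:]) if x < threshold <= y)
```

      With these parameters, a 10 µA/cm² current step (`simulate(10.0)`) produces repetitive firing, whereas no injected current (`simulate(0.0)`) leaves the membrane at rest.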

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) etc. terms in the authors' equation in Figure 5B hold), and that potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump-leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer, or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).
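      As one example of a more general flux, the Goldman-Hodgkin-Katz (GHK) current equation discussed in Keener and Sneyd can be compared with its linearisation g(V − E) about the Nernst potential E; whether this is the specific flux the referee has in mind is an assumption. The concentrations (0.2 M inside, taken from the text above; 5 mM outside, assumed), permeability, temperature and function names are illustrative only.

```python
"""Compare a GHK current with its linearisation g*(V - E) about the Nernst
potential E.  Concentrations, permeability and temperature are illustrative
assumptions, not measured E. coli values."""
import math

F = 96485.33212      # Faraday constant, C/mol
R = 8.314462618      # gas constant, J/(mol K)
T = 298.15           # temperature, K


def nernst(c_in, c_out, z=1):
    """Nernst potential (V) for an ion of valence z."""
    return (R * T / (z * F)) * math.log(c_out / c_in)


def ghk_current(v, c_in, c_out, perm, z=1):
    """GHK current density (A/m^2, outward positive) at potential v (V).
    Concentrations in mol/m^3, permeability in m/s."""
    u = z * F * v / (R * T)
    if abs(u) < 1e-9:                      # analytic limit as v -> 0
        return perm * z * F * (c_in - c_out)
    return perm * z * F * u * (c_in - c_out * math.exp(-u)) / (1.0 - math.exp(-u))


def linearised_current(v, c_in, c_out, perm, z=1, h=1e-6):
    """Ohmic approximation g*(V - E), with g the GHK slope at E."""
    e = nernst(c_in, c_out, z)
    g = (ghk_current(e + h, c_in, c_out, perm, z)
         - ghk_current(e - h, c_in, c_out, perm, z)) / (2 * h)
    return g * (v - e)


if __name__ == "__main__":
    c_in, c_out, perm = 200.0, 5.0, 1e-8   # 0.2 M inside, 5 mM outside (assumed)
    e = nernst(c_in, c_out)
    for v in (e, e + 0.001, 0.0):
        print(f"V={v*1e3:7.1f} mV  GHK={ghk_current(v, c_in, c_out, perm):.3e}"
              f"  linear={linearised_current(v, c_in, c_out, perm):.3e}")
```

      Both expressions vanish at E, but their relative mismatch grows as V moves away from E, which is the near-equilibrium condition referred to above.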

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood, a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating, and that sometimes the channels gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers on other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper (https://journals.asm.org/doi/10.1128/mbio.02220-23) found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and by A. Prindle: J.A. Blee et al., ‘Spatial propagation of electrical signal in circular biofilms’, Physical Review E, 2019, 100, 052401; J.A. Blee et al., ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001; A. Prindle et al., ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (see the measurements in the eLife article and our new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that exhibited membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka (2017, J Microbial Methods), using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae that are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential but reflects PI reporting damaged/dead cells without bias. We think that propidium iodide is appropriate for this experiment. Propidium iodide is a dye that is used extensively in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/), and no membrane potential-related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in absolute intensities is problematic in such fluorescence experiments. By "similar" we mean that the shapes of the traces are similar and that they can be modelled using an HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that, with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time point, so all parts of the biofilm received the same blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Supplementary Figure 4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal: V. Martorelli et al., ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have reworded this statement to convey our results properly.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images for Fig S4A, the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B: the first peak, the quiescent period and the slow rise to the second peak are all evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Fig. 3C needs the "still" for the movie of control C. owczarzaki (in Movie S1).

      We have now added a WT control in this figure panel.

      (2) The elongated cell shape is seen infrequently in control cells, and I wonder whether these events are transient inactivation of coHpo or coWts in these cells. Perhaps the authors could comment on this in the discussion.

      This is an interesting possibility and we have now included it in our discussion (Lines 401-403).

      (3) Does C. owczarzaki normally aggregate or this is a lab-specific phenotype? For example, the slime mold Dictyostelium discoideum forms aggregates during its life cycle. Could some additional information about C. owczarzaki be added to the introduction?

      Unfortunately, little is known about Capsaspora “in the wild”, as it was isolated as an endosymbiont from a laboratory strain of snails. However, some related filasterians isolated from natural environments also show aggregative ability, indicating that aggregation is in fact a physiological process in this group of organisms. We have updated our introduction to include this fact (Lines 78-80).

      Reviewer #2 (Recommendations For The Authors):

      The studies on Hippo signalling in Capsaspora are currently limited to genetic experiments and analysis of Yki/YAP localisation. Biochemical evidence that Co Wts phosphorylates Co Yki/YAP on a conserved serine residue(s) would give important further evidence that this essential signalling step in the animal Hippo pathway is conserved in Capsaspora. However, such experiments require antibodies that detect specific phosphorylation events, which might not be available at present. Is mass spectrometry of the phospho-proteome a potential approach that could be employed to investigate this? The benefit of this approach is it would give information on other Hippo pathway proteins and could be used to probe signalling events under different culture conditions (e.g., aggregate, non-aggregate).

      In response to this recommendation, we attempted to detect phospho-coWts and phospho-coHpo using commercial antibodies against their mammalian homologs, in the hope of cross-species reactivity. However, we could not detect a signal by Western blot. Thus, better reagents or refinement of techniques beyond the scope of this article may be required to examine the phosphorylation of these Capsaspora proteins. There was a published report of Capsaspora phosphoproteome analysis (Sebé-Pedrós et al., 2016 Dev Cell), although phosphorylation of the conserved sites on coYki, coWts, and coHpo was not reported in this analysis, suggesting that more targeted approaches may be needed to examine phosphorylation of these core Hippo pathway components.

      The following statement, that Wts LOF is stronger than Hpo LOF in Capsaspora, is consistent with overgrowth phenotypes in flies and mammals:

      "Interestingly, we found that coWts-/- cells were significantly more likely to show nuclear mScarlet-coYki localization than coHpo-/- cells (Figure 1D), which is consistent with Hpo/MST independent activity of Wts/LATS previously reported in Drosophila and mammals (Zheng et al., 2015)."

      However, the following statement describes a stronger phenotype in Hpo LOF Capsaspora than Wts LOF:

      "As contractile cells in the coHpo mutant background tended to show a more extreme elongated morphology than the coWts mutant, we focused on the coHpo mutant for further analysis."

      Does this mean that Hpo can regulate actomyosin contractility in both Wts/Yki-dependent and Wts/Yki-independent manners? A genetic experiment, similar to those that have been performed in Drosophila and mammals, could help to address this: e.g., what is the phenotype of Hpo Yki and Wts Yki double-mutant Capsaspora? Do they phenocopy Yki LOF Capsaspora, and are the actomyosin phenotypes associated with Hpo and Wts mutant Capsaspora completely or partially suppressed? The authors indicate, however, that generation of double-mutant Capsaspora is not technically possible at present.

      Indeed given available techniques the generation of such double mutants is not currently possible. With this phenotype (aberrant cytoskeletal dynamics), it is hard to say what a “stronger” phenotype is, and which mutant has the “stronger” phenotype. We have edited this statement to try and reflect this point (Line 208-209).

      Another outstanding question is whether the Hpo/Wts/Yki-related actomyosin phenotypes are linked to regulation of transcription by Yki, or are regulated non-transcriptionally. Indeed, a non-transcriptional role for Drosophila Yki in promoting actomyosin contractility has been reported (Fehon lab, Dev Cell, 2018). Generation of Scalloped/TEAD mutant Capsaspora would allow this question to be investigated. Alternatively, this could be explored using variant Co Yki transgenes, e.g., a Co Yki transgene that does not form a physical complex with Co Sd/TEAD and a Co Yki transgene that is targeted to the cell cortex.

      To address this point, we tested whether a conserved amino acid residue in coYki (F123) that is required for transcriptional activity of human YAP (in this case, F95) is required for the phenotypic effects of the coYki 4SA mutant. We found that, in contrast to expression of coYki 4SA, expression of a coYki 4SA F123A mutant showed no effect on cell or aggregate morphology. These new results, which support a requirement for transcriptional activity for coYki function, have now been added to Figure 7.

      Reviewer #3 (Recommendations For The Authors):

      Repetition from previous publication:

      (1) E.g., the last sentences of the abstract in both works. From Phillips et al. eLife 2022;0:e77598: "Taken together, these findings implicate an ancestral role for the Hippo pathway in cytoskeletal dynamics and multicellular morphogenesis predating the origin of animal multicellularity, which was co-opted during evolution to regulate cell proliferation".

      From this manuscript: "Together, these results implicate cytoskeletal regulation but not proliferation as an ancestral function of the Hippo pathway and uncover a novel role for Hippo signaling in regulating cell density in a proliferation-independent manner "

      Our two papers deal with different components of the Hippo pathway: Yorkie/YAP/coYki in Phillips et al. eLife 2022;0:e77598 and upstream kinases in the current paper. The fact that perturbing different components of the pathway leads to similar conclusions actually strengthens the overall conclusion. Nevertheless, to be more clear about the novelty of the current manuscript, we have now changed the current text from “Hippo pathway” to “Hippo kinase cascade”, to emphasize that the current analysis deals with kinases upstream of Yorkie/YAP/coYki (Lines 35, 368-371).

      (2) The authors claim that the change in localization of coYki in Hpo -/- and Wts -/- cells, where it is now able to enter the nucleus, demonstrates that the nuclear regulation of Yki by the Hippo pathway is ancestral to animals. Nevertheless, the authors had already made this claim in their 2022 eLife publication, when they made a mutant version of Yki with the four conserved phosphorylation sites (Sebé-Pedrós et al., 2012) mutated (Figure 5 A to F in Phillips et al. eLife 2022;0:e77598). In their words: "This regulation of coYki nuclear localization, along with the previous finding that coYki can induce the expression of Hippo pathway genes when expressed in Drosophila (Sebé-Pedrós et al., 2012), suggests that the function of coYki has a transcriptional regulator and Hippo pathway effector is conserved between Capsaspora and animals. ".

      I understand that the localization of Yki in coHpo-/- and coWts-/- cells is needed as part of the final proof that Hpo and Wts are the kinases that control Yki phosphorylation in C. owczarzaki, but it does not constitute a completely new message and should be presented as such. Figure 1C of the current manuscript leads to the same conclusion as Figure 5 A to F in Phillips et al. eLife 2022;0:e77598.

      We think that demonstrating that Hippo and Warts orthologs specifically are responsible for regulation of coYki localization is a very important finding: Many unicellular organisms encode Hippo, Warts, and/or Yorkie’s transcriptional factor partner Sd, but not Yorkie. Our understanding is that in these earlier-branching unicellular organisms, the Hippo/Warts kinase module and Sd-like proteins functioned in distinct signaling modules. Thus Yorkie has the interesting property of “fusing” these two distinct signaling modules when it emerged. In this framework, it is interesting to show that this “fusion” occurred in Capsaspora, the most distant known relative of animals with a Yorkie ortholog, indicating that this “fusion” event is very ancient. Although fleshing out of this idea is beyond the scope of this manuscript and we plan to write about it elsewhere, we have modified our discussion to point out the importance that Hippo and Warts specifically are upstream regulators of coYki.

      In Drosophila, the genes transcriptionally regulated by Yki include positive regulators of the Hippo pathway, which act to downregulate Yki.

      (1) The authors don't explain if these upstream regulators of the Hippo pathway are conserved in C. owczarzaki.

      We have now indicated the conservation of some upstream Hippo pathway components (Lines 69-71).

      (2) It would also be important to know how active coYki is in the coHpo-/- and coWts-/- mutant lines of C. owczarzaki relative to WT and to coYki 4SA, and how this affects the transcription and protein production of coYki downstream genes. I think some transcriptional and proteomic data would be informative, at least for genes related to the cytoskeleton.

      We have now performed RNA-seq on the coHpo and coWts mutants to address the concerns above (See Figure 8 and the final section of Results).

      Related to the above: among the downstream targets of coYki, the authors mentioned in their previous work (Phillips et al. eLife 2022;0:e77598) that β-integrins were upregulated in coYki -/-, suggesting that β-integrins could be behind the stronger cell-substrate attachment observed in the coYki-/- mutant. It would be important to investigate whether the integrin adhesome is now downregulated, and how previous and new results relate to the stronger cell-substrate attachment in the coHpo-/- and coWts-/- lines. It would be important that previous results on coYki-/-, a mutant line of the same pathway, are discussed in these two new mutant contexts.

      Two Capsaspora integrin beta genes were previously found to be upregulated in the coYki mutant (CAOG_05058 and CAOG_01283, from Phillips et al., 2022 eLife). In our coWts and coHpo mutant RNA-seq data, we see that CAOG_05058 is upregulated in both coHpo and coWts mutants, whereas CAOG_01283 does not show significantly different expression in either the coHpo or coWts mutant. Because the CAOG_05058 expression data seem to go in the “opposite” direction from what might be expected (i.e. not “down regulated” as the reviewer predicts), and because we see no change in expression of CAOG_01283, these results are difficult to interpret. The role of integrins in Capsaspora Hippo pathway mutant phenotypes therefore remains an open question.

      Some cells from the coHpo-/- and coWts-/- mutant lines show higher attachment to the substrate, which results in an elongated shape while the cell detaches from the substrate. The authors describe this phenotype as a contractile behavior of these cells. This behavior could be caused by changes in cytoskeletal regulation, an increased number of microvilli, or a change in the distribution of microvilli.

      (1) In my opinion, this phenotype cannot be considered a behavior per se (the cells become round once they are free from the substrate, so the elongation is temporal and the contractile behavior is a consequence of this attachment to the substrate), so I would not say that the Hippo pathway controls a contractile behavior, as the authors state in one of the main conclusions of the manuscript.

      Many cell behaviors are known to depend on external conditions, such as substrates, growth factors, nutrients, chemokines, etc., and are therefore “temporal” by the reviewer’s criteria. We therefore feel that the phenotype we describe here can be considered a cell behavior.

      (2) On the other hand, I think that further microscopy or immunocytochemistry could be performed to discern among the different causes: more microvilli? A change in microvilli distribution? A change in the actomyosin cytoskeleton? Moreover, these options are not mutually exclusive, and the explanation is very likely multifactorial.

      (3) coWts-/- has a different phenotype at the periphery of the aggregates than coHpo-/-. The authors use stably transfected lines with NMM-Venus to visualize microvilli. It would be interesting to perform further experiments with this tool to visualize putative differences in the cell membrane at the periphery in the two mutant genotypes.

      We have now performed experiments examining filopodia in round vs elongated cells using the NMM-venus marker, as well as differences in filopodial morphology within aggregates in the different genotypes. Our data and conclusions are included in our updated manuscript (Figure 3- figure supplement 1).

      The authors nicely examine the consequences of the coHpo-/- and coWts-/- mutations for the formation of aggregates. They find that the aggregates in these cases are more densely packed, likely due to the higher attachment from microvilli, which they are able to reverse by using myosin inhibitors.

      (1) As mentioned above, it would be interesting to perform further experiments using NMM-Venus transfection in the coHpo-/- and coWts-/- genotypes to visualize putative differences in the strength and distribution of microvilli in the aggregates of these two mutant genotypes. These experiments would show whether more or fewer microvillar contacts are formed in these lines and would support a mechanical explanation for the denser aggregates in the mutant lines, as the authors now suggest in the discussion.

      We have now performed these experiments, and our data and conclusions are described in the updated manuscript (Figure 5- figure supplement 1).

      (2) On the other hand, myosin inhibition through blebbistatin increases the number of elongated cells in the mutant lines, demonstrating that myosin is necessary for the cells to resolve their substrate attachment and become round. In my view, it is confusing that myosin is needed for cells to become round again (WT phenotype) while, at the same time, myosin inhibition is needed for aggregates to become less dense (WT phenotype). Do the aggregates lose density because more elongated cells are now present in them? These results are confusing to me, and I think they should be discussed in more depth. Again, the NMM-Venus transfections into the coHpo-/- and coWts-/- genotypes suggested above could be informative.

      We have attempted to detect cells with an “elongated” morphology within WT and mutant aggregates but so far have been unable to visualize such cells. More advanced microscopy techniques at extended time scales may allow us to observe such things, but we believe such studies are beyond the scope of this manuscript.

      The authors do not connect and discuss their results with a very relevant study done in Drosophila, Xu J et al. Dev Cell. 2018; 46(3): 271-284.e5, where a transcriptionally independent role of Yki is characterized. In Drosophila, Yki has an important role in a positive feedback loop with myosin at the cortical part of the cell, which is especially relevant for cytoskeleton regulation.

      The results obtained by the authors in their previous study with coYki-/- indicated that coYki was important for proper actin dynamics and cell shape in C. owczarzaki. At that time, they did not investigate whether this phenotype could be due to the loss of a possible role of coYki at the cortex, and they argued that the phenotype was caused by the loss of transcriptional regulation of coYki downstream genes, many of which were indeed cytoskeleton-related.

      Because the cortical function of Yki is independent of regulation by Hpo and Wts, the authors could use these genotypes, comparing them with WT (where the cortical role of Yki should be the same) and with coYki-/-, to investigate whether the cortical role of Yki is conserved in C. owczarzaki. In Drosophila, the cortical role of Yki has been suggested to control tension at the cell surface. Drosophila Yki at the cortex activates myosin II through the N-terminal part of the protein and establishes a positive feedback loop by downregulating the Hippo pathway, thereby increasing the amount of active DmYki in the nucleus. Xu et al. have proposed that this mechanism links the sensing of cell tension at the surface with the control of tissue proliferation.

      In my opinion these are relevant results in the field that should be addressed in this study or at least well discussed. Actually, I think they could be a great opportunity for investigating if a putative cortex role of Yki is ancestral to its role linked to the Hippo pathway.

      We have now addressed this study in our manuscript- please see our response to reviewer #2’s last comment above.

      It would be informative to understand how stable expression through hygromycin selection is achieved in the transfection experiments. Is the recombinant plasmid integrated into the genome, or is it maintained stably as an episome?

      We believe that the plasmids stably integrate, as we never lose the fluorescent signal once it is established in a clonal line, even after extended culturing (>6 months). It may be worthwhile to definitively distinguish integration from episomal maintenance in future studies.

      The authors do not speculate on or discuss how the relationship between cell tension and cell proliferation might differ between a unicellular organism and a multicellular tissue. I think this should be addressed, since the contexts are different.

      This is an interesting and important point, which we plan to discuss in detail in an upcoming review article, as we think a proper discussion of this idea is beyond the scope of this manuscript.

      Minor point. The study should cite other unicellular holozoans that have also been developed into tractable organisms, such as Monosiga brevicollis (Woznica A, Kumar A, et al. 2021, eLife 10:e70436) and Abeoforma whisleri (Faktorová D, Nisbet RER, Fernández Robledo JA, et al. Nat Methods 17, 481-494 (2020)), in line 83 of the manuscript. I am sure the authors appreciate how much effort lies behind every non-model organism put forward as experimentally tractable, and such efforts should be properly acknowledged.

      We agree, and we have now included these examples of non-model organism development in our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank all the reviewers for taking the time to assess the manuscript and provide valuable feedback. We believe these comments helped clarify the manuscript’s prose, and the suggestions on the functionality and aim of the toolbox were incorporated throughout subsequent updates of the toolbox. In particular, we would like to highlight some changes that, independently of individual comments, will help all reviewers understand the current state of the toolbox, as well as some systematic changes made to the manuscript.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, most particularly in Figure 3 and Table 1. A beneficial side-effect of this is a much simpler structure for MotorNet which ought to contribute positively toward its usability by researchers in the neuroscience community.

      We also refactored some terminology to be more in line with current computational neuroscience vocabulary:

      • The term “plant”, which comes from industrial engineering and is more niche in neuroscience, has been replaced by “effector”.

      • The term “task” has been replaced by “environment” to match the terminology of the Gymnasium toolbox, with which MotorNet is now compatible. Task objects essentially performed the same function as Gymnasium environment objects.

      • The term “controller” has been replaced by “policy” throughout, as this term is more general.

      • The term “motor command” is very specific to the motor control subfield of neuroscience, and therefore is replaced by “action”, which is more commonplace for this modelling component in computational neuroscience and machine learning.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate use of the toolbox. For this reason, I missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper answers the first question and assures readers that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We decided to focus the article on the first question, because the practical use of the toolbox is easier to illustrate with code or interactive (Jupyter) notebooks than in a publication format. We find that the “how to proceed” aspect can be covered more easily and comprehensively with online, interactive tutorials. Additionally, this allows us to update the tutorials as the toolbox evolves across versions, whereas a scientific article is more difficult to update. Consequently, we explicitly avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this were stated explicitly early on. We have modified the paper to include a pointer to the online tutorials, in the last paragraph of the introduction:

      “The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.”

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      (a) The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors in figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one in figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure in figure 1 consists of center-out reaches with permanent viscous (proportional to velocity) external dynamics, while that in figure 5 uses fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation, while the networks in figure 5 do not. While we agree that some variation in effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader by adding the following:

      End of 1st paragraph, section 2.4.

      “Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.”

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.
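
      To make the contrast between the two evaluation conditions concrete, the two kinds of external dynamics can be sketched in Python as below. This is an illustrative sketch only, not MotorNet code: the function names, the gain b, and the pulse onset, duration and magnitude are hypothetical values, and the velocity-dependent field is assumed here to be a curl field (velocity rotated by 90 degrees), which is one common choice of viscous field.

```python
import numpy as np


def viscous_force(velocity, b=10.0):
    """Viscous (velocity-proportional) external force, assumed here to be a curl field.

    b is a hypothetical gain in N*s/m, not a value taken from the manuscript.
    The force is always perpendicular to the current velocity.
    """
    B = np.array([[0.0, b], [-b, 0.0]])
    return B @ np.asarray(velocity, dtype=float)


def square_pulse_orthogonal(t, reach_dir, onset=0.2, duration=0.1, magnitude=4.0):
    """Transient, square-shaped force orthogonal to the reach direction.

    onset and duration (s) and magnitude (N) are illustrative assumptions.
    Returns zero force outside the [onset, onset + duration) window.
    """
    unit = np.asarray(reach_dir, dtype=float)
    unit = unit / np.linalg.norm(unit)
    orth = np.array([-unit[1], unit[0]])  # reach direction rotated by +90 degrees
    if onset <= t < onset + duration:
        return magnitude * orth
    return np.zeros(2)
```

      The key difference is that the viscous field depends on the hand's own velocity at every moment, whereas the pulse depends only on time and on the reach direction.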

      (b) I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. It relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp on first reading. Section 2.3 explains the biomechanical properties that the toolbox implements to make the effectors more realistic. These are not new results, in the sense that other toolboxes implement these features (though not in differentiable form) and these properties of biological muscles are empirically well established. However, they are important for understanding what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. The results in figure 6 illustrate this: a simple effector and a more biomechanically complex effector yield different neural representations.

      Regarding the manuscript itself, we agree that stating the goal of each section more clearly may improve the reader’s experience. Consequently, we now specify these goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences at the end of its first paragraph. We also now explicitly relate the purpose of section 2.3 to the results of figure 6, and reference figure 4 in the section describing those results.

      (c) The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. In particular, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen in “Preserved neural population dynamics across animals performing similar behaviour”, figure 2 B (https://doi.org/10.1101/2022.09.26.509498), and in “Nonlinear manifolds underlie neural population activity during behaviour”, figure 2 B (https://doi.org/10.1101/2023.07.18.549575). Note that these two pre-prints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      “This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicate similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).”

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a Python mechanism rather than a MotorNet-specific one, as Python is an object-oriented language. Therefore, the many general Python tutorials on subclassing are useful for that purpose. We have amended the main text to make this clearer to the reader.

      Subclassing a MotorNet object, in a more specific sense, requires overwriting some methods from the base MotorNet classes (e.g., Effector or Environment classes, which correspond to the original Plant and Task object, respectively). Since we made the decision (mentioned above) to not include code in the main text, we added tutorials to the online documentation, which include dedicated tutorials for MotorNet class subclassing. For instance, this tutorial showcases how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
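
      As a generic illustration of this point (plain Python, not MotorNet's actual classes or method names), subclassing means inheriting the shared machinery of a base class and overriding only the methods that define the new behavior. The Environment base class and the ReachToTarget subclass below are hypothetical.

```python
class Environment:
    """Hypothetical minimal base class (not MotorNet's actual API)."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.get_obs()

    def get_obs(self):
        # Subclasses must define what the policy observes.
        raise NotImplementedError

    def step(self, action):
        # Shared bookkeeping: advance time and report whether the episode is over.
        self.t += 1
        done = self.t >= self.max_steps
        return self.get_obs(), done


class ReachToTarget(Environment):
    """Hypothetical subclass: overrides only the task-specific methods."""

    def __init__(self, target, max_steps=100):
        super().__init__(max_steps=max_steps)
        self.target = list(target)
        self.position = [0.0, 0.0]

    def reset(self):
        self.position = [0.0, 0.0]
        return super().reset()  # reuse the base-class reset logic

    def get_obs(self):
        # Observation = current position followed by target position.
        return self.position + self.target

    def step(self, action):
        self.position = [p + a for p, a in zip(self.position, action)]
        return super().step(action)  # reuse the base-class time bookkeeping
```

      Only get_obs, reset and step are overridden; the time counter and termination logic are inherited unchanged, which is the general pattern that subclassing a base environment relies on.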

      (5) One potential limitation of the toolbox is that it is based on TensorFlow, while the field of computational neuroscience seems to be transitioning to PyTorch, or at least that's my impression. How easy would it be to translate MotorNet to PyTorch? Maybe the authors could comment on this in the discussion.

      As described in our general response above, we have received substantial feedback requesting a PyTorch implementation, and the next version of MotorNet will be exclusively in PyTorch. The API and tutorial documentation for the TensorFlow version will remain available on the website, but further bug-fixing and development will focus on the PyTorch version, which we believe is more sustainable given the greater popularity of PyTorch in the academic community.

      These changes led to a significant alteration of the MotorNet structure, which is reflected throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in systems neuroscience, especially because it is faster than reinforcement learning (RL). Thus, providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which one the subject prefers. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target (e.g., Kaufman et al. 2015 eLife)? In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so that it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
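
      For readers unfamiliar with Gymnasium, its environment convention boils down to two methods: reset(*, seed=None, options=None), returning (observation, info), and step(action), returning (observation, reward, terminated, truncated, info). The toy point-mass environment below is a hypothetical sketch that mimics that contract using only NumPy, so that it runs without Gymnasium installed. It is not MotorNet's implementation; a real Gymnasium environment would subclass gymnasium.Env and declare observation_space and action_space. All parameters are illustrative, and the random policy stands in for an RL agent.

```python
import numpy as np


class PointMassEnv:
    """Toy point-mass reaching task following the Gymnasium reset/step return conventions."""

    def __init__(self, dt=0.01, mass=1.0, max_steps=200):
        self.dt, self.mass, self.max_steps = dt, mass, max_steps
        self.rng = np.random.default_rng()

    def _obs(self):
        # Observation: position (2), velocity (2), goal (2).
        return np.concatenate([self.pos, self.vel, self.goal])

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        self.goal = self.rng.uniform(-0.1, 0.1, size=2)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        force = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
        self.vel = self.vel + (force / self.mass) * self.dt  # explicit Euler integration
        self.pos = self.pos + self.vel * self.dt
        self.steps += 1
        dist = float(np.linalg.norm(self.pos - self.goal))
        reward = -dist                          # closer to the goal = higher reward
        terminated = dist < 0.005               # task-defined end of episode
        truncated = self.steps >= self.max_steps  # time limit reached
        return self._obs(), reward, bool(terminated), bool(truncated), {"distance": dist}


env = PointMassEnv()
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.rng.uniform(-1.0, 1.0, size=2)  # placeholder for an RL agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```

      Because any agent written against this reset/step contract can drive the environment, RL libraries built for Gymnasium do not need to know anything about the underlying effector.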

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      As the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a two-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design, and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions from RL or other non-gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond reducing training time and the computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and ensured this is the case throughout the manuscript. Particularly, we added the following on the first paragraph of section 2.3, for which an explicit goal was most missing:

      “In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.”

      Regarding potential differences between the solutions obtained with reinforcement or supervised learning, addressing this conclusively would represent a non-trivial amount of work and may not be within the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach for a given task design. In relation to this point, we now specify in the discussion that the new API can interface with reinforcement learning toolboxes, for those who may want to pursue this type of policy training when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and this comment aligns with comment two from reviewer one (see above). We hope our response to that comment explains our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Such a low barrier to running computational experiments will definitely benefit the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author Response

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions. However, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies, including non-linear analysis tools, as well as a discussion of the importance of data normalization, is needed.

      Thank you for the thorough and positive review of our work! We will incorporate this feedback to strengthen the manuscript. Specifically, we plan to revise the Discussion section to include a deeper consideration of the limitations of the original data, a description of the capacities of our method for conducting non-linear analyses, and the role data normalization plays in applicability of our tool.

      Reviewer 1:

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Weaknesses:

      However, this also raises several questions. The normalization method applied to raw fiber photometry data differs from lab to lab, which poses a significant challenge for applying a single analysis tool.

      Thank you for the positive feedback; we will address your comments in our revision. We agree that any data pre-processing steps will have downstream consequences for the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we argue that the sensitivity of analysis results to pre-processing choices underscores the need for statistical techniques that reduce the need for pre-processing and properly account for structure in the data arising from experimental designs. The reviewer raises an excellent point, and we can further elaborate on how our method reduces the need for such pre-processing steps. Indeed, our method provides smooth estimates along the functional domain (i.e., across trial timepoints), can adjust for between-trial and between-animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. For example, session-to-session variability in signal magnitudes or dynamics could be accounted for, at least in part, by including session-level random effects. This heterogeneity would then influence the width of the confidence intervals. This stands in contrast to “sweeping it under the rug” with a pre-processing step that may have an unknown impact on the final statistical inferences. Similarly, the level of smoothing is at least in part selected as a function of the data, and again is accounted for directly in the equations used to construct confidence intervals.

      In sum, our method provides both a tool to account for challenges in the data and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.

      Does the proposed method work equally well when the data are normalized as a running-average dF/F, as described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) may in itself lead to pattern dilution. The same question applies if the z-score is calculated based on various responses or even baselines.

      This is an important question given how common this practice is in the field. Briefly, application of pre-processing steps will change the interpretation of the results from our analysis method. For example, if one subtracts off a pre-trial baseline average from each trial timepoint, then the “definition of 0”, and the interpretation of coefficients and their statistical significance, changes. Similarly, if one scales the signal (e.g., divides the signal magnitude by a trial- or animal-specific baseline), then this changes the interpretation of the FLMM regression coefficients to be in terms of an animal-specific signal unit as opposed to a raw dF/F. This is, however, not specific to our technique, and pre-processing would have a similar influence on, for example, linear regression (and thus t-tests, ANOVAs and Pearson correlations) applied to summary measures. We agree with the reviewer that explicitly discussing this point will strengthen the paper.
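
      A deliberately simplified illustration of this point follows. It uses ordinary least squares at a single trial timepoint rather than FLMM, and hypothetical, noise-free numbers (a baseline of 2.0 and a covariate effect of 3.0). Subtracting the baseline moves the intercept to 0 without changing the slope, whereas dividing by the baseline rescales the slope into baseline units.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 50)   # hypothetical trial-level covariate (e.g., reward size)
baseline = 2.0                  # hypothetical animal-specific baseline
signal = baseline + 3.0 * x     # noise-free signal at one trial timepoint


def fit(y, x):
    """Ordinary least squares; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


raw = fit(signal, x)                    # intercept = baseline, slope in raw signal units
subtracted = fit(signal - baseline, x)  # intercept = 0: "0" now means "at baseline"
scaled = fit(signal / baseline, x)      # slope expressed in multiples of the baseline
```

      The same data therefore yield different coefficients, each of which is correct for its own definition of the signal; this is the change in interpretation, rather than in validity, described above.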

      While it is difficult to make general claims about the anticipated performance of the method under all the potential pre-processing steps taken in the field, we believe that most common pre-processing strategies will not negatively influence the method’s performance or validity; they would, instead, change the interpretation of the results. We are releasing a series of vignettes to guide analysts through using our method and, to address your comment, we will add a section on interpretation after pre-processing.
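
      The pattern dilution raised by the reviewer can be seen in a toy case. A centered boxcar running average of width 5 (a hypothetical window, not the one used by Jeong et al. 2022) applied to a one-sample transient lowers its peak fivefold while preserving its area, so peak-based summaries change more than area-based ones. This is a generic NumPy sketch, independent of our package.

```python
import numpy as np


def running_average(trace, window):
    """Centered moving average with a boxcar kernel; output has the same length as the input."""
    kernel = np.ones(window) / window
    return np.convolve(trace, kernel, mode="same")


trace = np.zeros(21)
trace[10] = 1.0                              # a brief, one-sample transient
smoothed = running_average(trace, window=5)  # transient spread over 5 samples
```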

      How reliable is the method if the data are non-stationary and the baselines change substantially between trials?

      This is an excellent question. We believe the statistical inferences will be valid and will properly quantify the uncertainty from non-stationarities, since our framework does not impose stationarity assumptions on the underlying process. It is worth mentioning that non-stationarity and high trial-to-trial variability may increase variance estimates if the model does not include a rich enough set of covariates to capture the source of the heterogeneity across trial baselines. However, this is a feature of our framework rather than a bug, as it properly conveys to the analyst that high unaccounted-for variability in the signal may result in high model uncertainty. Finally, mixed effects modeling provides a transparent, statistically reasonable, and flexible approach to account for between-session and between-trial variability, which is a type of non-stationarity. We agree with the reviewer that this should be discussed more explicitly in the paper, and will do so.

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper's logic, non-linear analysis can capture more information that is diluted by linear methods.

      Functional data analysis assumes that the function varies smoothly along the functional domain (i.e., across trial timepoints). It is a type of non-linear modeling technique over the functional domain since we do not assume a linear model (straight line). Therefore, our functional data analysis approach is able to capture more information that is diluted by linear models. While the basic form of our model assumes a linear change in the signal at a fixed trial timepoint, across trials/sessions, our package allows one to easily model changes with non-linear functions of covariates using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models.
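      As an illustration of modeling a non-linear covariate effect with a basis expansion, the Python sketch below (our own assumption, not fastFMM's interface) regresses a hypothetical session-by-session trend in the signal at one trial timepoint, taken to be log(session), on either a straight line or a cubic truncated-power spline basis with knots at sessions 5, 10 and 15. Because the spline basis contains the linear terms, its residual sum of squares can never be larger; the price is more coefficients to interpret.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Columns: 1, x, ..., x^degree, then (x - k)_+^degree for each knot k."""
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

session = np.arange(1, 21, dtype=float)   # hypothetical session index
effect = np.log(session)                  # hypothetical non-linear trend at one timepoint
knots = [5.0, 10.0, 15.0]                 # hypothetical knot locations

X_lin = np.column_stack([np.ones_like(session), session])
X_spl = truncated_power_basis(session, knots)

rss_linear = rss(X_lin, effect)
rss_spline = rss(X_spl, effect)
print(f"linear RSS {rss_linear:.4f}, spline RSS {rss_spline:.4f}")
```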

      Reviewer 2

      Strengths:

      The open-source R package, which uses syntax similar to that of the lme4 package, makes this framework for photometry data more accessible to, and usable by, other researchers. Moreover, the model's shorter fitting time compared with a similar package on simulated data means that it has the potential to be more easily adopted.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point on the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      Thank you for the positive assessment of our work!

      Weaknesses:

      Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial.

      As described by the authors, fitting pointwise linear mixed models and performing t-test and Benjamini-Hochberg correction as performed in Lee et al. (2019) has some caveats. Using joint confidence intervals has the potential to improve statistical robustness, however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in this work.

      We agree with the reviewer that providing more detail about the drawbacks of the approach applied in Lee et al., 2019 will strengthen the paper. We will add an example analysis applying the method proposed by Lee et al., 2019 to show how the set of timepoints at which coefficient estimates reach statistical significance can vary dramatically depending on the rate at which one subsamples the data, a highly undesirable property of this strategy. Our approach is robust to this, and still provides a multiple comparisons correction through the joint confidence intervals.
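      For readers unfamiliar with the pointwise strategy, the Python sketch below tests each timepoint separately and applies the Benjamini-Hochberg step-up procedure. It is a generic illustration, not the code of Lee et al. (2019): it uses a normal approximation for the per-timepoint p-values instead of t-tests or pointwise mixed models, and the simulated trials are hypothetical. Because each pointwise p-value depends only on its own timepoint, subsampling leaves those p-values unchanged but alters the number of tests m, and hence the thresholds alpha * i / m, so the set of "significant" timepoints can shift with the sampling rate.

```python
import math
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank whose p-value is under its threshold
        reject[order[: k + 1]] = True    # reject it and every hypothesis with a smaller p-value
    return reject

def pointwise_pvalues(trials):
    """Two-sided p-value per timepoint (column), normal approximation to a one-sample test."""
    n = trials.shape[0]
    z = trials.mean(axis=0) / (trials.std(axis=0, ddof=1) / math.sqrt(n))
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)                        # hypothetical 100 timepoints in a trial
bump = 0.4 * np.exp(-((t - 0.5) / 0.1) ** 2)      # hypothetical transient effect
trials = bump + rng.normal(size=(40, t.size))     # 40 hypothetical trials

sig_full = t[benjamini_hochberg(pointwise_pvalues(trials))]
sig_sub = t[::5][benjamini_hochberg(pointwise_pvalues(trials[:, ::5]))]
print("full rate:", sig_full.round(2))
print("every 5th sample:", sig_sub.round(2))
```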

      In this work, FLMM usages included only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023), Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      This is a good point. In our experience, the code is still quite fast (often taking seconds to tens of seconds) on a standard laptop when fitting complex models that include, for example, 10 covariates, or complex random effect specifications on dataset sizes common in fiber photometry. In the manuscript, we included results from simpler models with few covariates in an attempt to show results from the FLMM versions of the standard analyses (e.g., correlations, t-tests) applied in Jeong et al., 2022. Our goal was to show that our method reveals effects obscured by standard analyses even in simple cases. Some of our models did, however, include complex nested random effects (e.g., the models described in Section 4.5.2).

      Like other mixed-model based analyses, our method becomes slower when the number of observations in the dataset is on the order of tens of thousands of observations. However, we coded the methods to be memory efficient so that even these larger analyses can be run on standard laptops. We thank the reviewer for this point, as we worked extremely hard to scale the method to be able to efficiently fit models commonly applied in neuroscience. Indeed, challenges with scalability were one of the main motivations for applying the estimation procedure that we did; in the appendix we show that the fit time of our approach is much faster than existing FLMM software such as the refund package function pffr(), especially for large sample sizes. While pffr() appears to scale exponentially with the number of clusters (e.g., animals), our method appears to scale linearly. We will more explicitly emphasize the scalability in the revision, since we agree this will strengthen the final manuscript.

      Reviewer #3

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (function regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.

      Meanwhile, using linear mixed methods allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      Thank you for the positive assessment of our work!

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      We appreciate this and, to address your and other reviewers’ comments, we are creating a series of vignettes walking users through how to analyze photometry data with our package. We will include algebraic illustrations to help users gain familiarity with the regression modeling here.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson's Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors' metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors' approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects. The discussion of this result is muddled. Having identified the paradox, there is some appropriate speculation as to what is causing these opposing effects, particularly the decrease in sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al., were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by attention level to sounds of each mouse) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point that we had not considered. They have convinced us that acknowledging and elaborating on this alternative perspective will strengthen the paper. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals sense the reward delivery. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this (potentially) learned predictability alone could account for the increase in signal magnitude across sessions.

      After reading the reviewer’s comments, we consulted with a number of researchers in this area, and several felt that a CS+ can serve as a reward in itself. From this perspective, the rewards in the Jeong et al., 2022 experiment might still be considered unexpected. After discussing extensively with the authors of Jeong et al., 2022, it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely that changes in pressure from the solenoid (rather than a sound) served as a cue. This underscores the difficulty of preventing perception of reward delivery in practice. As this paper is focused on analysis approaches, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting both sides.

      Overall, we agree with the reviewer that future experiments will be needed for testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our attempt to document our conversations with the Jeong et al., 2022 authors may have room for improvement, we hope the reviewer can appreciate that this was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting the discussion. The Jeong et al., 2022 authors could easily have avoided acknowledging the potential incompleteness of their theory, by claiming that our results do not invalidate their predictions for a random reward, as the reward was not unpredicted in the experiment (as a result of the inadvertent solenoid CS+). Instead, they went out of their way to emphasize that their experiment did test a random reward, and that our results do present problems for their theory. We think that engagement with re-analyses of one’s data, even when findings are inconvenient, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening because our method, by analyzing the signal at every trial timepoint, revealed a neural signal that appears to indicate that the animals sense reward delivery. Ultimately, this was what we set out to do: help researchers ask questions of their data that they could not ask before. We believe that having a demonstration that we can indeed do this for a “live” issue is the most appropriate way of demonstrating the usefulness of the method.

      It is clear the reviewer put a lot of time into understanding what we did, and was very thoughtful about the feedback. We would like to thank the reviewer again for taking such care in reviewing our paper.

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al., 2022 provided in the discussion is not germane.

      While we appreciate that the post hoc reasoning of the authors of Jeong et al., 2022 may not seem germane, we would like to provide some context for its inclusion. As statisticians and computer scientists, our role is to create methods, and this often requires using open source data and recreating past analyses. This usually involves extensive conversation with authors about their data and analysis choices because, if we cannot reproduce their findings using their analysis methods, we cannot verify that results from our own methods are valid. As such, we prefer to conduct method development in a collaborative fashion, and we strive to constructively, and respectfully, discuss our results with the original authors. We feel that giving them the opportunity to suggest analyses, and express their point of view if our results conflict with their original conclusions, is important, and we do not want to discourage authors from making their datasets public. As such, we conducted numerous analyses at the suggestion of Jeong et al., 2022 and discussed the results over the course of many months. Indeed the analyses in the Appendix that the reviewer is referring to were conducted at the suggestion of the authors of Jeong et al., 2022, in an attempt to rule out alternative explanations. We nevertheless appreciate that our interpretations of these results can include some of the caveats suggested by the reviewer, and we will strive to improve these sections.

      Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      We agree with the reviewer that the results suggest that new experimental designs will likely be necessary to adjudicate between models. It is our hope that, by weighing the different issues and interpretations, our paper might provide useful suggestions into what experimental designs would be most beneficial to rule out competing hypotheses in future data collection efforts. We believe that our methodology will strengthen our capacity to design new experiments and analyses. We will make the reviewer’s suggestions more explicit in the discussion by emphasizing the limitations of the original data.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (ΔF/F) with smoothing and baseline correction and this does not seem to have been considered in the argument.

      We are disappointed to hear that this extensive set of analyses, much of which was conducted at the suggestion of Jeong et al., 2022, was not convincing. We agree that acknowledging any pre-processing would provide useful context for the reader. We do wish to clarify that we analyzed the data that were made available online (raw data was not available). Moreover, for comparison with the authors’ results, we felt it was important to maintain the same pre-processing steps as they did. These conditions were held constant across analysis approaches; therefore, we think that the changes within-trial are likely not influenced substantially by these pre-processing choices. While we cannot speak definitively to the impact any of the processing conducted by the authors had on the results, we believe that it was likely minor, given that the timing of signals at other points in the trial, and in other experiments, were as expected (e.g., the signal rose rapidly after cue onset in Pavlovian tasks).

      Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we will add it to our discussion. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in this data, and it is not clear to us why photobleaching would be visible in one time-window of a trial but not in another (less than a second away), despite high dF/F magnitudes in both time-windows. We do wish to point out that, at the request of the authors, we analyzed many experiments from the same animals and in most cases did not observe other indications of photobleaching. Hence, it is not clear to us why this particular set of experiments would garner additional skepticism regarding the potential for photobleaching to invalidate results. While the role of photobleaching may be more complicated with this sensor than with those in the references, that citation was included, at the suggestion of Jeong et al., 2022, simply as a way of acknowledging that non-linearities in photobleaching can occur.

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors' description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We will remove the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      This point was meant to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of open science is acknowledging both areas where analyses support and conflict with those of the original authors. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we will make these changes.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion, and we agree this could be a useful analysis for the field. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we will make changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. We only had space in the manuscript to include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al., 2023 data to an appendix. Realistically, it would be hard for us to justify analyzing a third dataset. As you may surmise from the one we presented, reanalyzing a new dataset is usually very time consuming, and invariably requires extensive communication with the original authors. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with five groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method and compares its results with those yielded by standard analyses of AUCs is already accepted and in press. Hence there should soon be additional demonstrations of what the method can do in less controversial settings. Finally, our forthcoming vignettes include additional analyses, not included in the manuscript, that replicate positive results. We take your point that our description of the data supporting one theory or the other should be qualified, and we will correct that. Again, your review was very thorough, and we appreciate your taking so much time to help us improve our work.

      Reviewer #2 (Recommendations For The Authors):

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you!

      I would suggest the authors consider adding to the manuscript, either some evidence or some intuition on how feasible would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      This is an excellent point and we will make this suggested change in the Methods and Discussion sections in the next draft.

      From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.

      We appreciate your thinking on this point, as it would definitely help expand use of the method. We included a brief point in the Discussion that this package would be useful for other techniques, but we will expand upon this.

      Reviewer #3 (Recommendations For The Authors):

      The authors should define 'function' in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7. Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      Thank you, this is a very good point and will be critical for helping analysts describe and interpret results. We will add more detail to the Methods section on this point.
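      The distinction between pointwise and joint intervals can be illustrated with a short simulation-based sketch in Python. It assumes that an estimated coefficient function `beta_hat` (one value per trial timepoint) and its covariance matrix `cov` across timepoints are already available; both are hypothetical here, with an assumed AR(1)-style correlation. A pointwise 95% interval uses the normal quantile at each timepoint separately, whereas a joint 95% band uses the 95% quantile of the maximum absolute standardized deviation over all timepoints, so that the whole curve is covered simultaneously with about 95% probability. This is a generic construction, not necessarily the exact procedure implemented in fastFMM.

```python
import numpy as np
from statistics import NormalDist

def confidence_bands(beta_hat, cov, level=0.95, n_sim=20000, seed=0):
    """Return pointwise and joint bands plus their multipliers for a coefficient function."""
    se = np.sqrt(np.diag(cov))
    q_point = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for level=0.95
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(len(beta_hat)), cov, size=n_sim)
    max_abs = np.max(np.abs(draws) / se, axis=1)             # max over timepoints of |Z(s)| / se(s)
    q_joint = float(np.quantile(max_abs, level))
    return (beta_hat - q_point * se, beta_hat + q_point * se,
            beta_hat - q_joint * se, beta_hat + q_joint * se,
            q_point, q_joint)

s = np.linspace(0, 1, 50)                                    # hypothetical trial timepoints
beta_hat = 0.5 * np.sin(2 * np.pi * s)                       # hypothetical coefficient estimate
sd = 0.2 + 0.1 * s                                           # hypothetical pointwise standard errors
corr = 0.9 ** np.abs(np.subtract.outer(np.arange(50), np.arange(50)))
cov = np.outer(sd, sd) * corr

p_lo, p_hi, j_lo, j_hi, q_point, q_joint = confidence_bands(beta_hat, cov)
print(f"pointwise multiplier {q_point:.2f}, joint multiplier {q_joint:.2f}")
```

      Under the joint criterion, an effect would be reported as significant at the timepoints where the joint band excludes zero; the narrower pointwise band, used alone, does not control the error rate across timepoints.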

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a great suggestion and we will add this important point to the discussion, especially in light of the factorial designs common in neuroscience experiments.

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      We will make this change and agree this is a better phrasing.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable insights into how the brain parses the syntactic structure of a spoken sentence. A unique contribution of the work is to use a large language model to quantify how the mental representation of syntactic structure updates as a sentence unfolds in time. Solid evidence is provided that distributed cortical networks are engaged for incremental parsing of a sentence, although the contribution could be further strengthened if the authors would further highlight the main results and clarify the benefit of using a large language model.

      We thank the editors for the overall positive assessment. We have revised our manuscript to further emphasize our main findings and highlight the advantages of using a large language model (LLM) over traditional behavioural and corpus-based data.

      This study aims to investigate the neural dynamics underlying the incremental construction of structured interpretation during speech comprehension. While syntactic cues play an important role, they alone do not define the essence of this parsing process. Instead, this incremental process is jointly determined by the interplay of syntax, semantics, and non-linguistic world knowledge, evoked by the specific words heard sequentially by listeners. To better capture these multifaceted constraints, we derived structural measures from BERT, which dynamically represent the evolving structured interpretation as a sentence unfolds word-by-word.

      Typically, the syntactic structure of a sentence can be represented by a context-free parse tree, such as a dependency parse tree or a constituency-based parse tree, which abstracts away from specific content, assigning a discrete parse depth to each word regardless of its semantics. However, this context-free parse tree merely represents the result rather than the process of sentence parsing and does not elucidate how a coherent structured interpretation is concurrently determined by multifaceted constraints. In contrast, BERT parse depth, trained to approach the context-free discrete dependency parse depth, is a continuous variable. Crucially, its deviation from the corresponding discrete parse depth indicates the preference for the syntactic structure represented by this context-free parse. As BERT processes a sentence delivered word-by-word, the dynamic change of BERT parse depth reflects the incremental nature of online speech comprehension.
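      As a rough illustration of how a continuous parse depth can be read off a language model, the Python sketch below implements the forward pass of a depth probe in the style of Hewitt and Manning (2019), which we assume is the kind of structural probe meant here: each word's predicted depth is the squared norm of a linear projection of its contextual embedding, and the probe is trained (with an L1 loss in that work) so that these values approach the gold dependency depths. The embeddings, probe matrix and gold depths below are random or hypothetical stand-ins; in an incremental setting the same computation would be repeated on each sentence prefix as a new word arrives.

```python
import numpy as np

def probe_depth(H, B):
    """Predicted parse depth per word: squared L2 norm of B @ h_i for each row h_i of H."""
    projected = H @ B.T                       # (n_words, k)
    return np.sum(projected ** 2, axis=1)

rng = np.random.default_rng(1)
n_words, d, k = 6, 768, 64                    # 768 = BERT-base hidden size; k is a hypothetical probe rank
H = rng.normal(size=(n_words, d))             # stand-in for contextual embeddings of a 6-word prefix
B = rng.normal(scale=0.01, size=(k, d))       # stand-in for a trained probe matrix
gold = np.array([1, 2, 0, 2, 3, 1])           # hypothetical discrete dependency depths (root = 0)

pred = probe_depth(H, B)                      # continuous depths
deviation = pred - gold                       # continuous deviation from the discrete parse
l1_loss = float(np.mean(np.abs(deviation)))   # the quantity an L1-trained probe would minimize
print(pred.round(2), round(l1_loss, 3))
```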

      Our results reveal a behavioural alignment between BERT parse depth and human interpretative preference for the same set of sentences. In other words, BERT parse depth could represent a probabilistic interpretation of a sentence’s structure based on its specific contents, making it possible to quantify the preference for each grammatically correct syntactic structure during incremental speech comprehension. Furthermore, both BERT and human interpretations show correlations with linguistic knowledge, such as verb transitivity, and non-linguistic knowledge, like subject noun thematic role preference. Both types of knowledge are essential for achieving a coherent interpretation, in accordance with the “constraint-based hypothesis” of sentence processing.

      Motivated by the observed behavioural alignment between BERT and human listeners, we further investigated BERT structural measures in source-localized EEG/MEG using representational similarity analyses (RSA). This approach revealed the neural dynamics underlying incremental speech comprehension on millisecond scales. Our main findings include: (1) a shift from bi-hemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.

      From our perspective, the advantages of using an LLM (or deep language model) like BERT are twofold. Conceptually, BERT structural measures offer a deep contextualized structural representation for any given sentence by integrating the multifaceted constraints unique to the specific contents described by the words within that sentence. Modelling this process on a word-by-word basis is challenging to achieve with behavioural or corpus-based metrics. Empirically, as demonstrated in our responses to the reviewers below, BERT measures show better performance compared to behavioural and corpus-based metrics in aligning with listeners’ neural activity. Moreover, when it comes to integrating multiple sources of constraints for achieving a coherent interpretation, BERT measures also show a better fit with the behavioural data of human listeners than corpus-based metrics.

      Taken together, we propose that LLMs, akin to other artificial neural networks (ANNs), can be considered as computational models for formulating and testing specific neuroscientific hypotheses, such as the “constraint-based hypothesis” of sentence processing in this study. However, we by no means overlook the importance of corpus-based and behavioural metrics. These metrics play a crucial role in interpreting and assessing whether and how ANNs simulate human cognitive processes, a fundamental step in employing ANNs to gain new insights into the neural mechanisms of human cognition.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors investigate where and when brain activity is modulated by incoming linguistic cues during sentence comprehension. Sentence stimuli were designed such that incoming words had varying degrees of constraint on the sentence's structural interpretation as participants listened to them unfolding, i.e. due to varying degrees of verb transitivity and the noun's likelihood of assuming a specific thematic role. Word-by-word "online" structural interpretations for each sentence were extracted from a deep neural network model trained to reproduce language statistics. The authors relate the various metrics of word-by-word predicted sentence structure to brain data through a standard RSA approach at three distinct points of time throughout sentence presentation. The data provide convincing evidence that brain activity reflects preceding linguistic constraints as well as integration difficulty immediately after word onset of disambiguating material.

      We thank Reviewer #1 (hereinafter referred to as R1) for their recognition of the objectives of our study and the analytical approaches we have employed in this study.

      The authors confirm that their sentence stimuli vary in degree of constraint on sentence structure through independent behavioral data from a sentence continuation task. They also show a compelling correlation of these behavioral data with the online structure metric extracted from the deep neural network, which seems to pick up on the variation in constraints. In the introduction, the authors argue for the potential benefits of using deep neural network-derived metrics given that it has "historically been challenging to model the dynamic interplay between various types of linguistic and nonlinguistic information". Similarly, they later conclude that "future DLMs (...) may provide new insights into the neural implementation of the various incremental processing operations(...)".

      We appreciate R1’s positive comments on the design, quantitative modelling and behavioural validation of the sentence stimuli used in this experiment.

      By incorporating structural probing of a deep neural network, a technique developed in the field of natural language processing, into the analysis pipeline for investigating brain data, the authors indeed take an important step towards establishing advanced machine learning techniques for researching the neurobiology of language. However, given the popularity of deep neural networks, an argument for their utility should be carefully evidenced.

      We fully concur with R1 regarding the need for cautious evaluation and interpretation of deep neural networks’ utility. In fact, this perspective underpinned our decision to conduct extensive correlation analyses using both behavioural and corpus-based metrics to make sense of BERT metrics. These analyses were essential to interpret and validate BERT metrics before employing them to investigate listeners’ neural activity during speech comprehension. We do not in any way undermine the importance of behavioural or corpus-based data in studying language processing in the brain. On the contrary, as evidenced by our findings, these traditional metrics are instrumental in interpreting and guiding the use of metrics derived from LLMs.

      However, the data presented here don't directly test how large the benefit provided by this tool really is. In fact, the authors show compelling correlations of the neural network-derived metrics with both the behavioral cloze-test data as well as several (corpus-)derived metrics. While this is a convincing illustration of how deep language models can be made more interpretable, it is in itself not novel. The correlation with behavioral data and corpus statistics also raises the question of what is the additional benefit of the computational model? Is it simply saving us the step of collecting the behavioral data or computing the corpus statistics, or does the model potentially uncover a more nuanced representation of the online comprehension process? This remains unclear because we are lacking a direct comparison of how much variance in the neural data is explained by the neural network-derived metrics beyond those other metrics (for example the main verb probability or the corpus-derived "active index" following the prepositional phrase).

      From our perspective, a primary advantage of using the neural network-derived metrics (or LLMs as computational models of language processing), compared to traditional behavioural and corpus-based metrics, lies in their ability to offer more nuanced, contextualized representations of natural language inputs. There seemed to be no effective way of computationally capturing the distributed and multifaceted constraints within specific contexts until the current generation of LLMs came along. While it is feasible to quantify lexical properties or contextual effects based on the usage of specific words via corpora or behavioural tests, this approach appears less effective in modelling the composition of meanings across multiple words at the sentence level. More critically, it struggles to capture how various lexical constraints collectively yield a coherent structured interpretation.

      Accumulating evidence suggests that models designed for context prediction or next-word prediction, such as word2vec and LLMs, outperform classic count-based distributional semantic models (Baroni et al. 2014) in aligning with neural activity during language comprehension (Schrimpf et al. 2021; Caucheteux and King 2022). Relevant to this, we have conducted additional analyses to directly assess the additional variance of neural data explained by BERT metrics, over and above what traditional metrics account for. Specifically, using RSA, we re-tested model RDMs based on BERT metrics while controlling for the contribution from traditional metrics (via partial correlation).
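
      The partial-correlation step can be illustrated with a minimal Python sketch. It assumes that each RDM is a square NumPy array, that only the off-diagonal upper triangle is compared, and that "controlling for" a traditional metric means regressing the ranked control RDM out of both the ranked data RDM and the ranked model RDM before correlating the residuals (a common way of computing a partial Spearman correlation). The function names are hypothetical and this is not the authors' analysis code.

```python
import numpy as np
from scipy.stats import rankdata


def upper_tri(rdm: np.ndarray) -> np.ndarray:
    """Vectorise the off-diagonal upper triangle of a square RDM."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]


def partial_spearman(data_rdm, model_rdm, control_rdm) -> float:
    """Spearman correlation between a data RDM and a model RDM after
    regressing the (rank-transformed) control RDM out of both."""
    d = rankdata(upper_tri(data_rdm))
    m = rankdata(upper_tri(model_rdm))
    c = rankdata(upper_tri(control_rdm))
    # Design matrix: intercept plus the control ranks.
    X = np.column_stack([np.ones_like(c), c])
    # Residuals of each rank vector after a least-squares fit on the control.
    d_res = d - X @ np.linalg.lstsq(X, d, rcond=None)[0]
    m_res = m - X @ np.linalg.lstsq(X, m, rcond=None)[0]
    return float(np.corrcoef(d_res, m_res)[0, 1])
```

      With several traditional metrics to control for at once, further rank columns would simply be appended to the design matrix.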

      During the first verb (V1) epoch, we tested model RDMs of V1 transitivity based on data from either the behavioural pre-test (i.e., continuations following V1) or massive corpora. Contrasting sharply with the significant model fits observed for BERT V1 parse depth in bilateral frontal and temporal regions, the two metrics of V1 transitivity did not exhibit any significant effects (see Author response image 1).

      Author response image 1

      RSA model fits of BERT structural metrics and behavioural/corpus-based metrics in the V1 epoch. (upper) Model fits of BERT V1 parse depth (relevant to Appendix 1-figure 10A); (middle) Model fits of the V1 transitivity based on the continuation pre-test conducted at the end of V1 (e.g., completing “The dog found …”); (bottom) Model fits of the V1 transitivity based on the corpus data (as described in Methods). Note that verb transitivity is quantified as the proportion of its transitive uses (i.e., followed by a direct object) relative to its intransitive uses.
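
      The transitivity measure in the caption above can be read in more than one way. The sketch below assumes it is the share of a verb's counted uses that are transitive, transitive / (transitive + intransitive); a ratio of transitive to intransitive counts would be an alternative reading. It also assumes, for illustration only, that a model RDM for a single scalar metric is the absolute difference between sentences. The counts and function names are hypothetical.

```python
import numpy as np


def transitivity(n_transitive: int, n_intransitive: int) -> float:
    """Share of a verb's counted uses that take a direct object (assumed definition)."""
    total = n_transitive + n_intransitive
    if total == 0:
        raise ValueError("verb has no counted uses")
    return n_transitive / total


def scalar_rdm(values) -> np.ndarray:
    """Model RDM from one value per sentence: absolute pairwise difference."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])


# Hypothetical counts for three verbs (not the study's data).
scores = [transitivity(90, 10), transitivity(40, 60), transitivity(75, 25)]
rdm = scalar_rdm(scores)
```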

      In the PP1 epoch, which was aligned to the onset of the preposition in the prepositional phrase (PP), we tested the probability of a PP continuation following V1 (e.g., the probability of a PP after “The dog found…”). While no significant results were found for PP probability, we have plotted the uncorrected results for PP probability (Author response image 2). These model fits have very limited overlap with those of BERT parse depth vector (up to PP1) in the left inferior frontal gyrus (approximately at 360 ms) and the left temporal regions (around 600 ms). It is noteworthy that the model fits of the BERT parse depth vector (up to PP1) remained largely unchanged even when PP probability was controlled for, indicating that the variance explained by BERT metrics cannot be effectively accounted for by the PP probability obtained from the human continuation pre-test.

      Author response image 2

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the PP1 epoch. (upper) Model fits of BERT parse depth vector up to PP1 (relevant to Figure 6B in the main text); (middle) Model fits of the probability of a PP continuation in the pre-test conducted at the end of the first verb; (bottom) Model fits of BERT parse depth vector up to PP1 after partialling out the variance explained by PP probability.

      Finally, in the main verb (MV) epoch, we tested the model RDM based on the probability of a MV continuation following the PP (e.g., the probability after “The dog found in the park…”). When compared with the BERT parse depth vector (up to MV), we observed a similar effect in the left dorsal frontal regions (see Author response image 3). However, this effect did not survive after the whole-brain multiple comparison correction. Subsequent partial correlation analyses revealed that the MV probability accounted for only a small portion of the variance in neural data explained by the BERT metric, primarily the effect observed in the left dorsal frontal regions around 380 ms post MV onset. Meanwhile, the majority of the model fits of the BERT parse depth vector remained largely unchanged after controlling for the MV probability.

      Note that the probabilities of a PP/MV continuation reflect participants’ predictions based on speech input preceding the preposition (e.g., “The dog found…”) or the main verb (e.g., “The dog found in the park…”), respectively. In contrast, the BERT parse depth vector is designed to represent the structure of the (partial) sentence already delivered to listeners, rather than to predict a continuation after it. Therefore, in the PP1 and MV epochs, we separately tested BERT parse depth vectors that included the preposition (e.g., “The dog found in…”) and the main verb (e.g., “The dog found in the park was…”) to accurately capture the sentence structure at these specific points in a sentence. Despite the differences in the nature of information captured by these two types of metrics, the behavioural metrics themselves did not exhibit significant model fits when tested against listeners’ neural activity.

      Author response image 3

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the MV epoch. (upper) Model fits of BERT parse depth vector up to MV (relevant to Figure 6C in the main text); (middle) Model fits of the probability of a MV continuation in the pre-test conducted at the end of the prepositional phrase (e.g., “The dog found in the park …”); (bottom) Model fits of BERT parse depth vector up to MV after partialling out the variance explained by MV probability.

      Regarding the corpus-derived interpretative preference, we observed that neither the Active index nor the Passive index showed significant effects in the PP1 epoch. In the MV epoch, significant model fits of the Passive index were observed, which temporally overlapped with those of the BERT parse depth vector (up to MV) after the recognition point of the MV; however, the effects of these two model RDMs emerged in different hemispheres, as illustrated in Figures 6C and 8D in the main text. Consequently, we opted not to pursue further partial correlation analysis with the corpus-derived interpretative preference. In addition, as shown in Figures 8A, 8B and 8C, the subject noun thematic role preference and the non-directional index exhibit significant model fits in the PP1 or the MV epoch. Interestingly, these effects precede the corresponding effects of BERT metrics in the same epoch (see Figures 6B and 6C), suggesting that the overall structured interpretation emerges after the evaluation and integration of multifaceted lexical constraints.

      In summary, our findings indicate that, in comparison to corpus-derived or behavioural metrics, BERT structural metrics are more effective in explaining neural data, in terms of modelling both the unfolding sentence input (i.e., incremental BERT parse vector) and individual words (i.e., V1) within specific sentential contexts. This advantage of BERT metrics might be due to the hypothesized capacity of LLMs to capture more contextually rich representations. Such representations effectively integrate the diverse constraints present in a given sentence, thereby outperforming corpus-based metrics or behavioural metrics in this respect. Concurrently, it is important to recognize the significant role of corpus-based / behavioral metrics as explanatory variables. They are instrumental not only in interpreting BERT metrics but also in understanding their fit to listeners’ neural activity (by examining the temporal sequence and spatial distribution of model fits of these two types of metrics). Such an integrative approach allows for a more comprehensive understanding of the complex neural processes underpinning speech comprehension.

      With regards to the neural data, the authors show convincing evidence for early modulations of brain activity by linguistic constraints on sentence structure and importantly early modulation by the coherence between multiple constraints to be integrated. Those modulations can be observed across bilateral frontal and temporal areas as well as parts of the default mode network. The methods used are clear and rigorous and allow for a detailed exploration of how multiple linguistic cues are neurally encoded and dynamically shape the final representation of a sentence in the brain. However, at times the consequences of the RSA results remain somewhat vague with regard to the motivation behind different metrics and how they differ from each other. Therefore, some results seem surprising and warrant further discussion, for example: Why does the neural network-derived parse depth metric fit neural data before the V1 uniqueness point if the sentence pairs begin with the same noun phrase? This suggests that the lexical information preceding V1, is driving the results. However, given the additional results, we can already exclude an influence of subject likelihood for a specific thematic role as this did not model the neural data in the V1 epoch to a significant degree.

      As pointed out by R1, model fits of BERT parse depth vector (up to V1) and its mismatch for the active interpretation were observed before the V1 uniqueness point (Figures 6A and 6D). These early effects could be attributed to the inclusion of different subject nouns in the BERT parse depth vectors. In our MEG data analyses, RSA was performed using all LoTrans and HiTrans sentences. Each of the 60 sentence sets contained one LoTrans sentence and one HiTrans sentence, which resulted in a 120 x 120 neural data RDM for each searchlight ROI across the brain within each sliding time window. Although LoTrans and HiTrans sentences within the same sentence set shared the same subject noun, subject nouns varied across sentence sets. This variation was expected to be reflected in both the model RDM of BERT metrics and the data RDM, a point further clarified in the revised manuscript.
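
      The structure of this analysis can be sketched as follows, assuming one fixed-length parse-depth vector per sentence (120 sentences, three words up to V1), Euclidean distance between vectors for the model RDM, and Spearman's rho between the upper triangles of the model RDM and each sliding-window data RDM. The distance metric, the random inputs and the function names are assumptions for illustration; source reconstruction, searchlight extraction and the statistics across participants are omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr


def model_rdm_from_vectors(depth_vectors: np.ndarray) -> np.ndarray:
    """(n_sentences, n_words) parse-depth vectors -> (n, n) model RDM.
    Euclidean distance is an assumption made for this sketch."""
    return squareform(pdist(depth_vectors, metric="euclidean"))


def time_resolved_rsa(data_rdms: np.ndarray, model_rdm: np.ndarray) -> np.ndarray:
    """Spearman's rho between the model RDM and the data RDM of each
    sliding time window; data_rdms has shape (n_windows, n, n)."""
    iu = np.triu_indices(model_rdm.shape[0], k=1)
    model_vec = model_rdm[iu]
    rhos = []
    for rdm in data_rdms:
        rho, _ = spearmanr(rdm[iu], model_vec)
        rhos.append(rho)
    return np.array(rhos)


# Toy example: 60 sentence sets x 2 sentences = 120 items, 3 words up to V1.
rng = np.random.default_rng(1)
vectors = rng.random((120, 3))
model = model_rdm_from_vectors(vectors)
# Hypothetical "data": window 0 mirrors the model, window 1 is unrelated.
noise = squareform(pdist(rng.random((120, 3))))
data = np.stack([model, noise])
rhos = time_resolved_rsa(data, model)
```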

      In contrast, when employing a model RDM constructed solely from the BERT V1 parse depth, we observed model fits peaking precisely at the uniqueness point of V1 (see Appendix 1-figure 10). It is important to note that BERT V1 parse depth is a contextualized metric influenced by the preceding subject noun, which could account for the effects of BERT V1 parse depth observed before the uniqueness point of V1.

      Relatedly, in Fig 2C it seems there are systematic differences between HiTrans and LoTrans sentences regarding the parse depth of determiner and subject noun according to the neural network model, while this is not expected according to the context-free parse.

      We thank R1 for pointing out this issue. Relevant to Figure 3D (Figure 2C in the original manuscript), we presented the distributions of BERT parse depth for individual words as the sentence unfolds in Appendix 1-figure 2. Our analysis revealed that the parse depth of the subject noun in high transitivity (HiTrans) and low transitivity (LoTrans) sentences did not significantly differ, except for the point at which the sentence reached V1 (two-tailed two-sample t-test, P = 0.05).

      However, we observed a significant difference in the parse depth of the determiner between HiTrans and LoTrans sentences (two-tailed two-sample t-test, P < 0.05 for all results in Appendix 1-figure 2). Additionally, the parse depth of the determiner was found to covary with that of V1 as the input unfolded to different sentence positions (Pearson correlation, P < 0.05 for all plots in Appendix 1-figure 2). This difference, unexpected in terms of the context-free (dependency) parse used for training the BERT structural probing model, might indicate a “leakage” of contextual information during the training of the structural probing model, given that the determiner's parse depth covaried with that of V1, which was designed to differ in transitivity between the two types of sentences.

      Despite such unexpected differences observed in the BERT parse depths of the determiner, we considered the two sentence types as one group with distributed features (e.g., V1 transitivity) in the RSA, and used the BERT parse depth vector including all words in the sentence input to construct the model RDMs. Moreover, as indicated in Appendix 1-figure 3, compared to the content words, the determiner contributed minimally to the incremental BERT parse depth vector. Consequently, the noted discrepancies in BERT parse depth of the determiner between HiTrans and LoTrans sentences are unlikely to significantly bias our RSA results.

      "The degree of this mismatch is proportional to the evidence for or against the two interpretations (...). Besides these two measures based on the entire incremental input, we also focused on Verb1 since the potential structural ambiguity lies in whether Verb1 is interpreted as a passive verb or the main verb." The neural data fits in V1 epoch differ in their temporal profile for the mismatch metrics and the Verb 1 depth respectively. I understand the "degree of mismatch" to be a measure of how strongly the neural network's hidden representations align with the parse depth of an active or passive sentence structure. If this is correct, then it is not clear from the text how far this measure differs from the Verb 1 depth alone, which is also indicating either an active or passive structure.

      Within the V1 epoch, we tested three distinct types of model RDMs based on BERT metrics: (1) The BERT parse depth vector, representing the neural network’s hidden representation of the incremental sentence structure including all words up to V1. (2) The mismatch metric for either the Active or Passive interpretation, calculated as the distance between the BERT parse depth vector and the context-free parse depth vector for each interpretation. (3) The BERT parse depth of V1, crucial in representing the preferred structural interpretation of the unfolding sentence given its syntactic role as either a passive verb or the main verb.
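
      Metric (2) can be illustrated with a toy example for the prefix "The dog found". The context-free depth vectors below are our own illustrative reading of the two dependency structures (root at depth 0), the Euclidean distance is an assumed choice of mismatch, and the "BERT" vectors are made-up numbers rather than probe outputs; the sketch only shows how a smaller mismatch points to the favoured interpretation.

```python
import numpy as np

# Illustrative context-free dependency depths for "The dog found" (root = 0).
# These values are assumptions for this sketch:
#   Active:  "found" is the main verb (root), "dog" its subject, "The" under "dog".
#   Passive: "found" heads a reduced relative attached to "dog", whose head
#            (the main verb) has not been heard yet.
ACTIVE_DEPTHS = np.array([2.0, 1.0, 0.0])
PASSIVE_DEPTHS = np.array([2.0, 1.0, 2.0])


def mismatch(bert_depths, cf_depths) -> float:
    """Distance between a BERT parse-depth vector and a context-free one
    (Euclidean distance is an assumption here)."""
    return float(np.linalg.norm(np.asarray(bert_depths, dtype=float) - cf_depths))


def preferred(bert_depths) -> str:
    """The interpretation with the smaller mismatch is taken as preferred."""
    a = mismatch(bert_depths, ACTIVE_DEPTHS)
    p = mismatch(bert_depths, PASSIVE_DEPTHS)
    return "active" if a < p else "passive"


# Hypothetical probe outputs (not values from the study).
print(preferred([2.1, 1.0, 0.4]))  # V1 depth near 0 -> active
print(preferred([2.0, 1.2, 1.7]))  # V1 depth near 2 -> passive
```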

      While the BERT parse depth vector per se does not directly indicate a preferred interpretation, its mismatch with the context-free parse depth vectors of the two possible interpretations reveals the favoured interpretation, as significant neural fit is only anticipated for the mismatch with the interpretation being considered. The contextualized BERT depth of V1 is also indicative of the preferred structure given the context-free V1 parse depth corresponding to different syntactic roles; however, compared to the interpretative mismatch, it does not fully capture contributions from other words in the input. Consequently, we expected the interpretative mismatch and the BERT V1 depth to yield different results. Indeed, our analysis revealed that, although both metrics extracted from the same BERT layer (i.e., layer 13) demonstrated early RSA fits in the left fronto-temporal regions, the V1 depth showed relatively more prolonged effects with a notable peak occurring precisely at the uniqueness point of V1 (compare Figure 6C and Appendix 1-figure 10). These complementary results underscore the capability of BERT metrics to align with neural responses, in terms of both an incrementally unfolding sentence and a specific word within it.

      In previous studies, differences in neural activity related to distinct amounts of open nodes in the parse tree have been interpreted in terms of distinct working memory demands (Nelson et al., PNAS 2017; Udden et al., TiCS 2020). It seems that some of the metrics, for example the neural network-derived parse depth or the V1 depth may be similarly interpreted in the light of working memory demands. After all, during V1 epoch, the sentences do not only differ with respect to predicted sentence structure, but also in the amount of open nodes that need to be maintained. In the discussion, however, the authors interpret these results as "neural representations of an unfolding sentence's structure".

      We agree with the reviewer that the Active and Passive interpretations differ in terms of the number of open nodes before the actual main verb is heard. Given the syntactic ambiguity in our sentence stimuli (i.e., LoTrans and HiTrans sentences), it is infeasible to determine the exact number of open nodes in each sentence as it unfolds. Nevertheless, the RSA fits observed in the dorsal lateral frontal regions could be indicative of the varying working memory demands involved in building the structured interpretations across sentences. We have added this perspective to the revised manuscript.

      Reviewer #2 (Public Review):

      This article is focused on investigating incremental speech processing, as it pertains to building higher-order syntactic structure. This is an important question because speech processing in general is lesser studied as compared to reading, and syntactic processes are lesser studied than lower-level sensory processes. The authors claim to shed light on the neural processes that build structured linguistic interpretations. The authors apply modern analysis techniques, and use state-of-the-art large language models in order to facilitate this investigation. They apply this to a cleverly designed experimental paradigm of EMEG data, and compare neural responses of human participants to the activation profiles in different layers of the BERT language model.

      We thank Reviewer #2 (hereinafter referred to as R2) for the overall positive remarks on our study.

      Strengths:

      (1) The study aims to investigate an under-explored aspect of language processing, namely syntactic operations during speech processing

      (2) The study is taking advantage of technological advancements in large language models, while also taking linguistic theory into account in building the hypothesis space

      (3) The data combine EEG and MEG, which provides a valuable spatio-temporally resolved dataset

      (4) The use of behavioural validation of high/low transitive was an elegant demonstration of the validity of their stimuli

      We thank R2 for recognizing and appreciating the motivation and the methodology employed in this study.

      Weaknesses:

      (1) The manuscript is quite hard to understand, even for someone well-versed in both linguistic theory and LLMs. The questions, design, analysis approach, and conclusions are all quite dense and not easy to follow.

      To address this issue, we have made dedicated efforts to clarify the key points in our study. We also added figures to visualize our experimental design and methods (see Figure 1, Figure 3C and Figure 5 in the revised main text). We hope that these revisions have made the manuscript more comprehensible and straightforward for the readers.

      (2) The analyses end up seeming overly complicated when the underlying difference between sentence types is a simple categorical distinction between high and low transitivity. I am not sure why tree depth and BERT are being used to evaluate the degree to which a sentence is being processed as active or passive. If this is necessary, it would be helpful for the authors to motivate this more clearly.

      Indeed, as pointed out by R2, the only difference between LoTrans and HiTrans sentences is the first verb (V1), whose transitivity is crucial for establishing an initial preference for either an Active or a Passive interpretation as the sentence unfolds. Nonetheless, in line with the constraint-based approach to sentence processing and supported by previous research findings, a coherent structured interpretation of a sentence is determined by the combined constraints imposed by all words within that sentence. In our study, the transitivity of V1 alone is insufficient to fully explain the interpretative preference for the sentence structure. The overall sentence-level interpretation also depends on the thematic role preference of the subject noun – its likelihood of being an agent performing an action or a patient receiving the action.

      This was evident in our findings: as shown in Author response image 1 above, the V1 transitivity based on corpus or behavioural data did not fit the neural data during the V1 epoch. In contrast, BERT structural measures [e.g., BERT parse depth vector (up to V1) and BERT V1 parse depth] offered contextualized representations that are presumed to integrate various lexical constraints present in each sentence. These BERT metrics exhibited significant model fits for the same neural data in the V1 epoch. In addition, a notable feature of BERT is its bi-directional attention mechanism, which allows for the dynamic updating of an earlier word’s representation as more of the sentence is heard, something that is also challenging to achieve with corpus or behavioural metrics. For instance, the parse depth of the word “found” in the BERT parse depth vector for “The dog found…” differs from its parse depth in the vector for “The dog found in…”. This feature of BERT is particularly advantageous for investigating the dynamic nature of structured interpretation during speech comprehension, as it simulates the continual updating of interpretation that occurs as a sentence unfolds (as shown by Figure 7 in the main text). We have elaborated on the rationale for employing BERT parse depth in this regard in the revised manuscript.

      (3) The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and those, only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually. This is a summary statistic that is very far away from the input data.

      We appreciate this suggestion from R2. In Appendix 1 of the revised manuscript, we have provided individual participants’ Spearman’s rho time courses for every model RDM tested in all three epochs (see Appendix 1-figures 8-10 & 14-15). Because RSA was conducted on the source-localized E/MEG data, it is infeasible to plot the rho time course for the searchlight centred on each of the 8196 vertices of the cortical surface mesh. Instead, we plotted the rho time course of each ROI reported in the original manuscript. These plots complement the time-resolved heatmaps of peak t-values in Figures 6-8 in the main text.
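
      A per-ROI summary of such rho time courses across participants might be computed as below. The sketch assumes the rho values are stored as a participants-by-time array for one ROI and one model RDM; the mean, standard error and uncorrected one-sample t statistic at each time point are illustrative and are not the whole-brain corrected statistics reported in the manuscript. The data in the example are random placeholders.

```python
import numpy as np
from scipy.stats import ttest_1samp


def summarise_rho(rho: np.ndarray):
    """rho: (n_subjects, n_times) Spearman's rho for one ROI and model RDM.
    Returns the group mean, the standard error of the mean, and an
    uncorrected one-sample t statistic against zero at every time point."""
    n = rho.shape[0]
    mean = rho.mean(axis=0)
    sem = rho.std(axis=0, ddof=1) / np.sqrt(n)
    t, _ = ttest_1samp(rho, popmean=0.0, axis=0)
    return mean, sem, t


# Hypothetical input: 16 participants x 600 time points of rho values.
rng = np.random.default_rng(2)
example_rho = 0.02 + 0.05 * rng.standard_normal((16, 600))
group_mean, group_sem, group_t = summarise_rho(example_rho)
```

      The mean plus or minus one SEM is what would be drawn as the shaded error band around each ROI's time course.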

      (4) Some details are omitted or not explained clearly. For example, how was BERT masked to give word-by-word predictions? In its default form, I believe that BERT takes in a set of words before and after the keyword that it is predicting. But I assume that here the model is not allowed to see linguistic information in the future.

      In our analyses, we utilized the pre-trained version of BERT (Devlin et al. 2019) as released by Hugging Face (https://github.com/huggingface). It is noteworthy that BERT, as described in the original paper, was initially trained using the Cloze task, involving the prediction of masked words within an input. In our study, however, we neither retrained nor fine-tuned the pre-trained BERT model, nor did we employ it for word-by-word prediction tasks. We used BERT to derive the incremental representation of a sentence’s structure as it unfolded word-by-word.

      Specifically, we sequentially input the text of each sentence into BERT, akin to how a listener would receive the spoken words in a sentence (see Figure 3C in the main text). For each incremental input (such as “The dog found”), we extracted the hidden representations of each word from BERT. These representations were then transformed into their respective BERT parse depths using a structural probing model (which was trained using sentences with annotated dependency parse trees from the Penn Treebank Dataset). The resulting BERT parse depths were subsequently used to create model RDMs, which were then tested against neural data via RSA.

      Crucially, in our approach, BERT was not exposed to any future linguistic information in the sentence. We never tested the BERT parse depth of a word in an epoch where this word had not yet been heard by the listener. For example, the three-dimensional BERT parse depth vector for “The dog found” was tested in the V1 epoch corresponding to “found”, while the four-dimensional BERT parse depth vector for “The dog found in” was tested in the PP1 epoch of “in”.
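
      The incremental procedure can be summarised in a short sketch. It assumes a depth probe of the form used by Hewitt and Manning (2019), where the predicted depth of word i is the squared norm of a linear projection of its hidden vector, ||B h_i||^2; the probe used in the study may differ in detail. To keep the example self-contained, BERT is replaced by a hypothetical toy_encoder that, like a bidirectional model, lets every word's vector depend on the whole prefix, and the probe matrix B is random rather than trained on the Penn Treebank. The point is only that each prefix is encoded on its own, so no future word can influence a depth vector, while an earlier word's depth can still change as the prefix grows.

```python
import zlib
from typing import Callable, List, Sequence

import numpy as np

# An encoder maps a word prefix to one hidden vector per word in that prefix.
Encoder = Callable[[Sequence[str]], np.ndarray]


def probe_depths(hidden: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Depth probe in the style of Hewitt & Manning (2019): ||B h_i||^2 per word.
    hidden: (n_words, d); B: (k, d)."""
    projected = hidden @ B.T
    return np.sum(projected ** 2, axis=1)


def incremental_depth_vectors(words: Sequence[str], encode: Encoder,
                              B: np.ndarray) -> List[np.ndarray]:
    """One parse-depth vector per prefix words[:1], words[:2], ...
    The encoder only ever receives the prefix, so later words cannot leak in."""
    return [probe_depths(encode(words[:t]), B) for t in range(1, len(words) + 1)]


def toy_encoder(prefix: Sequence[str], dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for BERT. Each word gets a fixed pseudo-random
    embedding (seeded by a CRC32 of the word) plus the mean over the prefix,
    so an earlier word's vector changes whenever the prefix grows."""
    embs = np.stack([
        np.random.default_rng(zlib.crc32(w.encode("utf-8"))).standard_normal(dim)
        for w in prefix
    ])
    return embs + embs.mean(axis=0)


B = np.random.default_rng(0).standard_normal((4, 8))  # untrained, hypothetical probe
vectors = incremental_depth_vectors("The dog found in".split(), toy_encoder, B)
for v in vectors:
    print(len(v), np.round(v, 2))
```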

      How were the auditory stimuli recorded? Was it continuous speech or silences between each word? How was prosody controlled? Was it a natural speaker or a speech synthesiser?

      Consistent with our previous studies (Kocagoncu et al. 2017; Klimovich-Gray et al. 2019; Lyu et al. 2019; Choi et al. 2021), all auditory stimuli in this study were recorded by a female native British English speaker, ensuring a neutral intonation throughout. We have incorporated this detail into the revised version of our manuscript for clarity.

      It is difficult for me to fully assess the extent to which the authors achieved their aims, because I am missing important information about the setup of the experiment and the distribution of test statistics across subjects.

      We are sorry for the previously omitted details regarding the experimental setup and the results of individual participants. As detailed in our responses above, we have now included the necessary information in the revised manuscript.

      Reviewer #3 (Public Review):

      Syntactic parsing is a highly dynamic process: When an incoming word is inconsistent with the presumed syntactic structure, the brain has to reanalyze the sentence and construct an alternative syntactic structure. Since syntactic parsing is a hidden process, it is challenging to describe the syntactic structure a listener internally constructs at each time moment. Here, the authors overcome this problem by (1) asking listeners to complete a sentence at some break point to probe the syntactic structure mentally constructed at the break point, and (2) using a DNN model to extract the most likely structure a listener may extract at a time moment. After obtaining incremental syntactic features using the DNN model, the authors analyze how these syntactic features are represented in the brain using MEG.

      We extend our thanks to Reviewer #3 (referred to as R3 below) for recognizing the methods we used in this study.

      Although the analyses are detailed, the current conclusion needs to be further specified. For example, in the abstract, it is concluded that "Our results reveal a detailed picture of the neurobiological processes involved in building structured interpretations through the integration across multifaceted constraints". The readers may remain puzzled after reading this conclusion.

      Following R3’s suggestion, we have revised the abstract and refined our conclusions in the main text to explicitly highlight our principal findings. These include: (1) a shift from bihemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.

      Similarly, for the second part of the conclusion, i.e., "including an extensive set of bilateral brain regions beyond the classical fronto-temporal language system, which sheds light on the distributed nature of language processing in the brain." The more extensive cortical activation may be attributed to the spatial resolution of MEG, and it is quite well acknowledged that language processing is quite distributive in the brain.

      We fully agree with R3 on the relatively low spatial resolution of MEG. Our emphasis was on the observed peak activations in specific regions outside the classical brain areas related to language processing, such as the precuneus in the default mode network, which are unlikely to be artifacts due to the spatial resolution of MEG. We have revised the relevant contents in the Abstract.

      The authors should also discuss:

      (1) individual differences (whether the BERT representation is a good enough approximation of the mental representation of individual listeners).

      To address the issue of individual differences which was also suggested by R2, we added individual participants’ model fits in ROIs with significant effects of BERT representations in Appendix 1 of the revised manuscript (see Appendix 1-figures 8-10 & 14-15).

      (2) parallel parsing (I think the framework here should allow the brain to maintain parallel representations of different syntactic structures but the analysis does not consider parallel representations).

      In the original manuscript, we did not discuss parallel parsing because the methods we used do not support a direct test of this hypothesis. In our analyses, we assessed the preference for one of two plausible syntactic structures (i.e., Active and Passive interpretations) based on the BERT parse vector of an incremental sentence input. This assessment was accomplished by calculating the mismatch between the BERT parse depth vector and the context-free dependency parse depth vector representing each of the two structures. However, we only observed one preferred interpretation in each epoch (see Figures 6D-6F) and did not find evidence supporting the maintenance of parallel representations of different syntactic structures in the brain. Nevertheless, in the revised manuscript, we have mentioned this possibility, which could be properly explored in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Consider fitting the behavioral data from the continuation pre-test to the brain data in order to illustrate the claimed advantage of using a computational model beyond more traditional methods.

      Following R1’s suggestion, we conducted additional RSA using more behavioural and corpus-based metrics. We then directly compared the fits of these traditional metrics to brain data with those of BERT metrics in the same epoch to provide empirical evidence for the advantage of using a computational model like BERT to explain listeners’ neural data (see Appendix 1-figures 11-13).

      Clarify the use of "neural representations": For a clearer assessment of the results, please discuss your results (especially the fits with BERT parse depth) in terms of the potential effects of distinct sentence structure expectations on working memory demands and make clear where these can be disentangled from neural representations of an unfolding sentence's structure.

      In the revised manuscript, we have noted the working memory demands associated with the online construction of a structured interpretation during incremental speech comprehension. As mentioned in our response to the relevant comment by R1 above, our experimental paradigm is not suitable for quantitatively assessing working memory demands, since it is difficult to determine the exact number of open nodes for our stimuli with syntactic ambiguity before the disambiguating point (i.e., the main verb) is reached. Therefore, while we can speculate about the potential contribution of varying working memory demands (which might correlate with BERT V1 parse depth) to RSA model fits, we do not think it is possible to disentangle their effects from the neural representation of an unfolding sentence’s structure modelled by BERT parse depths in our current study.

      Please add in methods a description of how the uniqueness point was determined.

      In this study, we defined the uniqueness point of a word as the earliest point in time when this word can be fully recognized after removing all of its phonological competitors. To determine the uniqueness point for each word of interest, we first identified the phoneme by which this word can be uniquely recognized according to CELEX (Baayen et al. 1993). Then, we manually labelled the offset of this phoneme in the auditory file of the spoken sentence in which this word occurred. We have added relevant description of how the uniqueness point was determined in the Methods section of the revised manuscript.
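
      The phoneme-level step of this procedure can be sketched as a search for the shortest word onset that no other lexical entry shares. The sketch below assumes words are given as tuples of phoneme symbols; the toy lexicon and its ad hoc transcriptions are hypothetical stand-ins for CELEX, and the manual labelling of the phoneme offset in the audio file is not modelled.

```python
from typing import Optional, Sequence, Tuple

Phonemes = Tuple[str, ...]


def uniqueness_point(word: Phonemes, lexicon: Sequence[Phonemes]) -> Optional[int]:
    """Return the 1-based position of the phoneme at which `word` no longer
    shares its onset with any other lexicon entry, or None if a competitor
    still matches at the word's final phoneme (e.g. an embedded word)."""
    competitors = [entry for entry in lexicon if entry != word]
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if not any(entry[:i] == prefix for entry in competitors):
            return i
    return None


# Toy lexicon with ad hoc phoneme symbols (hypothetical, not CELEX transcriptions).
LEXICON = [
    ("f", "aU", "n", "d"),            # found
    ("f", "aU", "n", "t", "I", "n"),  # fountain
    ("f", "aU", "l"),                 # fowl
    ("w", "O:", "k", "t"),            # walked
    ("w", "O:", "l"),                 # wall
    ("w", "O:", "l", "z"),            # walls
]
```

      For "found", the fourth phoneme separates it from "fountain", so its uniqueness point would be the offset of that phoneme in the recording. An embedded word such as "wall" returns None because "walls" remains a competitor at its offset.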

      I found the name "interpretative mismatch" very opaque. Maybe instead consider "preference".

      We chose to use the term “interpretative mismatch” rather than “preference” based on the operational definition of this metric, which is the distance between a BERT parse depth vector and one of the two context-free parse depth vectors representing the two possible syntactic structures, so that a smaller distance value (or mismatch) signifies a stronger preference for the corresponding interpretation.
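
      A minimal sketch of this operational definition follows. It assumes Euclidean distance as the mismatch measure and uses illustrative context-free depth vectors for "The dog found" (determiner, subject noun, V1), in which only V1 differs between readings (0 for Active, 2 for Passive). The metric, the vectors, and the BERT depths in the example are assumptions for illustration.

```python
import numpy as np

# Illustrative context-free dependency parse depths for "The dog found"
# (determiner, subject noun, V1); only V1 differs between the two readings.
ACTIVE_DEPTHS = np.array([2.0, 1.0, 0.0])
PASSIVE_DEPTHS = np.array([2.0, 1.0, 2.0])


def interpretative_mismatch(bert_depths, template):
    """Distance between a BERT parse depth vector and a template (Euclidean is assumed)."""
    return float(np.linalg.norm(np.asarray(bert_depths, dtype=float) - template))


def preferred_interpretation(bert_depths):
    """The reading with the smaller mismatch is the preferred one."""
    active = interpretative_mismatch(bert_depths, ACTIVE_DEPTHS)
    passive = interpretative_mismatch(bert_depths, PASSIVE_DEPTHS)
    if active < passive:
        return "Active"
    if passive < active:
        return "Passive"
    return "Tie"


# Hypothetical BERT parse depths for the three words.
print(preferred_interpretation([1.9, 1.1, 0.6]))  # Active
```

      Because a smaller mismatch signals a stronger preference, the term describes the computed quantity directly, whereas "preference" describes how it is interpreted.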

      In the abstract, the authors describe the cognitive process under investigation as one of incremental combination subject to "multi-dimensional probabilistic constraint, including both linguistic and non-linguistic knowledge". The non-linguistic knowledge is later also referred to as "broad world knowledge". These terms lack specificity and across studies have been operationalized in distinct ways. In the current study, this "world knowledge" is operationalized as the likelihood of a subject noun being an agent or patient and the probability for a verb to be transitive, so here a more specific term may have been the "knowledge about statistical regularities in language".

      In this study, we specifically define “non-linguistic world knowledge” as the likelihood of a subject noun assuming the role of an agent or patient, which relates to its thematic role preference. This type of knowledge is primarily non-linguistic in nature, as exemplified by comparing nouns like “king” and “desk”. Although it could be reflected by statistical regularities in language, thematic role preference hinges more on world knowledge, plausibility, or real-world statistics. In contrast, “linguistic knowledge” in our study refers to verb transitivity, which focuses on the grammatically correct usage of a verb and is tied to statistical regularities within language itself. In the revised manuscript, we have provided clearer operational definitions for these two concepts and have ensured consistent usage throughout the text.

      Please spell out what exactly the "constraint-based hypothesis" is (even better, include an explicit description of the alternative hypothesis?).

      The “constraint-based hypothesis”, as summarized in a review (McRae and Matsuki 2013), posits that various sources of information, referred to as “constraints”, are simultaneously considered by listeners during incremental speech comprehension. These constraints encompass syntax, semantics, knowledge of common events, contextual pragmatic biases, and other forms of information gathered from both intra-sentential and extra-sentential context. Notably, there is no delay in the utilization of these multifaceted constraints once they become available, neither is a fixed priority assigned to one type of constraint over another. Instead, a diverse set of constraints is immediately brought into play for comprehension as soon as they become available as the relevant spoken word is recognized.

      An alternative hypothesis, proposed earlier, is the two-stage garden path model (Frazier and Rayner 1982; Frazier 1987). According to this model, there is an initial parsing stage that relies solely on syntax. This is followed by a second stage where all available information, including semantics and other knowledge, is used to assess the plausibility of the results obtained in the first-stage analysis and to conduct re-analysis if necessary (McRae and Matsuki 2013). In the Introduction of our revised manuscript, we have elaborated on the “constraint-based hypothesis” and mentioned this two-stage garden path model as its alternative.

      Fig1 B&C: In order to make the data more interpretable, could you estimate how many possible grammatical structural configurations there are / how many different grammatical structures were offered in the pretest, and based on this what would be the "chance probability" of choosing a random structure or for example show how many responded with a punctuation vs alternative continuations?

      In our analysis of the behavioural results, we categorized the continuations provided by participants in the pre-test at the offset of Verb1 (e.g., “The dog found/walked …”) into 6 categories, including DO (direct object), INTRANS (intransitive), PP (prepositional phrase), INF (infinitival complement), SC (sentential complement) and OTHER (gerund, phrasal verb, etc.).

      Author response table 1.

      Similarly, we categorized the continuations that followed the offset of the prepositional phrase (e.g., “The dog found/walked in the park …”) into 7 categories, including MV (main verb), END (i.e., full stop), PP (prepositional phrase), INF (infinitival complement), CONJ (conjunction), ADV (adverb) and OTHER (gerund, sentential complement, etc.).

      Author response table 2.

      It is important to note that the results of these two pre-tests, including the types of continuations and their probabilities, exhibited considerable variability between and within each sentence type (see also Figures 2B and 2C).

      Typo: "In addition, we found that BERT structural interpretations were also a correlation with the main verb probability" >> correlated instead of correlation.

      We apologize for this typo. We have conducted a thorough proofreading to identify and correct any other typos present in the revised manuscript.

      "In this regard, DLMs excel in a flexible combination of different types of features embedded in their rich internal representations". What are the "different types", spell out at least some examples for illustration.

      We have rephrased this sentence to give a more detailed description.

      Fig 2 caption: "Same color scheme as in (A)" >> should be 'as in (B)'?, and later A instead of B.

      We are sorry for this typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      My biggest recommendation is to make the paper clearer in two ways: (i) writing style, by hand-holding the reader through each section, and the motivation for each step, in both simple and technical language; (ii) schematic visuals, of the experimental design and the analysis. A schematic of the main experimental manipulation would be helpful, rather than just listing two example sentences. It would also be helpful to provide a schematic of the experimental setup and the analysis approach, so that people can refer to a visual aid in addition to the written explanation. For example, it is not immediately clear what is being correlated with what - I needed to go to the methods to understand that you are doing RSA across all of the trials. Make sure that all of the relevant details are explained, and that you motivate each decision.

      We thank R2 for these suggestions. In the revised manuscript, we have enhanced the clarity of the main text by providing a more detailed explanation of the motivation behind each analysis and the interpretation of the corresponding results. Additionally, in response to R2’s recommendation, we have added a few figures, including the illustration of the experimental design (Figure 1) and methods (see Figure 3C and Figure 5).

      Different visualisation of neural results - The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually.

      In the original manuscript, we opted to present t-value time courses for the sake of simplicity in illustrating the fits of the 12 model RDMs tested in 3 epochs. Following R2’s suggestion, we have included the ROI model fit time courses of each model RDM for all individual participants, as well as the mean model fit time course with standard error, in Appendix 1-figures 8-10 & 14-15.

      How are the authors dealing with prosody differences that disambiguate syntactic structures, that BERT does not have access to?

      All spoken sentence stimuli were recorded by a female native British English speaker, ensuring a neutral intonation throughout. Therefore, prosody is unlikely to vary systematically between different sentence types or be utilized to disambiguate syntactic structures. Sample speech stimuli have been made available in the following repository: https://osf.io/7u8jp/.

      A few writing errors: "was kept updated every time"

      We are sorry for the typos. We have conducted proof-reading carefully to identify and correct typos throughout the revised manuscript.

      Explain why the syntactic trees have "in park the" rather than "in the park"?

      The dependency parse trees (e.g., Figure 3A) were generated according to the conventions of dependency parsing (de Marneffe et al. 2006).

      Why are there mentions of the multiple demand network in the results? I'm not sure where this comes from.

      The multiple demand network was mentioned because of the significant RSA fits observed in the dorsolateral prefrontal regions and the superior parietal regions, which are parts of the multiple demand network. This observation was particularly notable for the BERT parse depth vector in the main verb epoch, when the potential syntactic ambiguity was being resolved. These effects may be partly attributable to the varying working memory demands required to maintain the “open nodes” in the different syntactic structures being considered by listeners at this point in the sentence.

      Reviewer #3 (Recommendations For The Authors):

      The study first asked human listeners to complete partial sentences, and incremental parsing of the partial sentences can be captured based on the completed sentences. This analysis is helpful and I wonder if the behavioral data here are enough to model the E/MEG responses. For example, if I understood it correctly, the parse depth up to V1 can be extracted based on the completed sentences and used for the E/MEG analysis.

      The behavioural data alone do not suffice to model the E/MEG data. As we explained in our responses to R1, we employed three behavioural metrics derived from the continuation pre-tests. These metrics include V1 transitivity and the PP probability, given the continuations after V1 (e.g., after “The dog found…”), as well as the MV probability, given the continuations after the prepositional phrase (e.g., after “The dog found in the park…”). These metrics aimed to capture participants’ predictions based on their structured interpretations at various positions in the sentence. However, none of these behavioural metrics yielded significant model fits to the listeners’ neural activity, which contrasts sharply with the substantial model fits of the BERT metrics in the same epochs. In addition, we tried to model V1 parse depth as a weighted average based on participants’ continuations. As shown in Figure 3A, V1 parse depth is 0 in the active interpretation and 2 in the passive interpretation, while the parse depths of the determiner and the subject noun do not differ. However, this continuation-based V1 parse depth [i.e., 0 × Probability(active interpretation) + 2 × Probability(passive interpretation)] did not show significant model fits.
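
      The continuation-based V1 parse depth is an expected value over the two readings. The sketch below implements the bracketed formula directly; the per-sentence probabilities are hypothetical, and how each pre-test continuation was assigned to an Active or Passive reading is not modelled here.

```python
# V1 parse depths under each reading, as stated in the response.
ACTIVE_V1_DEPTH = 0
PASSIVE_V1_DEPTH = 2


def continuation_v1_depth(p_active, p_passive):
    """Expected V1 parse depth: 0 * P(active) + 2 * P(passive)."""
    return ACTIVE_V1_DEPTH * p_active + PASSIVE_V1_DEPTH * p_passive


# Hypothetical per-sentence probabilities of Active and Passive interpretations.
sentences = {
    "sentence_a": (0.9, 0.1),
    "sentence_b": (0.4, 0.6),
}
depths = {name: continuation_v1_depth(pa, pp) for name, (pa, pp) in sentences.items()}
```

      One such value per sentence can then serve as a scalar model metric in the RSA, in the same way as the other behavioural metrics.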

      Related to this point, I wonder if the incremental parse extracted using BERT is consistent with the human results (i.e., parsing extracted based on the completed sentences) on a sentence-by-sentence basis.

      In fact, we did provide evidence showing the alignment between the incremental parse extracted using BERT and the human interpretation for the same partial sentence input (see Figure 4 in the main text and Appendix 1-figures 4-6).

      Furthermore, in Fig 1d, is it possible to calculate how much variance of the 3 probabilities is explained by the 4 factors, e.g., using a linear model? If these factors can already explain most of the variance of human parsing, is it possible to just use these 4 factors to explain neural activity?

      Following R3’s suggestion, we have conducted additional linear modelling analyses to compare the extent to which human behavioural data can be explained by corpus metrics and BERT metrics separately. Specifically, for each of the three probabilities obtained in the pretests (i.e., DO, PP, and MV), we constructed two linear models. One model utilized the four corpus-based metrics as regressors (i.e., SN agenthood, V1 transitivity, Passive index, and Active index), while the other model used BERT metrics as regressors (i.e., BERT parse depth of each word up to V1 from layer 13 for DO/PP probability and BERT parse depth of each word up to the end of PP from layer 14 for MV probability, consistent with the BERT layers reported in Figure 6).
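
      The model comparison can be sketched as two ordinary least-squares fits of the same behavioural probability, one per regressor set, compared by R². The data below are synthetic and constructed so that one corpus column dominates; the adjusted R², which penalises the different number of regressors (4 corpus vs. 3 BERT depths up to V1), is an added assumption rather than a statistic reported in the response.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_regressors):
    """Penalise R^2 for the number of regressors so models of different size compare fairly."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Synthetic data: 60 sentences, 4 corpus regressors, 3 BERT regressors (depths up to V1).
rng = np.random.default_rng(1)
n = 60
corpus = rng.normal(size=(n, 4))
bert = rng.normal(size=(n, 3))
# Simulated DO score driven mainly by the second corpus column (standing in for V1 transitivity).
do_score = 0.8 * corpus[:, 1] + 0.3 * rng.normal(size=n)

r2_corpus = r_squared(corpus, do_score)
r2_bert = r_squared(bert, do_score)
adj_corpus = adjusted_r_squared(r2_corpus, n, corpus.shape[1])
adj_bert = adjusted_r_squared(r2_bert, n, bert.shape[1])
```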

      As shown in the table below, corpus metrics demonstrate a more effective fit than BERT metrics for predicting the DO/PP probability. The likelihood of a DO/PP continuation is chiefly influenced by the lexical syntactic property of V1 (i.e., transitivity), and appears to rely less on contextual factors. Since V1 transitivity is explicitly included as one of the corpus metrics, it is thus expected to align more closely with the DO/PP probability compared to BERT metrics, primarily reflecting transitive versus intransitive verb usage.

      Author response table 3.

      Actually, BERT V1 parse depth was not correlated with V1 transitivity when the sentence only unfolds to V1 (see Appendix 1-figure 6). This lack of correlation may stem from the fact that the BERT probing model was designed to represent the structure of a (partially) unfolded sentence, rather than to generate a continuation or prediction. Moreover, V1 transitivity alone does not conclusively determine the Active or Passive interpretation by the end of V1. For instance, both transitive and intransitive continuations after V1 are compatible with an Active interpretation. Consequently, the initial preference for an Active interpretation (as depicted by the early effects before V1 was recognized in Figure 6D), might be predominantly driven by the animate subject noun (SN) at the beginning of the sentence, a word order cue in languages like English (Mahowald et al. 2023).

      In contrast, when assessing the probability of an MV following the PP (e.g., after “The dog found in the park ...”), BERT metrics significantly outperformed corpus metrics in fitting the MV probability. Although SN thematic role preference and V1 transitivity were designed to be the primary factors constraining the structured interpretation in this experiment, we could only obtain context-independent estimates of them from corpora (i.e., estimates pooled across all contexts). Additionally, although the Active and Passive indices (products of these two factors) are correlated with the MV probability, they may oversimplify the specific context of a given sentence. Furthermore, the PP following V1 is also expected to influence the structured interpretation: for instance, it matters whether “in the park” is a more plausible setting for people to find a dog or for a dog to find something. Thus, this finding suggests that corpus-based metrics are not as effective as BERT in representing contextualized structured interpretations of longer sentence inputs, which might require the integration of constraints from every word in the input.

      In summary, corpus-based metrics excel in explaining human language behaviour when it primarily relies on specific lexical properties. However, they significantly lag behind BERT metrics when more complex contextual factors come into play at the same time. Regarding their performance in fitting neural data, among the four corpus-based metrics, we only observed significant model fits for the Passive index in the MV epoch when the intended structure for a Passive interpretation was finally resolved, while the other three metrics did not exhibit significant model fits in any epoch. Note that subject noun thematic role preference did fit neural data in the PP and MV epochs (Figure 8A and 8B). In contrast, the incremental BERT parse depth vector exhibited significant model fits in all three epochs we tested (i.e., V1, PP1, and MV).

      To summarize, I feel that I'm not sure if the structural information BERT extracts reflect the human parsing of the sentences, especially when the known influencing factors are removed.

      Based on the results presented above and in the manuscript, BERT metrics align closely with human structured interpretations in terms of both behavioural and neural data. Furthermore, they outperform corpus-based metrics when it comes to integrating multiple constraints within the context of a specific sentence as it unfolds.

      Minor issues:

      Six types of sentences were presented. Three types were not analyzed, but the results for the UNA sentences are not reported either.

      In this study, we only analysed two of the six types of sentences, i.e., HiTrans and LoTrans sentences. The remaining four types of sentences were included to ensure a diverse range of sentence structures and to avoid potential adaptation to the same syntactic structure.

      Fig 1b, If I understood it correctly, each count is a sentence. Providing examples of the sentences may help. Listing the sentences with the corresponding probabilities in the supplementary materials can also help.

      Yes, each count in Figure 2B (Figure 1B in the original manuscript) is a sentence. All sentence stimuli and results of pre-tests are available in the following repository https://osf.io/7u8jp/.

      "trajectories of individual HiTrans and LoTrans sentences are considerably distributed and intertwined (Fig. 2C, upper), suggesting that BERT structural interpretations are sensitive to the idiosyncratic contents in each sentence." It may also mean the trajectories are noisy.

      We agree with R3 that there might be unwanted noise underlying the distributed and intertwined BERT parse depth trajectories of individual sentences. Meanwhile, it is also important to note that the correlation between BERT parse depths and lexical constraints of different words at the same position across sentences is statistically supported.

      References

      Baayen RH, Piepenbrock R, van Rijn H. 1993. The CELEX lexical database on CD-ROM.

      Baroni M, Dinu G, Kruszewski G. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol 1. 238-247.

      Caucheteux C, King JR. 2022. Brains and algorithms partially converge in natural language processing. Communications Biology. 5:134.

      Choi HS, Marslen-Wilson WD, Lyu B, Randall B, Tyler LK. 2021. Decoding the Real-Time Neurobiological Properties of Incremental Semantic Interpretation. Cereb Cortex. 31:233-247.

      de Marneffe M-C, MacCartney B, Manning CD. 2006. Generating typed dependency parses from phrase structure parses. Proceedings of the 5th International Conference on Language Resources and Evaluation; May 22-28, 2006; Genoa, Italy. European Language Resources Association. 449-454.

      Devlin J, Chang M-W, Lee K, Toutanova K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; June 2-7, 2019; Minneapolis, MN, USA. Association for Computational Linguistics. 4171-4186.

      Frazier L. 1987. Syntactic processing: evidence from Dutch. Natural Language & Linguistic Theory. 5:519-559.

      Frazier L, Rayner K. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology. 14:178-210.

      Klimovich-Gray A, Tyler LK, Randall B, Kocagoncu E, Devereux B, Marslen-Wilson WD. 2019. Balancing Prediction and Sensory Input in Speech Comprehension: The Spatiotemporal Dynamics of Word Recognition in Context. Journal of Neuroscience. 39:519-527.

      Kocagoncu E, Clarke A, Devereux BJ, Tyler LK. 2017. Decoding the cortical dynamics of sound-meaning mapping. Journal of Neuroscience. 37:1312-1319.

      Lyu B, Choi HS, Marslen-Wilson WD, Clarke A, Randall B, Tyler LK. 2019. Neural dynamics of semantic composition. Proceedings of the National Academy of Sciences of the United States of America. 116:21318-21327.

      Mahowald K, Diachek E, Gibson E, Fedorenko E, Futrell R. 2023. Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages. Cognition. 241:105543.

      McRae K, Matsuki K. 2013. Constraint-based models of sentence processing. Sentence processing. 519:51-77.

      Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher N, Tenenbaum JB, Fedorenko E. 2021. The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America. 118:e2105646118.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The brain-machine interface used in this study differs from typical BMIs in that it's not intended to give subjects voluntary control over their environment. However, it is possible that rats may become aware of their ability to manipulate trial start times using their neural activity. Is there any evidence that the time required to initiate trials on high-coherence or low-coherence trials decreases with experience?

      This is a great question. First, we designed the experiment to avoid this possibility. Rats were experienced with the sequence of the automated maze both before and after implantation (totaling weeks of pre-training and habituation). As such, the majority of the trials ever experienced by the rats were not controlled by their neural activity. During BMI experimentation, only 10% of trials were triggered during high-coherence states and 10% during low-coherence states, leaving ~80% of trials not controlled by neural activity. We also implemented a pseudo-randomized trial sequence. Taken together, these design choices were intended to prevent rats from actively using their neural activity to control the maze.

      Second, we had a similar question when collecting data for this manuscript, so we conducted a pilot experiment. We took 3 rats from experiment #1 (after its completion) and required them to perform “forced-runs” over the course of 3-4 days, a task in which rats navigate to a reward zone and are rewarded with a chocolate pellet. The trajectory on “forced-runs” is predetermined, and rats were always rewarded for navigating along the predetermined route. Every trial was initiated by strong mPFC-hippocampal theta coherence. We asked whether time-to-trial onset would decrease if we repeatedly paired trial onset with strong mPFC-hippocampal theta coherence. One of the 3 rats (rat 21-35) showed a significant correlation between time-to-trial onset and trial number, indicating that our threshold for strong mPFC-hippocampal theta coherence was being met more quickly with experience (Figure R1A). Across sessions and rats, there was considerable variability in the magnitude of this correlation and sometimes even its direction (Figure R1B). As such, the degree to which rat 21-35 was aware of controlling the environment by reaching strong mPFC-hippocampal theta coherence is unclear, and this question requires future experimentation.

      Author response image 1.

      Strong mPFC-hippocampal theta coherence was used to control trial onset for the entirety of forced-navigation sessions. Time-to-trial onset is a measurement of how long it took for strong coherence to be met. A) Time-to-trial onset was averaged across sessions for each rat, then plotted as a function of trial number (within-session experience on the forced-runs task). Rat 21-35 showed a significant negative correlation between time-to-trial onset and trial number, indicating that time-to-coherence decreased with experience. The remaining rats did not display this effect. B) Correlation between time-to-trial onset and trial number (y-axis; see A) across sessions (x-axis). A majority of sessions showed a negative correlation between time-to-trial onset and trial number, as in (A), but the magnitude and sometimes the direction of this effect varied considerably even within an animal.
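
      The per-session statistic in panel B can be sketched as a Pearson correlation between trial number and time-to-trial onset, with significance assessed here by a permutation test that shuffles trial order. The permutation test, the function names, and the example session are assumptions for illustration; the statistical test actually used is not specified above.

```python
import numpy as np


def onset_trial_correlation(time_to_onset):
    """Pearson r between trial number and time-to-trial onset within one session."""
    t = np.asarray(time_to_onset, dtype=float)
    trial = np.arange(1, len(t) + 1, dtype=float)
    return float(np.corrcoef(trial, t)[0, 1])


def permutation_p_value(time_to_onset, n_perm=999, seed=0):
    """Two-sided permutation p-value obtained by shuffling the trial order."""
    rng = np.random.default_rng(seed)
    t = np.asarray(time_to_onset, dtype=float)
    observed = abs(onset_trial_correlation(t))
    hits = sum(
        abs(onset_trial_correlation(rng.permutation(t))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical session (seconds) in which the coherence threshold is reached faster over trials.
session = [12.0, 11.5, 10.2, 9.8, 8.1, 7.7, 6.9, 5.2, 4.8, 3.9]
r = onset_trial_correlation(session)
p = permutation_p_value(session)
```

      A negative r corresponds to the pattern seen for rat 21-35; collecting r across sessions gives the distribution plotted in panel B.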

      Is there any evidence that rats display better performance on trials with random delays in which HPC-PFC coherence was naturally elevated?

      This question is now addressed in Extended Figure 5 and discussed in the section titled “strong prefrontal-hippocampal theta coherence leads to correct choices on a spatial working memory task”.

      The introduction frames this study as a test of the "communication through coherence" hypothesis. In its strongest form, this hypothesis states that oscillatory synchronization is a pre-requisite for inter-areal communication, i.e. if two areas are not synchronized, they cannot transfer information. Recent experimental evidence shows this relationship is more likely inverted: coherence is a consequence of inter-areal interactions, rather than a cause. See Schneider et al. (DOI: 10.1016/j.neuron.2021.09.037) and Vinck et al. (10.1016/j.neuron.2023.03.015) for a more in-depth explanation of this distinction. The authors should expand their treatment of this hypothesis in light of these findings.

      The Introduction and Discussion now include sections dedicated to these studies.

      Figure 6 - It would be much more intuitive to use the labels "Rat 1", "Rat 2", and "Rat 3"; the "21-4X" identifiers are confusing.

      This was corrected in the paper.

      Figure 6C - The sub-plots within this figure are rather small and difficult to interpret. The figure would be easier to parse if the data were presented as a heatmap of the ratio of theta power during blue vs. red stim, with each pixel corresponding to one channel.

      This suggestion was implemented in the paper. See Fig 6C. Extended Fig. 8 now shows the power spectra as a function of recording shank and channel.
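
      The heatmap can be sketched by computing theta band power per channel under each stimulation condition and arranging the blue/red ratio on a shank-by-channel grid. This sketch assumes a 6-10 Hz theta band, a plain FFT periodogram, and channels ordered shank by shank; the band limits, the spectral estimator, and the synthetic signals are assumptions rather than the analysis used for Fig. 6C.

```python
import numpy as np


def band_power(signals, fs, lo, hi):
    """Summed periodogram power between lo and hi Hz for each row of `signals`."""
    signals = np.asarray(signals, dtype=float)
    freqs = np.fft.rfftfreq(signals.shape[-1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(signals, axis=-1)) ** 2
    in_band = (freqs >= lo) & (freqs <= hi)
    return power[..., in_band].sum(axis=-1)


def theta_ratio_map(blue, red, fs, n_shanks, n_channels, lo=6.0, hi=10.0):
    """Blue/red theta power ratio per channel, arranged as a shanks x channels grid."""
    ratio = band_power(blue, fs, lo, hi) / band_power(red, fs, lo, hi)
    return ratio.reshape(n_shanks, n_channels)


# Synthetic check: 2 shanks x 4 channels, 2 s at 1 kHz, 8 Hz theta twice as large during blue stim.
fs = 1000
t = np.arange(2 * fs) / fs
theta = np.sin(2 * np.pi * 8 * t)
blue = np.tile(2.0 * theta, (8, 1))
red = np.tile(theta, (8, 1))
ratio_map = theta_ratio_map(blue, red, fs, n_shanks=2, n_channels=4)
```

      The resulting grid can be displayed with any image plotting function (for example, matplotlib's `imshow`), so that each pixel corresponds to one channel.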

      Ext. Figure 2B - What happens during an acquisition failure? Instead of "Amount of LFP data," consider using "Buffer size".

      Corrected.

      Ext. Figure 2D-E - Instead of "Amount of data," consider using "Window size"

      Referred to as buffer size.

      Ext. Figure 2E - y-axis should extend down to 4 Hz. Are all of the last four values exactly at 8 Hz?

      Yes. Values plateau at 8 Hz. These data represent an average over ~50 samples.

      Ext. Figure 2F - consider moving this before D/E, since those panels are summaries of panel F

      Corrected.

      Ext. Figure 4A - ANOVA tells you that accuracy is impacted by delay duration, but not what that impact is. A post-hoc test is required to show that long delays lead to lower accuracy than short ones. Alternatively, one could compute the correlation between delay duration and proportion correct for each animal, and look for significant negative values.

      We included supplemental analyses in Extended Fig. 4.
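
      The reviewer's alternative analysis can be sketched as follows: for each animal, compute the proportion correct at each delay duration, correlate delay with proportion correct, and then test whether the per-animal correlations are negative on average. The trial data are hypothetical, and this sketch is not necessarily the analysis shown in Extended Fig. 4; with few delay levels per animal, applying a Fisher z-transform to r before the group test would be a reasonable refinement.

```python
import numpy as np


def delay_accuracy_correlation(delays, correct):
    """Pearson r between delay duration and proportion correct for one animal."""
    delays = np.asarray(delays, dtype=float)
    correct = np.asarray(correct, dtype=float)
    levels = np.unique(delays)
    prop_correct = np.array([correct[delays == d].mean() for d in levels])
    return float(np.corrcoef(levels, prop_correct)[0, 1])


def one_sample_t(values):
    """t statistic testing whether the mean of per-animal values differs from 0."""
    v = np.asarray(values, dtype=float)
    return float(v.mean() / (v.std(ddof=1) / np.sqrt(len(v))))


# Hypothetical trials for one animal: delay (s) and outcome (1 = correct).
delays = [5, 5, 5, 5, 30, 30, 30, 30, 60, 60, 60, 60]
correct = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
r_animal = delay_accuracy_correlation(delays, correct)
```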

      Reviewer #2 (Recommendations For The Authors):

      The authors should replace terms that suggest a causal relationship between PFC-HPC synchrony and behavior, such as 'leads to', 'biases', and 'enhances' with more neutral terms.

      Causal implications were toned down and wherever “leads” or “led” remains, we specifically mean in the context of coherence being detected prior to a choice being made.

      The rationale for the analysis described in the paragraph starting on line 324, and how it fits with the preceding results, was not clear to me. The authors also write at the start of this paragraph "Given that mPFC-hippocampal theta coherence fluctuated in a periodical manner (Extended Fig. 5B)", but this figure only shows example data from 2 trials.

      The reviewer is correct. Although the manuscript now points to 3 examples, we focused this section on the autocorrelation analysis, which did not support our observation: the correlation decayed roughly linearly over time. As such, the apparent periodicity was almost certainly a consequence of overlapping data in the epochs used to calculate coherence rather than intrinsic periodicity.
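
      Why overlapping epochs produce a linear decay can be illustrated with a signal that has no periodicity at all. In the sketch below, window means of white noise stand in for windowed coherence estimates (an assumption for illustration); with windows of 100 samples advanced by 10 samples, neighbouring estimates share 90 samples, so the autocorrelation at a lag of k windows falls roughly as 1 - 10k/100 and reaches about zero once windows stop overlapping.

```python
import numpy as np


def sliding_window_means(x, window, step):
    """Average x in windows of `window` samples advanced by `step` samples."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([x[s:s + window].mean() for s in starts])


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = float(x @ x)
    return np.array([float(x[: len(x) - k] @ x[k:]) / denom for k in range(max_lag + 1)])


# White noise: no intrinsic periodicity, only overlap between consecutive windows.
rng = np.random.default_rng(2)
noise = rng.normal(size=200_000)
estimates = sliding_window_means(noise, window=100, step=10)
ac = autocorrelation(estimates, max_lag=15)
# Expected: ac[k] is about 1 - k * 10 / 100 for k < 10, then about 0.
```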

      Shortly after the start of the results section (line 112), the authors go into a very detailed description of how they validated their BMI without first describing what the BMI actually does. This made this and the subsequent paragraphs difficult to follow. I suggest the authors start with a general description of the BMI (and the general experiment) before going into the details.

      Corrected. See first paragraph of “Development of a closed-loop…”.

      In Figure 2C, as expected, around the onset of 'high' coherence trials, there is an increase in theta coherence but this appears to be very transient. However, it is unclear what the heatmap represents: is it a single trial, single session, an average across animals, or something else? In Figure 3F, however, the increase appears to be much more sustained.

      For every panel in this figure, the sample size (n) is the number of rats. This was clarified at the end of Fig. 3.

      In Figure 2D, it was not clear to me what units of measurement are used when the averages and error bars are calculated. What is the 'n' here? Animals or sessions? This should be made clear in this figure as well as in other figures.

      The sample size (n) is the number of rats. This is now clarified at the end of Fig 2.

      Describing the study of Jones and Wilson (2005), the authors write: "While foundational, this study treated the dependent variable (choice accuracy) as independent to test the effect of choice outcome on task performance." (line 83) It was not clear to me what is meant by "dependent" and "independent" here. Explaining this more clearly might clarify how the authors' study goes beyond this and other previous studies.

      The reviewer is correct. We removed the discussion of independent/dependent variables from the rationale for our experiment.

      Reviewer #3 (Recommendations For The Authors):

      As explained in the public review, my comments mainly concern the interpretation of the experimental paradigm and its link with previous findings. I think modifying these in order to target the specific advance allowed by the paradigm would really improve the match between the experimental and analytical data, which are very solid, and the authors' conclusions.

      Concerning the paradigm, I recommend that the authors focus more on their novel ability to clearly dissociate the functional role of theta coherence prior to the choice as opposed to induced by the choice. Currently, they explain by contrasting previous studies based on dependent variables whereas their approach uses an independent variable. I was a bit confused by this, particularly because the task variable is not really independent given that it's based on a brain-driven loop. Since theta coherence remains correlated with many other neurophysiological variables, the results cannot go beyond showing that leading up to the decision it correlates with good choice accuracy, without providing evidence that it is theta coherence itself that enhances this accuracy as they suggest in lines 93-94.

      The reviewer is correct. We removed the discussion of independent/dependent variables from the rationale for our experiment.

      Regarding previous results with muscimol inactivation, I recommend that the authors expand their discussion on this point. I think that their correlative data is not sufficient to conclude as they do that despite "these structures being deemed unnecessary" (based on causal muscimol experiments), they "can still contribute rather significantly" since their findings do not show a contribution, merely a correlation. This extra discussion could include possible explanations of the apparent, and thought-provoking discrepancies that they uncover such as: theta coherence may be a correlate of good accuracy without an underlying causal relation, theta coherence may always correlate with good accuracy but only be causally important in some tasks related to spatial working memory or, since muscimol experiments leave the brain time to adapt to the inactivation, redundancy between brain areas may mask their implication in the physiological context in certain tasks (see Goshen et al 2011).

      The second paragraph of the discussion is now dedicated to this.

      Possible further analysis :

      • In Extended 4A the authors show that performance drops with delay duration. It would be very interesting to see this graph with the high coherence / low coherence / yoked trials to see if the theta coherence is most important for longer trials for example.

      This is a great suggestion. Because only 10% of trials were triggered by high-coherence states, our sample size precludes a robust analysis of this kind. Given that we found an enhancement effect on a task with minimal spatial working memory requirements (Fig. 4), it seems that coherence may be a general benefit or consequence of choice processes. Nonetheless, this remains an important question to address in a future study.

      • Figure 6: The authors explain in the text that although the effect of stimulation of VMT is variable, overall VMT activation increased PFC-HPC coherence. I think in the figure the results are only shown for one rat and session per panel. It would be interesting to add a figure including their whole data set to show the overall effect as well as the variability.

      The reviewer is correct, and this comment prompted a significant addition of detail to the manuscript. We have added an extended figure (Ext. Fig. 9) showing our VMT stimulation recording sessions. We originally did not include these because we were performing a parameter search to understand whether VMT stimulation could increase mPFC-hippocampal theta coherence. The results section was expanded accordingly.

      Changes to writing / figures :

      • The paper by Eliav et al, 2018 is cited to illustrate the universality of coupling between hippocampal rhythms and spikes whereas the main finding of this paper is that spikes lock to non-rhythmic LFP in the bat hippocampus. It seems inappropriate to cite this paper in the sentence on line 65.

      We agree with the reviewer and this citation was removed.

      • Line 180 when explaining the protocol, it would help comprehension if the authors clearly stated that "trial initiation" means opening the door to allow the rat to make its choice. I was initially unfamiliar with the paradigm and didn't figure this out immediately.

      We added a description to the second paragraph of our first results section.

      • Lines 324 and following: the analysis shows that there is a slow decay over around 2s of the theta coherence but not that it is periodical (as in regularly occurring in time), this would require the auto-correlation to show another bump at the timescale corresponding to the period of the signal. I recommend the authors use a different terminology.

      This comment is now addressed above in our response to Reviewer #2.
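      The reviewer's criterion, that genuine periodicity should appear as a further autocorrelation peak at the signal's period, can be expressed as a small check. In this sketch the function name and the rule (look for a positive local maximum after the autocorrelation first drops below zero) are illustrative assumptions, not the analysis used in the manuscript.

```python
import numpy as np

def secondary_peak_lag(acf):
    """Return the lag of the first positive local maximum after the autocorrelation
    first drops below zero, or None if there is no such peak (no evidence of periodicity)."""
    acf = np.asarray(acf, dtype=float)
    below = np.nonzero(acf < 0)[0]
    if below.size == 0:
        return None  # no zero crossing, so no later peak to look for
    for k in range(below[0] + 1, len(acf) - 1):
        if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1] and acf[k] > 0:
            return k
    return None

lags = np.arange(50)
periodic_acf = np.cos(2 * np.pi * lags / 20)        # rhythm with a 20-lag period
linear_decay_acf = np.clip(1 - lags / 10, 0, None)  # overlap-style linear decay

print(secondary_peak_lag(periodic_acf))      # 20
print(secondary_peak_lag(linear_decay_acf))  # None
```

      A slow linear decay returns no secondary peak, whereas a periodic signal does. With noisy empirical autocorrelations, a significance threshold on the peak height would also be needed.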

      • Lines 344: I am not sure why the stable theta coherence levels during the fixed delay phase show that the link with task performance is "through mechanisms specific to choice". Could the authors elaborate on this?

      We elaborated on this point further at the end of “Trials initiated by strong prefrontal-hippocampal theta coherence are characterized by prominent prefrontal theta rhythms and heightened pre-choice prefrontal-hippocampal synchrony”

      • Line 85: "independent to test the effect of choice outcome on task performance." I think there is a typo here and "choice outcome" should be "theta coherence".

      The sentence was removed in the updated draft.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      (1) Within the section on "optimized antigen retrieval", the authors mentioned that weak immunolabelling and strong non-specific labelling may be due to inadequate antigen retrieval. I wonder whether this interpretation is accurate. Could it also be due to inadequate antibody penetration?

      We appreciate the reviewer's comment and have revised our text to improve clarity. Regarding the SDS-electrophoresed sample (Figure S1a right), we acknowledge that the brain-surrounding background noise indicates insufficient antibody penetration. However, in the FLASH-processed sample (Figure S1a left), the background signal is uniformly distributed throughout the entire brain. Therefore, we conclude that incomplete antibody penetration is unlikely under this condition. Below is the revised paragraph:

      Revised manuscript, line 62-66: “We observed that both FLASH-processed and SDS-electrophoresed samples showed weak tyrosine hydroxylase (TH, a marker of dopaminergic neurons) signal (Figure S1a, Supporting Information). Additionally, we noticed that the FLASH-processed samples had almost no signal of NeuN, a marker of neuronal nuclei (Figure S1b left, Supporting Information), and exhibited strong non-specific background noise (Figure S1a left, Supporting Information). The presence of this background noise is considered an indicator of inadequate antigen retrieval.[48]”

      • Also, the authors mentioned the use of FLASH protocol and SDS-based electrophoresis for delipidation which were not described in the methods section.

      We have included the information in the revised Materials and Methods.

      Revised manuscript, line 418-426: “SHIELD processing, SDS-electrophoretic delipidation and FLASH delipidation. PFA-fixed specimens were incubated in SHIELD-OFF solution at 4 °C for 96 hours, followed by incubation for 24 hours in SHIELD-ON solution at 37 °C. All reagents were prepared using SHIELD kits (LifeCanvas Technologies, Seoul, South Korea) according to the manufacturer's instructions. For SDS-electrophoretic delipidation, SHIELD-processed specimens were placed in a stochastic electro-transport machine (SmartClear Pro II, LifeCanvas Technologies, Seoul, South Korea) running at a constant current of 1.2 A for 5-7 days. For FLASH delipidation, the SHIELD-processed specimens were placed in FLASH reagent (4% w/v SDS, 200 mM borate) and then incubated at 54 ℃ for 18 hours.[47] The delipidated specimens were washed with PBST at room temperature for at least 1 day.”

      • In addition, tyrosine hydroxylase (TH) should be a marker of "monoaminergic" neurons rather than specifically "dopaminergic" neurons.

      We appreciate the reviewer's correction. It is true that tyrosine hydroxylase (TH) is a marker for neurons that contain dopamine, norepinephrine, and epinephrine (catecholamines). However, adrenergic and noradrenergic neurons are relatively few and are mostly located in the medulla and brain stem. Since we only monitored the brain in this study, we wish to keep TH as an indicator of dopaminergic neurons.

      (2) It was mentioned that tissue integrity was retained following heating treatment during the MOCAT protocol. It would be useful to demonstrate any differences in structural distortion, if any, with before and after images with different delipidation agents.

      We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to display the mouse brain at different stages of the MOCAT protocol, including pre-delipidation, post-delipidation, and post-RI-matching, to demonstrate the tissue integrity.

      Revised manuscript, line 135-137: “Figure S5 shows the gross views of the same mouse brain after undergoing 4% PFA fixation, paraffin processing, optimized antigen retrieval, and RI-matching, demonstrating intactness of the brain shape and preservation of tissue integrity.”

      (3) In this study, the authors have demonstrated the protocol could be successfully applied to FFPE specimens up to 15 years old. However, archival brain bank materials often have brain tissues with extended formalin fixation time. It may be useful to demonstrate that this technique can be utilised on FFPE tissues with long formalin fixation times.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to 3-month fixed mouse brain hemispheres. Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.

      The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. This indicates that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163 to 167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In the demonstration of MOCAT to 3-month fixed specimens, we observed that the pontine reticular nucleus (Figure S6A, yellow arrowheads) lost TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. The results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”

      (4) Whilst it is encouraging to see this protocol enables multi-round immunolabelling, further work is required to demonstrate there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (e.g. Non-specific secondary antibody binding).

      We appreciate the reviewer for noting their concern and providing suggestions. To address this issue, we have examined the results of the second to fourth rounds of multi-round staining, as shown in Figure 3. In all three sequential rounds, we utilized rabbit primary antibodies and the same secondary antibodies. Our observations under a 3.6x objective (NA = 0.2) did not reveal any colocalization with the staining from the previous round. Hence, we conclude that cross-reactivity is not significant. However, we acknowledge the need for more comprehensive testing to completely rule out the possibility of cross-reactivity, such as employing antibodies from different hosts or utilizing different types of secondary antibodies (e.g., IgG, Fab2).

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      • Also, how was the structural integrity maintained for tissues after multiple rounds of heat-induced epitope retrieval?

      We have provided an additional supplementary figure (Figure S11 in the revised manuscript) to demonstrate the structural integrity after 4 rounds of immunolabeling.

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      (5) It may be useful to have a side-by-side comparison in staining quality with equivalent sizes of rodent and human brain tissues as there appeared to be a reduction in clarity and staining quality at greater imaging depth for human tissues.

      We have provided an additional supplementary figure (Figure S12) showing fluorescence images of TH- and Lectin-labeling in 1-mm-thick human and mouse brain tissues at depths of 100 μm, 500 μm, and 900 μm. For millimeter-sized samples, both human and mouse brains showed comparable levels of transparency, with no noticeable reduction in fluorescence signal at varying depths. In our forthcoming studies, we plan to conduct a more comprehensive comparison of centimeter-sized human and mouse brain tissues.

      (6) Lectin staining is used throughout this study to label vasculature of the brain. How specific is this as compared with other vasculature markers such as CD31?

      We thank the reviewer for raising this concern. Lectins are nonimmune-origin carbohydrate-binding proteins that have been utilized to label the surface of the blood vessel lumen. On the other hand, CD31, CD34, etc. are immunomarkers of vascular endothelial cells. Numerous references have confirmed that lectin staining consistently co-localizes with CD31 immunoreactivity (Battistella et al. 2021; Miyawaki et al. 2020). However, in tumors, blood vessels lacking a lumen may display CD31 positive/Lectin negative conditions (Morikawa et al. 2002).

      (7) When discussing the applicability of MOCAT on the astrocytoma mouse model, there is a bit of confusion with regard to the terminology. As astrocytoma by default will be comprised of astrocytes, it may be useful to describe the tumour astrocytes as ASTS1CI-GFP positive astrocytes and immunolabelled astrocytes as GFAP-positive astrocytes.

      We thank the reviewer for their suggestions. To avoid confusion for readers, we have made modifications to the content and labeling of Figure 6A.

      Revised manuscript, line 213-219: “…we subjected an intact FFPE brain from an astrocytoma mouse model (see Materials and Methods) to the MOCAT pipeline to label tumor cells (ASTS1CI-GFP positive astrocytes) and GFAP-positive astrocytes (Figure 6A, C). Accordingly, we could segment GFAP-positive astrocytes surrounding the tumor (Figure 6B, D, and E) and classify them according to their distances from the tumor cells. Statistical analysis (Figure 6F) revealed that nearly half of the GFAP-positive astrocytes were within the tumor, with 63.9% being located near the tumor surface (±200 μm).”
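      The distance-based classification in the revised paragraph can be sketched as follows. This assumes, hypothetically, that each segmented GFAP-positive astrocyte has already been assigned a signed distance to the tumor surface (negative inside the tumor, positive outside). The ±200 μm band comes from the text; the function name and the distances are made-up illustrations, not data or code from the study.

```python
def summarize_distances(signed_distances_um, band_um=200.0):
    """Fractions of cells inside the tumor and within +/-band_um of its surface.

    Assumed convention: negative distance = inside the tumor, positive = outside,
    and a distance of exactly 0 counts as outside. The two categories overlap,
    so the fractions need not sum to 1.
    """
    n = len(signed_distances_um)
    if n == 0:
        raise ValueError("no cells to summarize")
    inside = sum(1 for d in signed_distances_um if d < 0)
    near_surface = sum(1 for d in signed_distances_um if abs(d) <= band_um)
    return {"inside_fraction": inside / n, "near_surface_fraction": near_surface / n}

# Made-up signed distances (μm) for eight hypothetical segmented astrocytes.
example = [-450.0, -150.0, -20.0, 35.0, 180.0, 320.0, 900.0, -90.0]
print(summarize_distances(example))
# {'inside_fraction': 0.5, 'near_surface_fraction': 0.625}
```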

      (8) Within the methods section, further details of the antibodies such as the clonality and immunogen should be included in the supplementary table.

      We appreciate the reviewer for their suggestions. In the revised version, we have included these details in Supplementary Table 1.

      • Furthermore, there is inadequate detail regarding multi-round immunolabelling and the precise timing of immunolabelling including lectin staining, various imaging parameters including the working distance of the lens and excitation laser used.

      We have added the experimental details of multi-round staining for Figure 3 in Supplementary Table 3. This table now includes information about the amounts and types of chemicals and antibodies used, as well as the laser wavelengths used for each round. The staining conditions (including labeling time, temperature, and buffer used) have been disclosed in Materials and Methods (see MOCAT pipeline/Electrophoretic immunolabeling). Furthermore, we have included the working distance and NA value of the objective lens used in MOCAT pipeline/Volumetric imaging and 3D visualization subsection.

      Revised manuscript, line 464-479: “Electrophoretic immunolabeling (active staining). The procedure was modified from the previously published eFLASH protocol[15] and was conducted in a SmartLabel System (LifeCanvas Technologies, Seoul, South Korea). The specimens were preincubated overnight at room temperature in sample buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.9% w/v sodium deoxycholate). Each preincubated specimen was placed in a sample cup (provided by the manufacturer with the SmartLabel System) containing primary, corresponding secondary antibodies and lectin diluted in 8 mL of sample buffer. Information on antibodies, lectin and their optimized quantities is detailed in Supplementary Table 1. The specimens in the sample cup and 500 mL of labeling buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.2% w/v sodium deoxycholate) were loaded into the SmartLabel System. The device was operated at a constant voltage of 90 V with a current limit of 400 mA. After 18 hours of electrophoresis, 300 mL of booster solution (20% w/v D-sorbitol, 60 mM boric acid) was added, and electrophoresis continued for 4 hours. During the labeling, the temperature inside the device was kept at 25 ℃. Labeled specimens were washed twice (3 hours per wash) with PTwH (1× PBS with 0.2% w/v Tween-20 and 10 μg/mL heparin),[23] and then post-fixed with 4% PFA at room temperature for 1 day. Post-fixed specimens were washed twice (3 hours per wash) with PBST to remove any residual PFA.”

      Revised manuscript, line 483-490: “Volumetric imaging and 3D visualization. For centimeter-scale specimens, images were acquired using a light-sheet microscope (SmartSPIM, LifeCanvas Technologies, Seoul, South Korea) with a 3.6x customized immersion objective (NA = 0.2, working distance = 1.2 cm). For samples <3 mm thick, imaging was performed using a multipoint confocal microscope (Andor Dragonfly 200, Oxford Instruments, UK) with objectives that were UMPLFLN10XW (10x, NA = 0.3, working distance = 3.5 mm), UMPLFLN20XW (20x, NA = 0.5, working distance = 3.5 mm), UMPLFLN40XW (40x, NA = 0.8, working distance = 3.3 mm). 3D visualization was performed using Imaris software (Imaris 9.5.0, Bitplane, Belfast, UK).”
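      As a practical aid, the buffer concentrations in the revised electrophoretic immunolabeling paragraph can be converted into weigh-out masses. This is a minimal sketch, assuming Tris base and CAPS free acid (the text does not state the forms used) and standard molar masses; the helper names are hypothetical, and the sketch does not address pH, which the quoted protocol does not specify.

```python
# Molar masses (g/mol); assumes Tris base and CAPS free acid.
MOLAR_MASS = {"Tris base": 121.14, "CAPS": 221.32, "boric acid": 61.83}

def grams_from_mM(conc_mM, molar_mass, volume_l):
    """Mass (g) of solute needed for conc_mM millimolar in volume_l litres."""
    return conc_mM / 1000.0 * molar_mass * volume_l

def grams_from_wv(percent_wv, volume_l):
    """Mass (g) for a % w/v concentration (1% w/v = 10 g per litre)."""
    return percent_wv * 10.0 * volume_l

def sample_buffer(volume_l, deoxycholate_wv=0.9):
    """Sample buffer as quoted; deoxycholate_wv=0.2 gives the labeling buffer."""
    return {
        "Tris base": grams_from_mM(240, MOLAR_MASS["Tris base"], volume_l),
        "CAPS": grams_from_mM(160, MOLAR_MASS["CAPS"], volume_l),
        "D-sorbitol": grams_from_wv(20, volume_l),
        "sodium deoxycholate": grams_from_wv(deoxycholate_wv, volume_l),
    }

def booster_solution(volume_l):
    """Booster solution as quoted: 20% w/v D-sorbitol, 60 mM boric acid."""
    return {
        "D-sorbitol": grams_from_wv(20, volume_l),
        "boric acid": grams_from_mM(60, MOLAR_MASS["boric acid"], volume_l),
    }

for name, recipe in [("sample buffer, 1 L", sample_buffer(1.0)),
                     ("labeling buffer, 0.5 L", sample_buffer(0.5, deoxycholate_wv=0.2)),
                     ("booster solution, 0.3 L", booster_solution(0.3))]:
    print(name)
    for solute, grams in recipe.items():
        print(f"  {solute}: {grams:.2f} g")
```

      The labeling buffer differs from the sample buffer only in its deoxycholate concentration, which is why one function covers both.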

      • Also, since refractive index homogenisation is an important step in tissue-clearing experiments, it may be useful to describe the components of NFC1 and NFC2 solutions used and provide images of the "cleared" tissues.

      We have included an image of a cleared mouse brain in Figure S5. Additionally, we have provided the refractive indices of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, because NFC1 and NFC2 are commercial products from Nebulem (Taiwan), their composition cannot be disclosed.

      Reviewer #2 (Public Review):

      Major Weaknesses:

      • There is no evidence of actual transparency of the entire mouse brain across different treatments. The suggested protocol is very good at removing lipids (as assessed by DiD staining) and by results of fluorescence registration deep within the brain. BUT, since in many places of the manuscript authors speak of "transparency" the reader will expect the typical picture in which control and processed brains are on top of a white graphical pattern that would evidence transparency (see as an example Figure 1 and 2 of Wan et al. 2018 (Neurophotonics. 2018 Jul;5(3):035007. doi: 10.1117/1.NPh.5.3.035007.)

      We thank the reviewer for their suggestions. We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to demonstrate the transparency.

      • The manuscript lacks clarity on the applicability of MOCAT to regular formalin-fixed tissue and tissues other than the brain.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to a mouse brain hemisphere fixed in regular formalin for 3 months. We observed that the major dopaminergic regions were still labeled, although with reduced intensity and S/N ratio. We also observed that fluorescence intensity was more affected by formalin, which is methanol-stabilized and stronger, than by PFA, implying that a stronger antigen retrieval method might rescue the intensity. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163 to 167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In the demonstration of MOCAT to 3-month fixed specimens, we observed that the pontine reticular nucleus (Figure S6A, yellow arrowheads) lost TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. The results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”


      We agree with the reviewer and plan to investigate the potential use of MOCAT in tissues other than the brain in our subsequent studies.

      • Insufficient information is provided on the "epoxy treatment" or "hydrogel," and a more detailed explanation is warranted.

      We appreciate the reviewer's question. In response, we have added a paragraph to the Discussion section clarifying when epoxy or hydrogel should be used in the MOCAT pipeline: the harsh conditions, such as pressure and heat, caused by external forces might damage specimens. To protect specimens from the harsh conditions caused by active staining, specimens could be strengthened by treatment with epoxy or acrylamide monomer to form a tissue-epoxy or tissue-hydrogel hybrid.[29,31] Laboratories that do not have adequate devices or handle small specimens could use passive immunolabeling instead and skip the step of epoxy or hydrogel pretreatment.

      Epoxy and acrylamide hydrogel can both strengthen tissue structures. However, in this study, we only used epoxy for treatment in combination with active electrophoretic staining. To avoid confusion and improve clarity, we have made modifications to Figure 1B and included epoxy processing in the MOCAT pipeline subsection within Materials and Methods.

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies for samples with thicknesses less than 500 μm and greater than 1 mm: passive immunolabeling and active immunolabeling. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared to passive immunolabeling, active immunolabeling uses an external force, such as pressure, electrophoresis, etc., to facilitate antibody penetration and therefore significantly speed up the staining process, reducing the required staining time for a whole mouse brain to one day. However, the harsh conditions, such as pressure and heat, caused by external forces might damage specimens. To protect specimens from the harsh conditions caused by active staining, specimens could be strengthened by treatment with epoxy or acrylamide monomer to form a tissue-epoxy or tissue-hydrogel hybrid.[29,31] Laboratories that do not have adequate devices or handle small specimens could use passive immunolabeling instead and skip the step of epoxy or hydrogel pretreatment.”

      • The differences between passive and active immunolabeling, as well as photobleaching data, should be addressed for a comprehensive understanding.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to explain the differences between passive and active immunolabeling:

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies for samples with thicknesses less than 500 μm and greater than 1 mm: passive immunolabeling and active immunolabeling. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared to passive immunolabeling, active immunolabeling uses an external force, such as pressure, electrophoresis, etc., to facilitate antibody penetration and therefore significantly speed up the staining process, reducing the required staining time for a whole mouse brain to one day.”

      Regarding the effects of photobleaching, we have added Figure S10 to demonstrate the efficiency of using our approach.

      Revised manuscript, line 184-185: “After imaging, we photobleached transparent RI-matched samples using a 100W LED white light to quench the previously labeled fluorophores (Figure S10).”

      • The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs.

      We thank the reviewer for this question. Since the imaging depth primarily depends on the working distance of the objective lens, samples up to approximately 3.5 mm thick can also be scanned if a long-working-distance objective lens (such as the UMPLFLN10XW from Olympus) is available. However, confocal systems require longer scanning times, and wide-field fluorescence microscopes without optical sectioning, such as the Olympus BX series or ZEISS Axio Imager series, require deconvolution algorithms to remove out-of-focus signals.

      Additionally, an epifluorescence system may yield reduced fluorescence intensity in deeper regions of the sample. If the target's fluorescence signal is weak or lies beyond the working distance of the objective lens, an alternative is to send the sample to a microscopy or imaging core facility for scanning and further analysis.

      • The compatibility of MOCAT with genetically encoded fluorescent proteins remains unclear and warrants further investigation.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence due to a reduction in fluorescence intensity caused by xylene and alcohol used in paraffin processing. Researchers who need to directly observe genetically encoded fluorescent proteins can utilize tissue-clearing methods such as 3DISCO, X-CLARITY, CUBIC, etc., which have been shown to minimize the decrease in fluorescence intensity. On the other hand, if researchers need to visualize transgenic fluorescent proteins along with other biomarkers, they can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      • The control of equivalent depths in cryosections for evaluating the intensity of DiD staining should be elaborated upon.

      We have included this information in the Materials and Methods section:

      Revised manuscript, line 428-430: “Serial 20-µm-thick cryosections were cut from mouse brain slices (2-mm thick) of various treatment conditions for subsequent DiD or Oil red O staining. For DiD staining, cryosections (that were of approximately 0-40 µm depth) were post-fixed with 4% PFA at room temperature for 5 minutes.”

      • The composition of NFC1 and NFC2 solutions for refractive index matching should be provided.

      We have provided the refractive indices of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, because NFC1 and NFC2 are commercial products from Nebulem (Taiwan), their composition cannot be disclosed.

      Reviewer #2 (Recommendations for the Authors):

      • A larger readership would benefit from validating imaging depths using fluorescence microscopies commonly found in pathological departments (i.e. Confocal, 2-photon, epifluorescence+deconvolution, etc).

      We thank the reviewer for this recommendation. Since the imaging depth primarily depends on the working distance of the objective lens, samples up to approximately 3.5 mm thick can also be scanned if a long-working-distance objective lens (such as the UMPLFLN10XW from Olympus) is available. However, confocal systems require longer scanning times, and wide-field fluorescence microscopes without optical sectioning, such as the Olympus BX series or ZEISS Axio Imager series, require deconvolution algorithms to remove out-of-focus signals.

      Additionally, an epifluorescence system may yield reduced fluorescence intensity in deeper regions of the sample. If the target's fluorescence signal is weak or lies beyond the working distance of the objective lens, an alternative is to send the sample to a microscopy or imaging core facility for scanning and further analysis.

      • Investigate the compatibility of MOCAT with genetically encoded fluorescent proteins, a common target in research specimens.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence due to a reduction in fluorescence intensity caused by xylene and alcohol used in paraffin processing. Researchers who need to directly observe genetically encoded fluorescent proteins can utilize tissue-clearing methods such as 3DISCO, X-CLARITY, CUBIC, etc., which have been shown to minimize the decrease in fluorescence intensity. On the other hand, if researchers need to visualize transgenic fluorescent proteins along with other biomarkers, they can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      References:

      Battistella, Roberta et al. 2021. “Not All Lectins Are Equally Suitable for Labeling Rodent Vasculature.” International Journal of Molecular Sciences 22(21): 22. /pmc/articles/PMC8584019/ (accessed January 23, 2024).

      Miyawaki, Takeyuki et al. 2020. “Visualization and Molecular Characterization of Whole-Brain Vascular Networks with Capillary Resolution.” Nature Communications 11(1): 1–11. https://www.nature.com/articles/s41467-020-14786-z (accessed January 23, 2024).

      Morikawa, Shunichi et al. 2002. “Abnormalities in Pericytes on Blood Vessels and Endothelial Sprouts in Tumors.” The American Journal of Pathology 160(3): 985–1000.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      • The Ca increase in orexin neurons in response to optical stimulation of VTA DA neurons is convincing. However, there is an accumulated body of literature indicating that dopamine inhibits orexin neurons through D2 receptors, particularly at high concentrations both directly and indirectly (PMID 15634779, 16611835, 26036709, 30462527; but note that synaptic effects at low conc are excitatory - PMID 30462527, 26036709). There should be a clear acknowledgment of these previous studies and a discussion directly addressing the discrepancy. Furthermore, there are in-vivo studies that investigated the role of dopamine in the LH involving orexin neurons in different behavioral contexts (e.g. PMID 24236888). The statement found in the introduction "whether and how dopamine release modulates orexin neuronal activity has not been investigated vigorously" (3rd para of Introduction) is an understatement of these previous reports.

      We thank the Reviewer for pointing out that we missed several important citations. We have added the references mentioned, and the discrepancy of concern is now addressed in the Discussion section.

      • Along these lines, previous reports of concentration-dependent bidirectional dopaminergic modulation of orexin neurons suggest that high and low levels of DA would affect orexin neurons differently. Is there any way to estimate the local concentration of DA released by the laser stimulation protocol used in this study? Could there be a dose dependency between the intensity of laser stimulation and the orexin neuron response?

      We agree that this is an interesting point. However, one limitation of our study, and of intensity-based genetically-encoded sensors in general, is that the estimation of the concentration is technically difficult. The sensor effectively reports changes in extra-synaptic levels of neurotransmitters, but to get the absolute value other modalities would be needed such as fast scan voltammetry. This limitation is now included in the discussion section.

      • The transient dip in DA signal during omission sessions in Fig. 2C (approx. 1% decrease from baseline) is similar in amplitude to the decrease seen in non-laser trials shown in Fig. 1C, right panel (although the time course of the latter is unknown as the data is truncated). The authors should clarify whether those dips are a direct effect of the cue itself or indeed reward prediction error.

      Thanks for raising this important point. Indeed, there is a dip in the signal during non-stimulation trials. On day 1, delivery of the cue triggered a dip; on day 10, there was a slight increase in the signal, followed by the dip. The data are difficult to interpret, but our hypothesis is that two components trigger this dip. The first is the aversiveness of the cue. Because a relatively loud sound (90 dB) was used for the cue, it would not be surprising if the auditory cue was slightly aversive to the experimental animals. Aversive stimuli have been shown to induce a dip in dopamine in the NAc, although this is specific to NAc subregions. The second component is reward prediction error. Although the non-laser-paired cue never triggered laser stimulation, it is similar to the laser-paired cue: both consist of a loud tone and a visual cue of the same color (presented at a different location). We think it is possible that the reward-related neuronal circuit was slightly activated by the non-laser-paired cue. In line with this interpretation, a small increase in the signal was observed on day 10 but not on day 1. If our hypothesis is true, the signal is induced by two components, so further analysis is unfortunately difficult.

      • There seem to be orexin-negative, GCaMP6-positive cells (Fig. 4B), suggesting that not all cells were phenotypically orexin+ at the time of imaging. The proportion of GCaMP6 cells that were ORX+ or ORX- and whether they responded differently to the stimuli should be indicated.

      While we acknowledge the presence of orexin-negative, GCaMP6-positive cells in Figure 4B, this is consistent with the characteristics of the hOX-GCaMP virus used in prior experiments. The virus has been thoroughly characterized and reported to exhibit over 90% specificity in prior work from the laboratory of one of our contributing authors (PMID: 27546579). To address the reviewer's concern, we have included Supplemental Figure 4, which confirms that all mice consistently exhibited qualitatively similar hOX-GCaMP transients upon dopaminergic terminal stimulation. This additional evidence supports the reliability and specificity of our experimental approach.

      • Laser stimulation of DA neurons at the level of cell bodies (in VTA) induces an increase in DA release within the LH (Fig. 3C, D), however, there is no corresponding Ca signal in orexin neurons (Fig.4C).

      We realize that the figures were unclear, which may have given the impression that there was no corresponding Ca signal. In fact, a Ca signal is present; we have added Supplemental Figure 3 to show that it is already present on day 1.

      In contrast, stimulating DA terminals within the LH induces a robust, long-lasting Ca signal (> 30s) in orexin neurons (Fig. 5). The initial peak is blocked by raclopride but the majority of Ca signal is insensitive to DA antagonists (please add a positive control or cite references indicating that the dose of antagonists used was sufficient; also the timing of antagonist administration should be indicated).

      This is now included in the Discussion section. The timing and dose of the antagonists are now also described in the Methods section.

      Taken together, these results seem to suggest that DA does not directly increase Ca signal in orexin neurons. What could be mediating the remaining component?

      This point has been included in the discussion section.

      • Similarly, there is an elevation of Ca signal in orexin neurons that remains significantly higher after the cue/laser stimulation (Fig. 4F). It appears that it is this sustained component that is missing in omission trials. This can be analyzed further.

      It is true that there is a sustained component in stimulation trials that is missing in omission trials. Most likely, it is evoked by the stimulation of dopamine neurons. We argue that this component is isolated in Fig. 5, where we analyzed it as thoroughly as we could.

      • Mice of both sexes were used in this study; it would be interesting to know whether sex differences were observed or not.

      We agree that this is an important point. However, our sample size is not large enough to make a meaningful comparison between males and females.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written study assessing the role of dopaminergic inputs from the VTA on orexin cell responses in an opto-pavlovian conditioning task. These data are consistent with a possible role of this system in reward expectation and are surprisingly one of the first demonstrations of a role for dopamine in this phenomenon.

      Strengths:

      The study has used an interesting opto-Pavlovian approach combined with fibre photometry.

      Weaknesses:

      It is unclear what n size was used or analysed, particularly for AUC measures e.g. Figures 1 D/E and 3 G. The number of trials reflected and the animal numbers need clarification.

      The sample size is indicated in the figure legends.

      The study focused on opto-stim omissions - this work would be significantly strengthened by a comparison to a real-world examination where animals are trained for a natural reward (food pellet).

      We agree that this would be an important experiment. It has been partially carried out in the laboratory of one of the contributing authors (doi.org/10.1101/2022.04.13.488195) and will be part of our follow-up studies.

      Have the authors considered the role of orexin in the opposing situation i.e. a surprise addition of reward?

      That would be an interesting experiment. To do that, a natural reward, rather than optical stimulation, should be used as the reinforcer. This could be part of our follow-up studies.

      Similarly, there remains some conjecture regarding the role of these systems in reward and aversion - have the authors considered aversive learning paradigms - fear, or fear extinction - to further explore the roles of this system? There are some (important) discussions about the possible role of orexin in negative reinforcement. Further studies to address this could be warranted.

      It is true that dopamine also plays a significant role in aversive learning. Therefore, this would be an interesting experiment. The discussion section now includes this point.

      I think some further discussion of the work by Linehan concerning the interesting bidirectional actions of D1/D2 receptor signalling on glutamatergic transmission onto orexin neurons is worthwhile. While this work is currently cited, the nuance and perhaps relevance to D1 and D2 signalling could be contextualised a little more (https://doi.org/10.1152/ajpregu.00150.2018).

      Thanks for the suggestion. The discussion has been expanded.

      Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in the ventral tegmental area (VTA) and orexin neuron activity in the lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in the striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths: Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major:

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase the activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      Thanks for the valuable suggestion. This point has been integrated and the introduction and discussion sections have been revised carefully.

      • In the Discussion, the authors provide two (plausible) explanations for why they did not observe a dip in the calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      We completely agree that this is possible. Our current hypothesis is that dopamine in the LH encodes RPE and that this information is transmitted to orexin neurons, which integrate it with other information and encode something else, which we call ‘multiplexed cognitive information’. What this means exactly is still an open question. This point is now mentioned in the Discussion section.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      Thank you for the question. This is an important point, indeed. Our current hypothesis is described in the discussion section.

      ‘Our data indicate that dopamine in both the NAc and LH encodes reward prediction error (RPE). One open question is why such a redundant mechanism exists. We hypothesize that dopamine in the LH boosts dopamine release via a positive feedback loop between the orexin and dopamine systems. It has already been established that some orexin neurons project to dopaminergic neurons in the VTA, positively modulating their firing. On the other hand, our data indicate that dopamine in the LH stimulates orexinergic neurons. These findings collectively suggest that when either the orexin or the dopamine system is activated, the other system is consequently activated as well. Although the current findings align with this idea, the hypothesis should be carefully challenged and scrutinized.’

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcomes of learning. Similarly - does raclopride treatment across training prevent learning?

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack behavioral data. However, given the extensive previous research on the crucial role of orexin in motivated behavior, we argue that establishing dopaminergic regulation of the orexin system itself is a valuable contribution. This perspective is discussed in detail in the dedicated section of our paper. It is also important to note that injection of D2 antagonists, including raclopride, is known to induce significant sedation, which makes combining behavioral experiments with these drugs considerably challenging.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) the reduction in orexin signal with raclopride was dose-dependent.

      The rationale for the doses has been added to the Discussion section; these doses have been reported to block dopamine receptors. Although we agree that a dose-response curve would be valuable, we are reluctant to increase the doses, to avoid adverse effects in the experimental animals. The doses we used effectively induced hypolocomotion (data not shown).

      • Fig 1C, could the effect the authors observed be due to movement?

      We argue this is unlikely. We recorded two channels, one for the control (405 nm) and the other for the signal (465 nm), and motion-related artifacts were corrected based on the control channel. One example trace around the laser stimulation is shown below. Please note that a typical motion-related artifact is a fast dip in the signal, normally observed in both the 405 and 465 nm channels.
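      A common way to implement this kind of two-channel correction is to fit the 405 nm control trace to the 465 nm signal trace by least squares and express the signal as dF/F relative to the fitted control. The sketch below assumes that approach and uses hypothetical function names (`fit_linear`, `isosbestic_dff`) and simulated data; it is an illustration of the principle, not the exact analysis pipeline used in the study.

```python
# Hypothetical sketch of isosbestic (405 nm) artifact correction for fiber
# photometry; illustrative only, not the study's exact pipeline.

def fit_linear(x, y):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def isosbestic_dff(signal_465, control_405):
    """dF/F of the 465 nm channel relative to the scaled 405 nm control.

    Artifacts shared by both channels (e.g. a fast motion dip) are captured
    by the fitted control and cancel out; sensor changes that appear only in
    the 465 nm channel remain in the output.
    """
    a, b = fit_linear(control_405, signal_465)
    fitted = [a * c + b for c in control_405]
    return [(s - f) / f for s, f in zip(signal_465, fitted)]


# Simulated example: a motion dip at index 2 appears in both channels,
# while a sensor transient at index 6 appears only in the 465 nm channel.
control = [10.0, 10.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0]
artifact_only = [2 * c for c in control]
with_transient = artifact_only[:6] + [25.0] + artifact_only[7:]

dff_artifact = isosbestic_dff(artifact_only, control)
dff_transient = isosbestic_dff(with_transient, control)
```

      In this toy example the shared motion dip is removed entirely, while the 465-only transient remains as the largest positive deflection, which is the distinction drawn in the response above.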

      Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue?

      Although it has been reported that rats approach the cue (PMID: 30038277) in a similar task, it was not obvious in our case. It could be because we used both visual and auditory cues. Mice showed a general increase of locomotion during the cue and the stimulation but the direction was not clear to the experimenter.

      Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      It is hard to say when learning occurs. The learning curves in Figures 1, 3, and 4 suggest that the response to the cue plateaus around day 5, but because we do not have behavioral data, this assessment relies only on the neuronal signal.

      • Also related to the above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      We recorded two channels, at 405 nm and 465 nm. The 405 nm signal did not increase, whereas the 465 nm signal did; an example trace is shown. In addition, the sensor has already been characterized by the corresponding author, so we consider this unlikely.

      Author response image 1.

      Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. The inclusion of additional animals would make these data more compelling.

      We agree that adding more mice would make the data more compelling. However, given that dopamine in the accumbens has been investigated extensively and that our data are in line with prior studies, we believe we have enough data to support our conclusion.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      For Fig. 1C, one panel has been added. For Figs. 3 and 4, a supplemental figure was created to show the signal around laser stimulation.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      The signal immediately before cue onset was used as the baseline and subtracted from the trace. Thus, in our analysis, zero corresponds to the pre-cue baseline.
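      The baseline subtraction described above can be sketched as follows, assuming the baseline is the mean of a short pre-cue window and the AUC is computed with the trapezoidal rule over a chosen post-cue window. The function name, sampling rate, and window lengths are illustrative assumptions, not the values used in the study.

```python
# Hypothetical sketch: subtract a pre-cue baseline, then integrate a post-cue
# window with the trapezoidal rule. Window lengths and sampling rate are
# illustrative assumptions, not the values used in the study.

def baseline_subtracted_auc(trace, fs, cue_idx, baseline_s, window_s):
    """Area under the baseline-subtracted trace within a post-cue window.

    trace      -- sequence of samples (e.g. dF/F values)
    fs         -- sampling rate in Hz
    cue_idx    -- sample index of cue onset
    baseline_s -- length (s) of the pre-cue window used as baseline
    window_s   -- (start, end) in seconds relative to cue onset
    """
    n_base = int(round(baseline_s * fs))
    baseline = trace[cue_idx - n_base:cue_idx]
    mu = sum(baseline) / len(baseline)
    corrected = [v - mu for v in trace]  # zero now equals the pre-cue baseline

    i0 = cue_idx + int(round(window_s[0] * fs))
    i1 = cue_idx + int(round(window_s[1] * fs))
    seg = corrected[i0:i1 + 1]
    dt = 1.0 / fs
    return sum((seg[k] + seg[k + 1]) * dt / 2 for k in range(len(seg) - 1))


# Simulated trace sampled at 10 Hz: a non-zero resting level of 1.0 that
# dips by 0.5 after the cue at sample 20.
fs = 10
trace = [1.0] * 20 + [0.5] * 20
auc_dip = baseline_subtracted_auc(trace, fs, cue_idx=20, baseline_s=1.0, window_s=(0.0, 1.0))
auc_flat = baseline_subtracted_auc([1.0] * 40, fs, cue_idx=20, baseline_s=1.0, window_s=(0.0, 1.0))
```

      Here the dip integrates to about -0.5 and an unchanged trace integrates to 0. Without the subtraction, the same window would integrate to +0.5 because of the non-zero resting level, which is the situation the reviewer's question about comparing against zero is concerned with.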

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      Because the kinetics of dopamine signals in the NAc and LH differ, different time windows were used to capture the dip in dopamine. An analysis of the kinetics has been added.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific feedback to the authors

      • Sample size for each experiment/group could not be found.

      The sample size is now included in the legends.

      • In most figures, the timing of onset for the cue and laser stimulation is unclear. This makes the data interpretation difficult. They should be labeled as in Fig. 3C, for example.

      Panels have been updated to address this point.

      • Please provide the rationale for selecting the time range for the measurement of AUC for different experiments (e.g. Fig. 2C, 3H, 4A, 5F).

      The kinetics of dopamine in NAc and LH are different. This is now shown in the new Supplemental Figure 2. Based on this difference, the different window was chosen.

      • Fig. 1E, 3G right, 4E right: statistical analysis should use two-way repeated measures ANOVA rather than one-way ANOVA. Fig 1D, 3G left and 4E left panels can also be analyzed by two-way repeated measures ANOVA.

      We realized that those panels were redundant. Some panels have been removed, and the statistical analysis has been revised accordingly.

      Minor comments:

      Fig. 2C can also show non-omission trials as a comparison.

      The panel has been updated.

      • The term "laser cue" is confusing, as the cue itself does not involve a laser.

      ‘Laser-paired cue’ is now used instead.

      • Color contrast can be improved for some figures, including Fig. 2C right, Fig. 3H right, and green and blue fluorescent fonts.

      The panels have been updated.

      • Figure legends: Tukey's test, rather than Tekey's test.

      This has been fixed.

      • There are some long-winded sentences that are hard to follow.

      Edited.

      • p.2, line 11 from bottom: should read ...the VTA evokes the release of dopamine.

      Edited.

      • p.3, line 9: remove e from release.

      This has been addressed.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      The Discussion section has been updated accordingly.

      • In the Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, rather a dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      This point has been addressed.

      • In the Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extra-synaptic dopamine signaling in LH, however, what occurs upstream is unknown.

      We respectfully disagree with this point. In our opinion, the dopamine transient is more important than the firing of dopamine neurons, because what matters for downstream neurons is the dopamine concentration. For example, cocaine administration increases the extra-synaptic dopamine concentration by blocking DAT, while the firing of dopamine neurons goes down via activation of D2 receptors expressed on dopamine neurons. Yet cocaine administration is not known to induce negative RPE.

      • Typo at multiple places: 'Tekey's multiple comparison test'.

      This has been fixed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1 Comments (Public Review)

      Point 1: First, the authors should provide more convincing data showing that tor and tapA genes are indeed duplicated genes in A. flavus. The authors appeared to use the A. flavus PTS strain as a parental strain for constructing the tor and tapA mutants. If so, the A. flavus CA14 strain (Hua et al., 2007) should be the parental wild-type strain for the A. flavus PTS strain. I did a BLAST search in NCBI for the torA (AFLA_044350) and tapA (AFLA_092770) genes using the most recent CA14 genome assembly sequence (GCA_014784225.2) and only found one allele for each gene: torA on chromosome 7 and tapA on chromosome 3. I could not find any other parts with similar sequences. Even in another popular A. flavus wild-type strain, NRRL3357, both torA and tapA exist as a single allele. Based on the published genome assembly data for A. flavus, there is no evidence to support the idea that tor and tapA exist as copies of each other. Therefore, the authors could perform a Southern blot analysis to further verify their claim. If torA and tapA indeed exist as duplicate copies in different chromosomal locations, Southern blot data could provide supporting results.

      Response 1: We thank the reviewer for this insightful observation. The Southern blot analysis presented in Figure 1 shows that torA and tapA are single-copy genes. Additionally, protoplast transformation experiments, repeated several times, revealed that both the torA and tapA transformants carried ectopic mutations. It is plausible that deletion of torA or tapA is lethal to A. flavus, which would be consistent with previous studies in the fungus Fusarium graminearum [1]. To ensure the rigor of the study, we have retracted the previous incorrect conclusion. We once again express our sincere appreciation for the reviewer's valuable suggestions.

      Author response image 1.

      Fig. 1 Southern blot hybridization analyses of WT, torA, and tapA transformants. (A) Structure diagram of the torA gene. (B) Structure diagram of the tapA gene. (C) Southern blot hybridization analysis of the torA gene. (D) Southern blot hybridization analysis of the tapA gene.

      Point 2: Second, the authors should consider the possibility of aneuploidy for their constructed mutants. When an essential gene is targeted for deletion, aneuploidy often occurs even in a fungal strain without the "ku" mutation, which results in seemingly dual copies of the gene. As the authors appear to use the A. flavus PTS strain having the "ku" mutation, the parental strain has increased genome instability, which may result in enhanced chromosomal rearrangements. So, it will be necessary to Illumina-sequence their tor and tapA mutants to make sure that they are not aneuploidy.

      Response 2: Thank you for your comment. Sequencing of the torA and tapA mutants showed that the torA and tapA genes were still present in both mutants. This suggests that the torA and tapA genes may have undergone genetic rearrangement, or that the deletion construct was inserted at a different site, in the mutant strains.

      Point 3: Furthermore, the genetic nomenclature +/- and -/- should be reserved for heterozygous and homozygous mutants in a diploid strain. As A. flavus is not a diploid strain, this type of description could cause confusion for the readers.

      Response 3: Thank you for your suggestion. We acknowledge your concerns about potential confusion caused by using this type of description, and we agree that it is best to avoid any misunderstandings for readers. Therefore, we have decided to remove this part of the content from the manuscript.

      Response to Reviewer 2 Comments (Public Review)

      Point 1: However, findings have not been deeply explored and conclusions are mostly based on parallel phenotypic observations. In addition, there are some concerns about the conclusions.

      Response 1: We are grateful for the suggestion. We have conducted additional experiments and analyses to provide a more comprehensive understanding and to address the concerns raised.

      Response to Reviewer 3 Comments (Public Review)

      Point 1: The paper by Li et al. describes the role of the TOR pathway in Aspergillus flavus. The authors tested the effect of rapamycin in WT and different deletion strains. This paper is based on a lot of experiments and work but remains rather descriptive and confirms the results obtained in other fungi. It shows that the TOR pathway is involved in conidiation, aflatoxin production, pathogenicity, and hyphal growth. This is inferred from rapamycin treatment and TOR1/2 deletions. Rapamycin treatment also causes lipid accumulation in hyphae. The phenotypes are not surprising as they have been shown already for several fungi. In addition, one caveat is in my opinion that the strains grow very slowly and this could cause many downstream effects. Several kinases and phosphatases are involved in the TOR pathway. They were known from S. cerevisiae or filamentous fungi. The authors characterized them as well with knock-out approaches.

      Response 1: Thank you for your comment. The role of the target of rapamycin (TOR) signaling pathway is of fundamental importance in the physiological processes of diverse eukaryotic organisms. Nevertheless, its precise involvement in regulating the developmental and virulent characteristics of opportunistic pathogenic fungi, such as A. flavus, has yet to be fully elucidated. Furthermore, the mechanistic underpinnings of TOR pathway activity specifically in A. flavus remain largely unresolved. Consequently, our study represents a significant contribution as the first comprehensive exploration of the conserved TOR signaling pathway encompassing a majority of its constituent genes in A. flavus.

      Response to Reviewer 1 Comments (Recommendations For The Authors)

      Point 1: In Table S3, the authors indicated that the Δku70 ΔniaD ΔpyrG::pyrG strain is A. flavus wild-type strain. However, this strain is not a wild-type strain because it seems like a control strain after introducing the pyrG gene into the A. flavus PTS strain (Δku70 ΔniaD ΔpyrG). So please indicate the real wild-type A. flavus strain name to help readers find out its original genome sequence data. Also, the reference for this Δku70 ΔniaD ΔpyrG::pyrG strain is "saved in our lab". This is not an eligible reference. If you use this control strain for the first time in this study, it should be described as "In this study". Otherwise, please indicate the proper reference for which the strain was first used.

      Response 1: Thank you for your valuable feedback on our manuscript and for the opportunity to clarify the information regarding the strain in Table S3. The A. flavus CA14 strain, which produces aflatoxins and large sclerotia, was isolated from a pistachio bud at the Wolfskill Grant Experimental Farm (University of California, Davis, Winters, California, USA) [2]. CA14 is the parental wild-type strain of A. flavus CA14 PTs (Δku70, ΔniaD, ΔpyrG). The recipient strain CA14 PTs has been used successfully in gene knockout and subsequent genetic complementation experiments [3]. In this study, CA14 PTs was used as the transformation recipient strain, and the control strain (Δku70, ΔniaD, ΔpyrG::pyrG) was created by introducing the pyrG gene into CA14 PTs. Following previously published work [4], this control strain was designated the wild-type strain, and we use the same designation in this study. As this control strain was generated and used in this study, we will revise the reference to "In this study". Once again, we thank you for your attention to detail and for bringing these issues to our attention.

      Response to Reviewer 2 Comments (Recommendations For The Authors)

      Point 1: The response states: "However, the tor gene in A. flavus exhibited varying copy numbers, as was confirmed by absolute quantification PCR at the genome level (Table S1)." However, Table S1 is hard to understand. Its legend reads: "Estimation of copy number of tor gene in A. flavus. tor₀ and sumo₀ stand for the initial copy number, and the data are presented as the mean ± 95% confidence limit. CN is copy number." As indicated in the Methods, the tor and tapA gene copy numbers were calculated from a standard curve, using the sumo gene as the reference. In Table S1, the tor CN value is 1412537 in WT compared to 1698243 in tor+/-, and for the reference gene sumo it is 794328 compared to 1584893. How do these data support the copy number of tor?

      Response 1: Thank you for your suggestion. We understand the confusion with the data presented in Table S1 regarding the copy number estimation of the tor gene in A. flavus, and we apologize for not explaining the table clearly. Quantitative real-time PCR (qPCR) is widely used to determine the copy number of a specific gene. It involves amplifying the gene of interest and a reference gene simultaneously using specific primers and probes. By comparing the amplification curves of the gene of interest and the reference gene, one can estimate the relative copy number of the gene of interest.
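      The standard-curve calculation referred to here can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: it assumes a log-linear standard curve of the form Ct = slope × log10(N0) + intercept for each amplicon, and the dilution series and sample Ct values are hypothetical rather than data from Table S1. The target's copy number is taken as the ratio of its estimated starting quantity to that of the reference gene, scaled by the reference gene's copy number (assumed here to be one).

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(N0) + intercept for one amplicon."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def initial_quantity(ct, slope, intercept):
    """Invert the standard curve to estimate the starting template amount N0."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(target_ct, ref_ct, target_curve, ref_curve, ref_copies=1):
    """Target copy number, scaled by the (assumed) copy number of the reference gene."""
    n_target = initial_quantity(target_ct, *target_curve)
    n_ref = initial_quantity(ref_ct, *ref_curve)
    return ref_copies * n_target / n_ref


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series of standards (log10 copies vs. Ct).
    log_dilutions = [3, 4, 5, 6, 7]
    target_curve = fit_standard_curve(log_dilutions, [28.1, 24.8, 21.5, 18.2, 14.9])
    ref_curve = fit_standard_curve(log_dilutions, [28.0, 24.7, 21.4, 18.1, 14.8])
    # Hypothetical sample Ct values for the target (e.g. tor) and reference (e.g. sumo).
    print(round(relative_copy_number(21.0, 22.0, target_curve, ref_curve), 2))
```

      Under these assumptions, a target-to-reference ratio near 1 would suggest one copy and a ratio near 2 would suggest two copies, provided the reference gene is single-copy and both amplifications have similar efficiency.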

      To address your concern and provide more accurate information, we have re-performed the copy number analysis using Southern blot. Southern blot analysis allows direct estimation of gene copy number by hybridizing genomic DNA with a gene-specific probe, and it provides more reliable results for determining gene copy number. The Southern blot results are presented in Figure 1.

      We appreciate your input and apologize for any confusion caused by the earlier presentation of the data.

      Point 2: The response states: "For the knockout of the FRB domain, we used the homologous recombination method, but because tor genes are double-copy genes, there are also double copies in the FRB domain. Despite our efforts, we encountered challenges in precisely determining the location of the other copy of the tor gene." I do not understand these data; why not use sequencing to locate the second copy?

      Response 2: Thank you for your comment. We found that the torA gene is present as a single copy, and we have removed this part of the results to avoid any ambiguity or potential misinterpretation.

      Point 3: The response states: "Due to the large number of genes involved, we did not perform a complementation experiment." Without complementation data, how can the data be shown to be solid?

      Response 3: Thank you for your important suggestion. We agree that complementation experiments are commonly used to validate gene deletions. Therefore, to ensure the reliability of our data, we have performed supplementary complementation experiments for selected gene deletions, generating strains such as ΔsitA-C and Δppg1-C. Thank you again for your positive comments and valuable suggestions to improve the quality of our manuscript.

      References:

      (1) Yu F, Gu Q, Yun Y, et al. The TOR signaling pathway regulates vegetative development and virulence in Fusarium graminearum. New Phytol. 2014;203(1):219-32.

      (2) Hua SS, Tarun AS, Pandey SN, Chang L, Chang PK. Characterization of AFLAV, a Tf1/Sushi retrotransposon from Aspergillus flavus. Mycopathologia. 2007 Feb;163(2):97-104.

      (3) Chang PK, Scharfenstein LL, Mack B, Hua SST. Genome sequence of an Aspergillus flavus CA14 strain that is widely used in gene function studies. Microbiol Resour Announc. 2019 Aug 15;8(33):e00837-19.

      (4) Zhu Z, Yang M, Yang G, Zhang B, Cao X, Yuan J, Ge F, Wang S. PP2C phosphatases Ptc1 and Ptc2 dephosphorylate PGK1 to regulate autophagy and aflatoxin synthesis in the pathogenic fungus Aspergillus flavus. mBio. 2023 Oct 31;14(5):e0097723.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al. investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that length-independent control of apical tension alone cannot explain bending in the cellular Potts Model, in contrast to the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term with zero rest value (due to endocytosis/exocytosis) is necessary in constricting cells, and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      • Identifying the mechanisms required for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      • Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      We thank the reviewer for the careful comments and insightful suggestions.

      Weaknesses:

      • The authors claim that the cellular Potts Model is unable to reproduce the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided to vertex model simulations that employ similar setups and rules and explain tissue bending solely through an increase in a length-independent apical tension.

      We did not reuse the parameters of the vertex models in the preceding studies because we found that the apical, lateral, and basal surface tensions must be balanced; otherwise, the epithelial cells could not maintain their integrity (Supplementary Figure 1). The tension ratios used in the preceding studies were outside this suitable range.

      • The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      We appreciate the comment and will add this point to the Discussion.

      • The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      The reviewer is correct, but Polyakov et al. assumed “that the cytoskeletal components lining the inside membrane surfaces of the cells provide these surfaces with springlike elastic properties” without justification. Based on Labouesse et al. (2015), we assumed that myosin activity generates contractility rather than elasticity, and we expected the surface elasticity to correspond to membrane elasticity. In addition, within this physical framework, we clarified how contractility and elasticity deform cells and tissue differently, and demonstrated why elasticity is important for apical constriction. We will add this to the Discussion.
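      To make the contrast between contractility and elasticity concrete, the sketch below compares two generic apical energy terms in a cellular Potts-style Hamiltonian: a length-independent tension (energy proportional to the apical length L) and a surface elasticity with zero rest length (energy proportional to L squared). It is a minimal illustration under assumed functional forms; the coefficient names gamma and k, the lattice step of 1, and all numbers are hypothetical and are not the manuscript's Hamiltonian or parameters. The acceptance function is the standard Metropolis rule used in cellular Potts updates.

```python
import math


def tension_energy(apical_length, gamma):
    """Length-independent tension: energy = gamma * L, so the pull per step is constant."""
    return gamma * apical_length


def elastic_energy(apical_length, k, rest_length=0.0):
    """Surface elasticity with an (assumed) zero rest length: energy = k/2 * (L - L0)^2."""
    return 0.5 * k * (apical_length - rest_length) ** 2


def delta_energy_on_shrink(energy_fn, apical_length, step=1.0, **params):
    """Energy change when the apical length shrinks by one lattice step."""
    return energy_fn(apical_length - step, **params) - energy_fn(apical_length, **params)


def metropolis_acceptance(delta_e, temperature):
    """Metropolis rule: always accept downhill moves, otherwise accept with exp(-dE/T)."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


if __name__ == "__main__":
    for length in (4.0, 8.0, 16.0):
        d_tension = delta_energy_on_shrink(tension_energy, length, gamma=1.0)
        d_elastic = delta_energy_on_shrink(elastic_energy, length, k=0.2)
        print(f"L={length:5.1f}  dE_tension={d_tension:+.2f}  dE_elastic={d_elastic:+.2f}")
```

      Under these assumptions, shrinking the apical surface by one step always lowers the tension energy by the same amount (gamma), whereas the elastic energy changes by -k(L - 1/2), so the constricting drive is large for wide apical surfaces and fades as L approaches zero. This is one way to see that the two terms deform cells differently.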

      • The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

      We agree with the comment and will add this information to the Methods.

      Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. This mostly computational work relies on the Cellular Potts model. The main result shows that increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model with a different driver, namely "apical surface elasticity".

      Strengths:

      It is surprising that, although apical constriction and tissue invagination are probably among the most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      We thank the reviewer for recognizing the importance and novelty of our work.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

      We thank the reviewer for the careful assessment and suggestions. However, our simulations are computationally expensive, and modeling the epithelium with an analytically tractable expression would require substantial additional work that is beyond the scope of the present study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Expanding the Drosophila toolkit for dual control of gene expression" by Zirin et al. aims to develop resources for simultaneous independent manipulation of multiple genes in Drosophila. The authors use CRISPR knock-ins to establish a collection of T2A-LexA and T2A-QF2 transgenes with expression patterns in a number of commonly studied organs and tissues. In addition to the transgenic lines that are established, the authors describe a number of plasmids that can be used to generate additional transgenes, including a plasmid to generate a dual insert of LexA and QF that can be resolved into a single insert using FLP/FRT-mediated recombination, and plasmids to generate RNAi reagents for the LexA and QF systems. Finally, the authors demonstrate that a subset of the LexA and QF lines that they generated can induce RNAi phenotypes when paired with LexAop or QUAS shRNA lines. In general, the claims of the paper are well supported by the evidence and the authors do a thorough job of validating the transgenic lines and characterizing their expression patterns.

      Strengths:

      • Numerous Gal4 lines allow highly specific genetic manipulation in a wide range of organs and tissues; however, similar tissue-specific drivers using alternative binary expression systems are not currently well developed. This study provides a large number of tissue and organ-specific LexA and QF2 driver lines that should be broadly useful for the Drosophila community.

      • While a minority of the driver lines do not express the expected pattern (likely due to cryptic regulatory elements in the LexA or QF2 sequences), the ability to generate drivers using two different Gal4 alternatives mitigates this issue (as in nearly all cases at least one of the two systems produces a clean driver line with the expected expression pattern).

      • The use of LexA-GAD provides an additional degree of control as it is subject to Gal80 repression. This could prove to be particularly useful in cases where a researcher wishes to manipulate multiple genes using Gal4 and LexA-GAD drivers as the Gal80(ts) system could be used for simultaneous temporal control of both constructs.

      • The use of Fly Cell Atlas information to generate novel oenocyte-specific driver lines provides a useful proof-of-concept for constructing additional highly tissue-specific drivers.

      Weaknesses:

      • Since these reagents will most commonly be paired with existing Gal4 lines, adding information about corresponding Gal4 lines targeting these tissues and how faithfully the LexA and QF2 lines recapitulate these Gal4 patterns would be highly beneficial.

      It is outside the scope of this paper to analyze the expression patterns of the corresponding publicly available Gal4 lines. It is clear from the tissue specificity of the LexA-GAD and QF2 lines that they are expressed in the expected larval tissues based on the target genes. We have added a sentence in the discussion section noting “Further, we expect that there will also be differences between the expression pattern of corresponding Gal4 and the LexA-GAD/QF lines, as the latter were made by knock-in, while the former are often enhancer traps. However, based on our larval mounts and dissections, the stocks generated in this paper are highly specific to the expression pattern of the targeted genes.”

      • It is not stated in the manuscript if these transgenic lines and plasmids are currently publicly available. Information about how to obtain these reagents through Bloomington, Addgene, or TRiP should be added to the manuscript.

      We have added to the materials section that “All vectors described here that are required to produce new driver lines will be made available at Addgene.” And “All transgenic fly stocks described here will be made available at the Bloomington Drosophila Stock Center.”

      Reviewer #2 (Public Review):

      Zirin, Jusiak, and Lopes et al. presented an efficient pipeline for making LexA-GAD and QF2 drivers. The tools can be combined with a large collection of existing GAL4 drivers for dual genetic control of two cell populations. This is essential when studying inter-organ communication, since most current genetic drivers are biased toward expression in the central nervous system. In this manuscript, the authors described the methodology for efficiently generating T2A-LexA-GAD and T2A-QF2 knock-ins by CRISPR, targeting a number of genes with known tissue-specific expression patterns. The authors then validated and compared the expression of double as well as single drivers and found that the tissue-specific expression results were largely consistent with expectations. Finally, a collection of plasmids for LexA-GAD and QF2, as well as the corresponding LexAop and QUAS plasmids, was generated to facilitate the expansion of these toolkits. In general, this study will be of considerable interest to the fly community, and the resources can be readily generalized to make drivers for other genes. I believe this toolkit will have a significant, immediate impact on the fly community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Lines 56-57: Janelia Flylight lines are not necessarily brain-specific - this collection has or could be screened in other tissues.

      Correct. We have altered this sentence to read: “However, these lines were developed primarily for brain expression. Although they are often expressed in other tissues, they are not well suited for experiments targeting non-neuronal cell types.”

      • Line 197 - I don't see the referenced Figure S1 in the reviewer materials. It appears this is actually referencing panels LL and MM in Figure 2.

      Correct. We have fixed this error.

      • No information on the injection efficiency to create the CRISPR knock-in lines is presented. I am guessing the efficiency will be similar to that of other reported HDR-based CRISPR knock-ins, but if this information is available it would be useful to include it so that others know what to expect when injecting these vectors.

      We did not systematically assay the injection efficiency. However, we can say that it was in line with previous descriptions of CRISPR-based plasmid and ‘drop-in’ HDR methods. We have added a note in the methods that “Knock-in efficiencies were comparable to previous reports (Kanca et al. 2019; Kanca et al. 2022).”

      • Demonstration of successful multi-manipulation would strengthen the paper.

      We do not feel that this is necessary as there have been many papers showing combinatorial Gal4+LexA/QF experiments. An example from our lab can be seen in PMID: 37582831.

      • Also, are there approaches for efficiently constructing pairs of UAS/LexAOp or UAS/QUAS shRNA lines that would potentially streamline the genetics for multi-manipulation? Otherwise, this could be rather cumbersome to implement as one needs to combine a Gal4 line, a LexA/QF2 line (which will be constrained as to its chromosomal location by the target gene), and separate UAS-shRNA and LexAop/QUAS-shRNA constructs into the same fly.

      There are some recent innovations that are useful in this respect. We have added a sentence to the discussion that says: “There remains an unmet need for a single vector that would allow for UAS/LexAop/QUAS control of different shRNAs. However, recent innovations in multi module vectors and multiplexed drug-based genetics allow researchers to more efficiently generate UAS/QUAS/lexAop transgenic fly strains (Matinyan et al. 2021; Wendler et al. 2022).”

      • In Figure 5 - is the difference for the hh inserts attributable to the driver line or the GFP/mCherry construct (or differential ability to detect GFP/mCherry)? One could try visualizing hhL(-Q) with the LexAop-GFP line. I guess that the correspondence between the nubbin and hh result suggests that maybe QF2 is suppressed in the wing pouch, but this could also be the difference in the reporter constructs and it would be interesting to know if this difference is truly attributable to the driver constructs from the standpoint of knowing how consistent the QF/LexA patterns are expected to be.

      The difference is not attributable to GFP versus mCherry or the specific LexAop and QUAS lines that we used in figure 5. We tested the double knock-in and derivative single knock-ins with various QUAS and lexAop reporters and always observed the same pattern.

      Reviewer #2 (Recommendations For The Authors):

      There are a few points that should be clarified. A list of these specific points is provided below with the view that this could help the preparations of a stronger, improved paper.

      Line 50-51: "There have been no systematic studies comparing the two systems, with only anecdotal evidence to support one system over the other." It is unclear to me what the anecdotal evidence the authors referred to. Could the authors elaborate more on this part?

      Based on an examination of QUAS brains, Potter et al, 2010 (PMID 20434990) makes the claim that “The low basal expression of QUAS and UAS reporters provides significant advantage compared to the lexA binary expression system.”

      Shearin et al., 2014 (PMID: 24451596) compared Gal4/UAS, LexA/LexAop, and QF/QUAS reporter strength with the nompC driver and found that the QF system produced the strongest expression.

      While these observations might be true in the nervous system, it isn’t clear that this extends to other tissues, nor what effect this would have on gene knockdown experiments.

      Some reports have explored swapping out a Gal4 insertion for a LexA or QF at the same locus. For example, Gohl et al. 2011 (PMID 21473015) note that “the majority of the swaps captured most features of the original GAL4 expression patterns. In some cases, however, either prominent features of the GAL4 pattern were lost or we observed new expression patterns. These changes may have resulted from differences in the strength or responsiveness of reporter lines. Alternately, the swap may have modified some combination of enhancer spacing and sequence composition flanking the promoter.”

      Line 61-62: "On average, each StanEx line expresses LexA activity in five distinct cell types, with only one line showing expression in just one tissue..." What's the evidence to support this claim?

      This observation comes from Figure S3 of Kockel et al. 2016 (PMID: 27527793), where the authors “analyzed a subset of 76 StanEx lines that are unambiguously inserted within, or adjacent to, a single known gene.” We cited this reference in the preceding sentence. To clarify, we have added the citation again for line 61-62.

      Line 63-65: "These findings are consistent with prior studies indicating that enhancers very rarely produce expression patterns that are limited to a single cell type in a complex organism (Jenett et al. 2012)." It might be worth expanding on the use of the split system to achieve high cell-type-specificity. Especially, there are growing resources using split-intein and T2A-split-GAL4 with the prediction of genes from single-cell RNA sequencing datasets.

      We agree that the split system is currently the premier method to produce the most specific driver lines. Indeed, our group has recently published a paper on the split-intein Gal4 system (see PMID 37276389). However, the tradeoff is that split systems usually require generation of transgenic lines, which becomes impractical for research involving two independent binary transcriptional systems, as the user would need to combine at least three driver components into single stocks, plus the UAS/QUAS/LexAop insertions. The ideal would be to generate complementary split insertions on the same chromosome, but we think a discussion of this is tangential to the thrust of our work here.

      The authors did not fully discuss the rationale for using LexA-GAD rather than LexA-p65 or LexA-VP16AD anywhere in the manuscript. I assume the main reason for choosing LexA-GAD was compatibility with GAL80 suppression. It might be worth stating this explicitly in the Results (e.g., line 123) or in the Introduction. Also, did the authors observe weak transcriptional activation with LexA-GAD? It has been shown that transcriptional activation is much weaker with GAL4AD than with p65 or VP16AD. This might be worth noting in the manuscript as well.

      We did briefly mention in the introduction that one disadvantage of the Flylight lines is that they “use a p65 transcriptional activation domain and therefore are not compatible with the Gal80 temperature sensitive Gal4 repression system.” We have expanded on this issue in the introduction which now says: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015).”

      We did not have any problems visualizing gene expression with fluorescent reporters. Nor did we have any difficulty obtaining knock-down phenotypes with ubiquitous drivers.

      Line 125-127. Is there a specific reason why the authors chose the SV40 terminator for the double driver construct but the Hsp70 terminator for the single driver construct?

      We found that the Hsp70 terminator gave slightly lower expression and therefore used it for the single drivers to avoid toxicity. For the double drivers, we chose the SV40 terminator to compensate for reduced protein expression from the second gene position.

      Line 144-146: "To verify the knock-ins, we PCR-amplified the genomic regions flanking the insertion sites and confirmed that the insertions were seamless and in-frame." Did the authors recover lines with indel introduced, resulting in out-of-frame insertion?

      Yes, we did see indels, which sometimes resulted in out of frame insertions, which were discarded. This result is in line with what we have observed with other CRISPR HDR knock-in experiments.

      The underlying reason might be outside the scope of this manuscript. However, it would still be helpful for the authors to speculate on the potential reasons why T2A-LexA-GAD and T2A-QF2 targeting the same insertion site showed very distinct expression.

      It is outside the scope of this report to test this issue experimentally. We have a section in the discussion which does speculate as to the reason: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016). Alternatively, differences in the LexA-GAD and QF2 sequences, and sequence length, could impact the function of nearby gene regulatory regions.”

      Regarding the observation that the existence of 3XP3-RFP marker can interfere with the expression of T2A-LexA-GAD and T2A-QF2 expression in a case-by-case manner, it might be worth emphasizing in the discussion that the proper removal of 3XP3-RFP marker by Cre/LoxP recombination is important.

      We have added the following to the discussion: “Importantly, our knock-in constructs contain the 3XP3-RFP cassette for screening transformants. Perhaps due to interaction between the 3XP3 promoter and the regulatory regions of the target gene, we occasionally saw misexpression of the LexA-GAD/QF2 in the 3XP3 domain. We have therefore prioritized Cre-Lox removal of the 3XP3-RFP cassette from our knock-in stocks, and advise that users of the plasmids described here likewise remove the marker, following successful knock-in.”

      For Fig. 5B, 5F-G, the authors should elaborate more in the result section. For example, lines 215-217: "We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B)." It is unclear what the authors mean by "robust generation". Also, there is no description of the results in Fig. 5F-G.

      We have expanded this section for Figure 5B, which now reads: “We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B). In the case of the hh line, 15 out of 36 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 14% recombinant offspring per parent. 20 out of 36 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent. In the case of the dpp line, 31 out of 32 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 30% recombinant offspring per parent. 17 out of 32 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent.”
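      The two summary statistics used above (the number of heat-shocked parents yielding at least one recombinant, and the mean percentage of recombinant offspring per parent) can be computed as sketched below. This sketch assumes the mean is taken over each parent's own recombinant fraction, including parents with no recombinants; the response does not state exactly how the mean was calculated, and the counts in the example are hypothetical.

```python
def summarize_recombination(per_parent):
    """Summarize FLP-out results.

    per_parent: list of (recombinant_offspring, total_offspring) tuples, one per
    heat-shocked parent. Returns (parents_with_recombinant, n_parents, mean_percent).
    """
    # Parents that produced no offspring at all cannot be scored.
    scored = [(rec, total) for rec, total in per_parent if total > 0]
    with_recombinant = sum(1 for rec, _ in scored if rec > 0)
    # Assumption: average of per-parent percentages, including parents with 0 recombinants.
    mean_percent = sum(100.0 * rec / total for rec, total in scored) / len(scored)
    return with_recombinant, len(scored), mean_percent


if __name__ == "__main__":
    # Hypothetical counts for three heat-shocked parents.
    print(summarize_recombination([(2, 10), (0, 8), (1, 5)]))
```

      For instance, 15 of 36 hh parents yielding at least one T2A-LexA-GAD recombinant corresponds to about 42% of parents.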

      We have also added a description for Figure 5F-G, which reads: “Recombinants were also independently verified by PCR of the insertions (Figure 5F-G), where we observed the expected smaller band sizes in the derivative T2A-QF2 and T2A-LexA-GAD relative to the parental double driver.”

      Line 229, minor error: "Into these vectors, ..."

      We have edited this to read: “We cloned shRNAs targeting forked (f) and ebony (e) genes into these vectors and assayed their phenotypes when crossed to ubiquitous LexA-GAD and QF2 drivers.”

      Line 238-240: "Both Tub-LexA-GAD and Tub-QF2 drivers generated knockdown phenotypes in the thorax when crossed to f and e shRNA lines. However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J)." The stated "stronger phenotypes" are not clear to me. It might be worth elaborating more.

      We have further clarified this by changing it to: “However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J). For example, Tub-LexA-GAD produced a fully penetrant f bristle phenotype (Figure 6F) while some wild-type bristles remained on the thoraces of Tub-QF2 f knockdown (Figure 6G). Neither Tub-LexA-GAD nor Tub-QF2 was able to achieve the strength of phenotype generated by the T2A-LexA-GAD da knock-in line (compare the darkness of the cuticle caused by e knockdown in Figure 6H-J).”

      Line 257-250: "Our collection of T2A-LexA-GAD and T2A-QF2 and double driver vectors can be easily adapted to target any gene for CRISPR knock-in, with a high probability that the resulting line will accurately reflect the expression of the endogenous locus" The authors could refer to the recent gene-specific Trojan GAL4/split-GAL4 work to support the idea that these gene-specific T2A-GAL4/split-GAL4 drivers reflect better than the enhancer-based drivers.

      We have added the following sentence to the discussion: “The specificity achieved with this approach can also be seen in recent efforts to build collections of gene specific T2A-Split-Gal4 and T2A-Gal4 insertions (Kanca et al. 2019; Chen et al. 2023; Ewen-Campen et al. 2023).”

      Line 630: "Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression." It would be helpful to add the annotation on Fig. 3B to show the location of glial cell expression.

      We have added arrowheads to Figure 3, and the legend now reads: “Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression (white arrowheads).”

      Line 650-651: "The fat body mCherry expression is also present in the reporter stock and does not indicate LexA-GAD activity." I did not get what the authors were trying to convey. Where did the fat body mCherry expression come from? Please elaborate more.

      We have changed this section to explain that “The fat body mCherry expression (yellow arrowhead) is from leakiness of the reporter stock and does not indicate LexA-GAD activity.”

      Line 679-680: "forked shRNA produced a forked bristles phenotype." Please add the annotation on the figures to show where the phenotypes were.

      We have added arrowheads and asterisks to the figure. The legend now reads: “(E-G) forked shRNA produced a forked bristles phenotype (white arrowheads). Note that some bristles retain a more elongated wild-type morphology with the Tub-QF2 driven forked knockdown (G, yellow asterisk).”

      Fig 1D-E and 4A-B. There is no description anywhere in the manuscript of QA and QS regulation, and little on GAL80ts regulation. This may confuse readers with little background in fly genetics. Please introduce how these regulators act on the different binary expression systems.

      We have added a section in the introduction, which states: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by the temperature sensitive Gal4 repressor, Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015). Like Gal80-based modulation of LexA-GAD, QF2 activity can also be regulated temporally by expressing QS, a QF repressor. QS repression of QF can be released by feeding flies quinic acid (Riabinina and Potter 2016).”
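      For readers less familiar with these repressors, the regulatory logic described in this response can be summarized as a simplified on/off model. It is a sketch under stated assumptions: temperature is treated as a two-state switch (Gal80ts active at 18 °C and inactivated after a shift to 29 °C, a commonly used regime), and all names in the code are hypothetical labels rather than any existing software. "LexA-p65" is included only to show a driver that lacks the Gal4 activation domain and is therefore not repressed by Gal80.

```python
from dataclasses import dataclass

# Drivers carrying the Gal4 activation domain are bound and repressed by Gal80.
GAL80_SENSITIVE = {"Gal4", "LexA-GAD"}
# QF-family activators are repressed by QS.
QS_SENSITIVE = {"QF2"}
# Assumption: Gal80ts treated as fully inactive at or above this temperature.
RESTRICTIVE_TEMP_C = 29.0


@dataclass
class Genotype:
    gal80ts: bool = False        # temperature-sensitive Gal80 present
    qs: bool = False             # QF repressor present
    temperature_c: float = 18.0  # rearing temperature
    quinic_acid: bool = False    # quinic acid feeding relieves QS repression


def gal80_represses(g: Genotype) -> bool:
    return g.gal80ts and g.temperature_c < RESTRICTIVE_TEMP_C


def qs_represses(g: Genotype) -> bool:
    return g.qs and not g.quinic_acid


def driver_active(driver: str, g: Genotype) -> bool:
    """Return True if the driver can activate its target under this simplified model."""
    if driver in GAL80_SENSITIVE and gal80_represses(g):
        return False
    if driver in QS_SENSITIVE and qs_represses(g):
        return False
    return True


if __name__ == "__main__":
    fly = Genotype(gal80ts=True, temperature_c=18.0)
    for d in ("Gal4", "LexA-GAD", "LexA-p65", "QF2"):
        print(d, driver_active(d, fly))
```

      In this model, shifting a Gal80ts fly to 29 °C releases Gal4 and LexA-GAD together, while QF2 is controlled independently through QS and quinic acid feeding.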

      Fig. 2, there are several ND in the figure without any explanation in the manuscript (e.g. Mef2 and He). In addition, the expression patterns look quite different between T2A-LexA-GAD and T2A-QF2 for some genes (e.g., mex1, Myo31DF), but the authors did not mention any of them in the manuscript. Please elaborate more.

      We have altered the Figure 2 legend as follows: “(A-KK) T2A-LexA-GAD knock-in lines crossed to a LexAop-GFP reporter and T2A-QF2 knock-in lines crossed to a QUAS-GFP reporter. Panels show 3rd instar larva. GFP shows the driver line expression pattern. RFP shows the 3XP3 transformation marker, which labels the posterior gut and anal pads of the larva. Gene names and tissues are on the left. We failed to obtain LexA-GAD knock-ins for Mef2 (E) and He (DD). (LL-MM) 3rd instar imaginal disc from the insertions in the nubbin (nub) gene. Note that most of the lines are highly tissue-specific and are comparable between the LexA-GAD and QF2 knock-ins. Insertions in the daughterless gene (da) and nub are an exception, as the T2A-LexA-GAD, but not the T2A-QF2, gives the expected expression pattern. Insertions in the gut-specific genes mex1 (X-Y) and Myo31Df (Z-AA) also differed between the LexA-GAD and QF2 drivers.”

      We have also added a note on the inconsistency of mex1 and Myo31Df to the discussion: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. Additionally, driver expression from the gut-specific genes mex1 and Myo31Df differed between the LexA-GAD and QF2 transformants. In both cases the LexA-GAD was more broadly expressed along the length of the gut than the QF2. Toxicity may be an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016).”

      Fig. 4B: it is unclear why the hsp70 sequence is present downstream of the enhancer of interest (upstream of T2A). Is it a molecular mark resulting from the cloning steps? Does it serve a specific purpose?

      This is the Drosophila hsp70 minimal promoter, which is standard in many Drosophila expression constructs. In the Methods section we describe how we made versions of pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 with and without this minimal promoter: “We used pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 for dpp-blk and pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20-alt (which lacks the hsp70 promoter) for Ilp2, since dpp-blk does not have a basal promoter, but the Ilp2 enhancer does.”

      Fig. 5A: the single T2A-QF2 and T2A-LexA-GAD lines derived from the double-driver parental lines retain the FRT3 sequence upstream of QF2 and LexA-GAD. I assume the FRT3 sequence will be translated and remain attached to QF2 and LexA-GAD. Is that correct? If so, could this cause any adverse effect?

      Correct. The FRT3 sequence is present in both the parental double drivers and the single derivatives. The additional amino acids do not prevent transcriptional activation by LexA-GAD or QF2. We do not know whether they have other adverse effects, though we did not observe any.

      Fig. 5C-C'. It seems like the images of Fig. 5C-C' were the same as Fig. 4D-D'. If so, the authors should indicate that in the figure legend.

      We have made a note of this in the figure legend.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present a potentially useful model involving Ca2+ signaling in inflammasome activation. As it stands, it was felt that the data were not sufficient to support the model and the claims of the study are inadequately presented.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a complex and unclear model involving Ca2+ signaling in inflammasome activation. The experimental approaches used to study the calcium dynamics are problematic and the results shown are of inadequate quality. The major claims of this manuscript are not adequately substantiated.

      Major concerns:

      (1) The analysis of lysosomal Ca2+ release is carried out after many hours of treatment. Such evidence does not support the claim that PA activates Ca2+ efflux from lysosomes, and even if this phenomenon were robust, it is doubtful that such kinetics are meaningful for the regulation of inflammasome activation. Furthermore, the evidence for lysosomal Ca2+ release is indirect and relies on a convoluted process that doesn't make any conceptual sense to me. In addition to these major shortcomings, the indirect evidence of perilysosomal Ca2+ elevation is also of very poor quality, and from the standpoint of my expertise in calcium signaling, the data are not credible. The use of GCaMP3-ML1 transiently transfected into BMDMs is highly problematic. The efficiency of transfection in BMDMs is always extremely low, and overexpression of the sensor in a few rare cells can lead to erroneous observations. Overexpression also results in gross mislocalization of such membrane-bound sensors. The accumulation of GCaMP3-ML1 in the ER of these cells would prevent any credible measurement of perilysosomal Ca2+ signals. A meaningful investigation of this process in primary macrophages requires the generation of a mouse line in which the sensor is expressed at low levels in myeloid cells and shown to be localized almost exclusively in the lysosomal membrane. The mechanistic framework built around these major conceptual and technical flaws is not especially meaningful, and since these are foundational results, I cannot take the main claims of this study seriously.

      Ans) We agree with the reviewer’s concern that transfection efficiency could be low in BMDMs and that GCaMP3-ML1 could be mislocalized. However, in our experiments, transfection of BMDMs with test plasmids resulted in good expression of the test proteins. Below, we present data showing good transfection efficiency in BMDMs, although a different plasmid was used.

      Author response image 1.

      (2) The cytosolic Ca2+ imaging shown in Figure 1C doesn't make any sense. It looks like a snapshot of basal Ca2+ many hours after PA treatment; calcium elevations are highly dynamic. Snapshot measurements are not helpful, and analysis of calcium dynamics requires recording over a certain timespan. Unfortunately, this technical approach has been used throughout the manuscript. Also, BAPTA-AM abrogates IL-1b secretion because IL-1b transcription is Ca2+ dependent; the result shown in Figure 1D does not shed light on anything to do with inflammasome activation, and it is misleading to suggest that it does.

      Ans) We agree with the reviewer’s concern that snapshot measurements could lead to false conclusions. We have not traced cytosolic Ca2+ after treatment with LPS + PA. However, we traced lysosomal Ca2+ and ER Ca2+ for more than 15 min, as presented in Figure 4B. We also agree that BAPTA-AM might affect transcription of pro-IL-1β. We therefore performed immunoblot analysis after treatment with LPS+PA in the presence of BAPTA-AM. The pro-IL-1β band was not affected by BAPTA-AM treatment, suggesting that BAPTA-AM does not affect transcription or translation of pro-IL-1β. These data were added to Figure 1D, as suggested.

      (3) Trpm2-/- macrophages are known to be hyporesponsive to inflammatory stimuli; the reduced secretion of IL-1b by these macrophages is not novel. From a mechanistic perspective, this study does not add much to that observation, and the proposed role of TRPM2 as a lysosomal Ca2+ release channel is not substantiated by good-quality Ca2+ imaging data (see point 3 above). Furthermore, the study assumes that TRPM2 is a lysosomal ion channel. One paper reported TRPM2 in lysosomes, but this is a controversial claim, with no replication or further development in the last 14 years. This core assumption can be highly misleading to readers unfamiliar with TRPM2 biology, and it is necessary to present credible evidence that TRPM2 is functional in the lysosomal membrane of macrophages. Ideally, this line of investigation should rest on a robust demonstration of TRPM2 currents in patch-clamp electrophysiology of lysosomes. If this is not technically feasible for the authors, they should at least investigate TRPM2 localization on lysosomal membranes of macrophages.

      Ans) We agree with the reviewer’s comment regarding TRPM2. However, we have shown that TRPM2 current was not activated in the plasma membrane of BMDMs after treatment with LPS+PA. We also agree that reduced inflammatory cytokine release from TRPM2 KO cells, and reduced inflammasome responses of TRPM2 KO macrophages to ROS or nanoparticles, have been reported; however, the role of TRPM2 in metabolic inflammation or in inflammasome activation by lipid stimulators has not been shown, as discussed in the new lines 9-10 from the bottom of page 18. Regarding the role of lysosomal TRPM2 in inflammation, we have shown that bafilomycin A1 treatment abrogated the increase in cytosolic Ca2+ induced by LPS+PA (Figure 3-figure supplement 1D), supporting the role of lysosomes and lysosomal Ca2+ in inflammasome activation by LPS+PA.

      We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We performed confocal microscopy after immunofluorescence staining with anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 colocalized with LAMP-2. This result, supporting TRPM2 expression on macrophage lysosomes, was incorporated as Figure 2-figure supplement 1A.

      (4) Apigenin and quercetin are highly non-specific, and their effects cannot be attributed to CD38 inhibition alone. Such conclusions need strong loss-of-function studies using genetic knockouts of CD38, or at least siRNA knockdown. Importantly, if TRPM2 is indeed being activated downstream of CD38, this should be easily evident in whole-cell patch-clamp electrophysiology. TRPM2 currents can be resolved using this technique, and the authors have Trpm2-/- cells for proper controls. The authors attempted these experiments, but the results are of very poor quality. If the TRPM2 current is being activated through ADPR generated by CD38 (in response to PA stimulation), then it is very odd that the authors need to include 200 uM cADPR to see TRPM2 current (Fig. 3A). Oddly, even these data cast great doubt on the technical quality of the electrophysiology experiments. Even with such high concentrations of cADPR, the TRPM2 current is tiny and Trpm2-/- controls are missing. The current-voltage relationship is not shown, and I suspect that the results merely report leak currents seen in measurements with substandard seals. Also, 20 uM ACA is not a selective inhibitor of TRPM2; relying on ACA as the conclusive diagnostic is problematic.

      Ans) We agree that the effects of apigenin and quercetin could be due to mechanisms other than inhibition of CD38-mediated inflammasome activation. Indeed, that is why we used TRPM2 KO mice and cells. The small TRPM2 current observed even with high concentrations of cADPR may suggest a minor role for plasma-membrane TRPM2 in macrophages. Regarding the concern about ACA, we added data showing that ACA inhibits IL-1β release in response to LPS+PA as a new Figure 3-figure supplement 1A.

      (5) TRPM2 is expressed in many different cell lines. The broad metabolic differences observed by the authors in the Trpm2-/- mice cannot be attributed to macrophage-mediated inflammation. Such a conclusion requires the study of mice wherein Trpm2 is deleted selectively in macrophages or at least in the cells of the myeloid lineage.

      Ans) We agree that TRPM2 in cells other than macrophages might have affected the results. We therefore stimulated TRPM2-KO primary peritoneal macrophages with LPS+PA in vitro. IL-1β release from TRPM2-KO macrophages in response to in vitro treatment with LPS+PA was significantly lower than that from wild-type macrophages (Figure 2C & D), showing a role for macrophage TRPM2 in inflammasome activation by LPS+PA that could be independent of TRPM2 in tissues or cells other than macrophages.

      (6) The ER-Lysosome Ca2+ refilling experiments rely on transient transfection of organelle-targeted sensors into BMDMs. See point #1 to understand why I find this approach to be highly problematic. Furthermore, the data procured are also not convincing and lack critical controls (localization of sensors has not been demonstrated and their response to acute mobilization of Ca2+ has not been shown to inspire any confidence in these results).

      Ans) We agree that transfection of an ER-targeted Ca2+ sensor could have artifactual effects. However, we studied ER-lysosome Ca2+ flux using not only GEM-CEPIAer but also D1ER, a FRET-based ER Ca2+ sensor that has the advantage of depending on short-distance molecular interaction. Thus, we believe that the changes in ER Ca2+ after treatment with LPS+PA are not due to artifacts. Multiple contacts between VAPA and ORP1L (Figure 4E) also support ER-lysosome contact, likely facilitating ER-lysosome Ca2+ flux.

      (7) The authors claim that SOCE is coupled to K+ efflux. But there is no credible evidence that SOCE is activated in PA-stimulated macrophages. The data shown in Fig 4 supp 1 do not investigate SOCE in a reliable manner; the conclusion is again based on snapshot measurements and crude non-selective inhibitors. The correct way to evaluate SOCE is to record cytosolic Ca2+ elevations over a period of time in the absence and presence of extracellular Ca2+. However, even such recordings can be unreliable, since the phenomenon is being investigated hours after PA stimulation. So, the only definitive way to demonstrate that Orai channels are indeed active during this process is patch-clamp electrophysiology of PA-stimulated cells.

      Ans) We agree that the final proof of SOCE activation is activation of Orai channels demonstrated by electrophysiology. However, we have shown STIM1 aggregation colocalized with Orai1, which is another strong line of evidence for SOCE channel activation (Vaca L. Cell Calcium 47:199, 2010). A reference showing the role of such aggregation in SOCE activation was added to the text (line 4 from the bottom of page 10) and to the References.

      Reviewer #2 (Public Review):

      In this manuscript by Kang et al., the authors investigated the mechanisms of K+-efflux-coupled SOCE in NLRP3 inflammasome activation by LP (LPS+PA), and identified an essential role of TRPM2-mediated lysosomal Ca2+ release and subsequent IP3R-mediated ER Ca2+ release and store depletion in the process. K+ efflux is shown to be mediated by a Ca2+-activated K+ channel (KCa3.1). LP-induced cytosolic Ca2+ elevation also induced delayed activation of ASK1 and JNK, leading to ASC oligomerization and NLRP3 inflammasome activation. Overall, this is an interesting and comprehensive study that has identified several novel molecular players in metabolic inflammation. The manuscript would benefit if the following concerns were addressed:

      (1) The expression of TRPM2 in the lysosomes of macrophages needs to be more definitively established. For instance, the cADPR-induced TRPM2 currents should be abolished in the TRPM2 KO macrophages. Can you show the lysosomal expression of TRPM2, either with an antibody if available or with a fluorescently-tagged TRPM2 overexpression construct?

      Ans) We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We performed confocal microscopy after immunofluorescent staining with anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 colocalized with LAMP2. This result was incorporated as Figure 2-figure supplement 1A.

      (2) Can you use your TRPM2 inhibitor ACA to pharmacologically phenocopy some results, e.g., about [Ca2+]ER, [Ca2+]LY, and [Ca2+]i from the TRPM2 knockout?

      Ans) We agree that the effect of ACA on other experimental results needs to be shown. We did not study the effect of ACA on Ca2+ flux; however, we observed that ACA inhibited IL-1β release in response to LPS+PA. These data were incorporated as Figure 3-figure supplement 1A.

      Author response image 2.

      (3) In Fig. S4A, bathing the cells in zero Ca2+ for three hours might not be ideal. Can you use a SOCE inhibitor, e.g., YM-58483, to make the point?

      Ans) We agree that a SOCE inhibitor experiment is useful in addition to the zero-Ca2+ experiment. In fact, we had already used two SOCE inhibitors, 2-APB and BTP2 (Figure 4-figure supplement 1B-D); BTP2 is another name for YM-58483. In particular, the BTP2 experiment addresses the possible contribution of ER Ca2+ inhibition that might occur when 2-APB is used.

      (4) In Fig. 1A, you need a positive control, e.g., ionomycin, to show that the GPN response was selectively reduced upon LP treatment.

      Ans) We did not use ionomycin as a control in this study. In our previous study using other agents that induce lysosomal Ca2+ efflux, we observed lysosomal Ca2+ efflux with an intact subsequent ionomycin response. Although we did not include ionomycin in the current paper, we are confident that the ionomycin response would be preserved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      See Public Review.

      Reviewer #2 (Recommendations For The Authors):

      (5) In Fig. 4B, the red label should read "BAPTA-1 Dextran", but not "GAPTA-1 Dextran".

      (6) Writing should be improved in many sections.

    1. Author Response:

      Reviewer #1:

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas elimination of TRPV1 has the largest effect on neuronal responses. These findings are of importance, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for the role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. Here, there are a few aspects that are less convincing.

      (1) The authors study warmth responses using DRG neurons after three days of culturing. They propose that these "more accurately reflect the functional properties and abundance of warm-responsive sensory neurons that are found in behaving animals." However, the only argument to support this notion is that the fraction of neurons responding to warmth is lower after three days of culture. This could have many reasons, including loss of specific subpopulations of neurons, or any other (artificial?) alterations to the neurons' transcriptome due to the culturing. The isolated DRGs are not selected in any way, so also include neurons innervating viscera not involved in thermosensation. If the authors wish to address actual changes in sensory nerves involved in warmth sensing in TRPM2 or TRPV1 KO mice without disturbing the response profile as a result of the isolation procedure, other approaches would be needed (e.g. skin-nerve recordings or in vivo DRG imaging).

      We agree that there could be several reasons why the responses of cultured DRG neurons are reduced compared with acute/short-term cultures. It is possible, and likely, that transcriptional changes occur over the course of the culturing period. It is also possible that it is mere coincidence that the 3-day cultures have a response profile more similar to the in vivo situation than the acute cultures. In the revised manuscript, we will therefore tone down the claim that the 3-day cultures mirror native conditions more appropriately.

      Nevertheless, our results clearly show that acute cultures have a response profile much more similar to that of damaged/“inflamed” neurons, irrespective of any comparison with the 3-day cultures. We therefore believe it is helpful to include these data to make researchers aware that acute cultures, which many use in their experiments, are very different from non-inflamed native/in vivo DRG neurons.

      In some experiments not shown in the first version of our manuscript, we applied the TRP channel agonists menthol, capsaicin and AITC (mustard oil) consecutively in a few 3-day cultures. We also have capsaicin responses from overnight cultures. We will attempt to correlate the percentage of neurons responsive to these TRPM8, TRPV1 and TRPA1 agonists in our cultures with the percentages of neurons found to express the respective TRP ion channels in vivo. While this type of analysis won’t prove that 3-day cultures are similar to the in vivo situation (even if there is good correlation between the in vitro and in vivo results), it might support the use of 3-day cultures as a model.

      (2) The authors state that there is a reduction in warmth-sensitive DRG neurons in the TRPM2 knockout mice based on the data presented in Figure 2D. This is not convincing for the following reasons. First, the authors used t-tests (with FDR correction, yielding borderline significance), whereas three groups are compared here across three repeated stimuli. This would require different statistics (e.g., ANOVA), and I am not convinced (based on a rapid assessment of the data) that such an analysis would yield any significant difference between WT and TRPM2 KO. Second, there seems to be a discrepancy between the plot and legend regarding the number of FOVs analysed (21, 17, and 18 FOVs according to the legend, compared with 18, 10, and 12 dots in the plot). Therefore, I would urge the authors to critically assess this part of the study and to reconsider whether the statement (and discussion) that "Trpm2 deletion reduces the proportion of warmth responders" should be maintained or abandoned.

      Yes, we agree that the statistical tests indicated by the referee are more appropriate/robust for the data shown in Figures 1F, 2D, and 4G.

      When we perform a two-way repeated-measures ANOVA with subsequent multiple comparison tests against wild type (with Dunnett’s correction) for the data shown in Fig. 2D, both the main effect (Genotype) and the interaction term (Stimulus x Genotype) are significant. The multiple comparisons yield results very similar to those in the current manuscript, except that the TRPM2-KO data for the 2nd stimulus (~36°C) are borderline significant (p=0.050).

      Because the repeated temperature stimuli may not be independent and each stimulus varies between FOVs (Fig. 2C), a mixed-effects model that accounts for these effects may be more appropriate.

      Similarly, for plots 1F and 4G, Genotype (either as main effect or as interaction with Time) is significant after a repeated measures two-way ANOVA. The multiple comparisons (with Bonferroni correction) only changed the results marginally at individual timepoints, without affecting the overall conclusions. The exception is Fig. 4G at 38°C, where the interaction of Time and Genotype is significant, but no individual timepoint-comparison is significant after Bonferroni correction.

      The main difference between the results presented above and those presented in the manuscript is the choice of multiple comparison correction. We originally opted for the false discovery rate (FDR) approach because it is less prone to Type II errors (false negatives) than methods such as Šidák or Bonferroni, particularly when correcting for a large number of tests. However, we are mainly interested in whether the genotypes differ in their behavior in each temperature combination, and the significant ANOVA tests for Fig. 1F and 4G support that point. The statistical tests and comparisons used in the current version of the manuscript, which compare behavior at individual timepoints, are interesting but less relevant (and potentially distracting), as we do not go into detail about the behavior at any given timepoint in the assay.

      Therefore, as suggested by the reviewer, we will update the statistics in the revised version of the manuscript. We will also report the correct number of FOVs in the legend.
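      The difference in power between FDR and family-wise corrections discussed above can be illustrated with a minimal sketch. It uses hypothetical p-values (not data from the manuscript) and the textbook Bonferroni and Benjamini-Hochberg step-up procedures; it is not necessarily the routine implemented in the statistics software used for the analysis.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) step-up adjustment, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for five comparisons; NOT data from the manuscript.
    pvals = [0.004, 0.011, 0.02, 0.03, 0.2]
    alpha = 0.05
    n_bonf = sum(q <= alpha for q in bonferroni(pvals))
    n_bh = sum(q <= alpha for q in benjamini_hochberg(pvals))
    print(f"Bonferroni rejects {n_bonf}, Benjamini-Hochberg rejects {n_bh}")
    # → Bonferroni rejects 1, Benjamini-Hochberg rejects 4
```

      Bonferroni (and the nearly identical Šidák correction) controls the probability of any false positive across the family of tests, whereas Benjamini-Hochberg controls the expected proportion of false positives among rejected tests, which is why it retains more power as the number of comparisons grows.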

      (3) It remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is at all related to TRPM2 functioning as a warmth sensor in sensory neurons. As discussed above, the effects of the TRPM2 KO on the proportion of warmth-sensing neurons are at most very subtle, and the authors did not use any pharmacological tool (in contrast to the use of capsaicin to probe for TRPV1 in Figures S3 and S4) to support a direct involvement of TRPM2 in the neuronal warmth responses. Behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      As mentioned above, we will tone down the correlation between the cellular and behavioral data and further stress that the Trpm2-KO phenotype may be related to the function of the ion channel outside of DRGs.

      (4) The authors only use male mice, which is a significant limitation, especially considering known differences in warmth sensing between male and female animals and humans. The authors state "For this study, only male animals were used, as we aimed to compare our results with previous studies which exclusively used male animals (7, 8, 17, 43)." This statement is not correct: all four mentioned papers include behavioral data from both male and female mice! I recommend the authors to either include data from female mice or to clearly state that their study (in comparison with these other studies) only uses male mice.

      In the studies by Tan et al. and Vandewauw et al., only male animals were used for the behavioral experiments. Yarmolinsky et al. and Paricio-Montesinos et al. used both males and females, while, as far as we can tell, only Paricio-Montesinos et al. reported that no difference was observed between the sexes. This is a valid point, though: when our study started 6-7 years ago, we only used male mice (as did many other researchers), which we would now do differently. Nevertheless, we included some female mice in these experiments and will reevaluate whether the numbers are sufficient to generalize the phenotypes to both sexes or to report differences in the revised manuscript.

      All wild-type mice are C57BL/6N from Janvier. Generally, all lines are backcrossed to C57BL/6 mice, and additionally inbreeding was altered every 4-6 generations by crossing to C57BL/6. We cannot state exactly how many times the Trp channel KOs have been backcrossed to C57BL/6 mice.

      Reviewer #3:

      Summary and strengths:

      In the manuscript, Abd El Hay et al. investigate the role of the thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, in which both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice and show that TRPM2 plays a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons, they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 in thermal behavior.

      Open questions and weaknesses:

      (1) Differences in the response features of cells expressing TRPM2 and TRPV1 are central and interesting findings but need further validation (Figures 3 and 4). To show differences in the dynamics and the amplitude of responses across different lines and stimulus amplitudes more clearly, the authors should show the grand average population calcium response from all responsive neurons with error bars for all 3 groups for the different amplitudes of stimuli (as has been presented for the thermal stimuli traces). The authors should also provide a population analysis of the amplitude of the responses in all groups to all stimulus amplitudes. Prior work suggests that thermal detection is supported by an enhancement or suppression of the ongoing activity of sensory fibers innervating the skin. The authors should present any data on cells with ongoing activity.

      We will include grand average population analysis of the different groups in the revised version.

      Concerning ongoing activity: we are not sure that neuronal cultures can faithfully recapitulate ongoing activity. Ongoing activity has mostly been recorded in skin-nerve preparations (or, in older studies, in other types of nerve recordings), and only very few studies show ongoing activity in culture; in those, ongoing activity only starts when sensory neurons are cultured for even longer than 3 days (Ref.: doi: 10.1152/jn.00158.2018). We have very few cells that show some spontaneous activity, but these are too few to draw any conclusions. In any case, nerve fibers, which are absent from our cultures, might be necessary to drive ongoing activity.

      (2) The authors should better place their findings in the context of the literature and highlight the novelty of their findings. The introduction builds a story of a 'disconnect' or 'contradictory' findings about the role of TRPV1 and TRPM2 in warm detection. While there are some disparate findings in the literature, Tan and McNaughton (2016) show a role for TRPM2 in the avoidance of warmth in a similar task, Paricio et al. (2020) show a significant reduction in warm perception in TRPM2 and TRPV1 knockout lines, and Yarmolinsky et al. (2016) show a reduction in warm perception with TRPV1 inactivation. All these papers are therefore in agreement with the authors' finding of a role for these channels in warm behavior. The authors should change their introduction and discussion to discuss the findings of these studies more accurately and to better pinpoint the novelty of their own work.

      Paricio-Montesinos et al. argue that TRPM8 is crucial for the detection of warmth, as TRPM8-KO animals are incapable of learning the operant task. TRPM2-KO animals and, to a smaller extent, TRPV1-KO animals have reduced sensitivity in the task but are still capable of learning/performing it. In our chamber preference assay, however, this is reversed: TRPM2-KO animals lose the ability to differentiate warm temperatures, while TRPM8 appears to play no major role. A commonality between the two studies is that while TRPV1 affects the detection of warm temperatures in the different assays, this ion channel appears not to be crucial.

      Similarly, Yarmolinsky et al. show that Trpv1 inactivation only increases the error rate in their operant assay (from ~10% to ~30%), without testing TRPM2, and Tan et al. show the importance of TRPM2 in the preference task, without testing TRPV1.

      More generally, the choice of assay, either an operant task (Paricio-Montesinos et al. and Yarmolinsky et al.) or a preference assay without training (Tan et al. and our data here), might be important, and different TRP channels may be relevant for different types of temperature assays; we will expand on this in the discussion of the revised manuscript. While our results generally agree with previous studies, they add a different perspective on the analysis of the behavior (with correlation to cellular data). We will adjust the manuscript to highlight these advances more clearly.

      (3) The responses of 60 randomly selected cells are shown in Figure 2B. But, looking at the TRPM2-/- data, warm responses appear more obvious than in WTs and the weaker responders of the WT group appear weaker than the equivalent group in the TRPV1-/- and TRPM2-/- data. This does not necessarily invalidate the results, but it may suggest a problem in the data selection. Because the correct classification of warm-sensitive neurons is central to this part of the study more validation of the classifier should be presented. For example, the authors could state if they trained the classifier using equal amounts of cells, show some randomly selected cells that are warm-insensitive for all genotypes, and show the population average responses of warm-insensitive neurons.

      The classifier was trained on a balanced dataset of 1,000 manually labelled traces (500 responders and 500 non-responders) across all five temperature stimuli. The prediction accuracy was 98%. In the revised manuscript, we will describe more clearly how the classifier was trained, include example traces, and show the population average responses.

      (4) The interpretation of the main behavioral results and justification of the last figure is presented as the result of changes in sensing but differences in this behavior could be due to many factors and this needs clarification and discussion. (i) The authors mention that 'crucially temperature perception is not static' and suggest that there are fluctuating changes in perception over time and conclude that their modelling approach helps show changes in temperature detection. They imply that temperature perceptual threshold changes over time, but the mouse could just as easily have had exactly the same threshold throughout the task but their motivation (or some other cognitive variable) might vary causing them to change chamber. The authors should correct this. (ii) Likewise, from their fascinating and high-profile prior work the authors suggest a model of internal temperature sensing whereby TRPM2 expression in the hypothalamus acts as an internal sensory of body temperature. Given this, and the slow time course of the behavior in chambers with different ambient temperatures, couldn't the reason for the behavioral differences be due to central changes in hypothalamic processing rather than detection by skin temperature? If TRPM2-/- were selectively ablated from the skin or the hypothalamus (these experiments are not necessary for this paper) it might be possible to conclude whether sensation or body temperature is more likely the root cause of these effects but, without further experiments it is tough to conclude either way. (iii) Because the ambient temperature is controlled in this behavior, another hypothesis is that warm avoidance could be due to negative valence associated with breathing warm air, i.e. a result of sensation within the body in internal pathways, rather than sensing from the external skin. Overall, the authors should tone down conclusions about sensation and present a more detailed discussion of these points.

      We are sorry that the statement including the phrase “crucially temperature perception is not static” is ambiguous; what we meant is that, as the mouse moves between the two chambers, the animal experiences different temperatures over time (not that the perceptual threshold of the mouse changes). We will clarify this statement in the revised version of the manuscript.

      Even so, it could be that some other variable (motivation, etc.) makes the mouse change chambers; we hypothesize that this variable (whatever it might be) is still modulated by temperature (at least, this is the most likely explanation we see).

      As for internal/hypothalamic temperature sensing: we already address this possibility in the Discussion and will emphasize it further in the revised manuscript.

      As for negative valence mediated by breathing warm air: yes, this could presumably also be possible. Valence is an interesting aspect in itself: are the mice repelled by the (uncomfortable) hot plate, attracted to the (more comfortable) thermoneutral plate, or both? This is something to elucidate in a future study.

      (5) It is an excellent idea to present a more in-depth analysis of the behavioral data collected during the preference task, beyond 'the mouse is on one side or the other'. However, the drift-diffusion approach is complex to interpret from the text in the results and the figures. The results text is not completely clear on which behavioral parameters are analyzed and terms like drift, noise, estimate, and evidence are not clearly defined. Currently, this section of the paper slightly confuses and takes the paper away from the central findings about dynamics and behavioral differences. It seems like they could come to similar conclusions with simpler analysis and simpler figures.

      We will reassess the description of the drift diffusion model and explain it more clearly. Additionally, we will assess whether we can introduce the drift diffusion model and analysis earlier in the study, directly after Figure 1, so that the model and this type of analysis are presented together with the first behavioral results (instead of introducing the model only at the very end).

      (6) In Figure 2D the % of warm-sensitive neurons are shown for each genotype. Each data point is a field of view, however, reading the figure legend there appear to be more FOVs than data points (eg 10 data points for the TRPV1-/- but 17 FOVs). The authors should check this.

      We will check and ensure that, in the revised manuscript, the number of FOVs stated in the legend agrees with the number shown in Figure 2D.

      (7) Can the authors comment on why animals with over-expression of TRPV1 spend more time in the warmest chamber to start with at 38C and not at 34C?

      This is an interesting observation that we had not considered before. A closer look at Figure 4H reveals that the majority of the TRPV1-OX animals have a proportionally long first visit to the 38°C room. We can only speculate why this is the case. We cannot rule out that this is a technical shortcoming of the assay and how we conducted it, but because we do not observe this for the wildtype mice, a technical problem is rather unlikely. It is possible that this is a type of “freezing” (or “startle”) behavior when the animals first encounter the 38°C temperature. Freezing behaviors in mice can be observed when sudden or threatening stimuli are applied. It is possible that, in the TRPV1-overexpressing animals, the initial encounter with 38°C leads to activation of a larger proportion of cells (compared to WT controls), possibly signaling a “painful” stimulus, and thus leading to this startle effect. It is noteworthy, however, that with the more stringent repeated-measures statistics suggested by the referees, the difference at the first measured time point in Fig. 4G is no longer significant (see comment #2 above). This does not rule out that this might be a true effect, but such a claim would benefit from additional experiments that test this hypothesis more rigorously.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1.1: The distinction of PIGS from nearby OPA, which has also been implicated in navigation and ego-motion, is not as clear as it could be.

      Response 1.1: The main “functional” distinction between TOS/OPA and PIGS is that TOS/OPA responds preferentially to moving vs. stationary stimuli (even concentric rings), likely due to its overlap with the retinotopic motion-selective visual area V3A, for which this is a defining functional property (e.g. Tootell et al., 1997, J Neurosci). In comparison, PIGS does not show such motion selectivity. Instead, PIGS responds preferentially to more complex forms of motion within scenes.

      Moreover, PIGS and TOS/OPA are located differently relative to the retinotopic visual areas. Briefly, PIGS is located adjacent to areas IPS3-4, while TOS/OPA overlaps with areas V3A/B and IPS0 (V7). This point is now highlighted in the new Experiment 3b and the new Figure 6. In this revision, we also tried to better highlight these points in sections 4.3, 4.4 and 4.5 (see also the response to the first comment from Reviewer #2).

      Reviewer 2:

      Comment 2.1: First, the scene-selective region identified appears to overlap with regions that have previously been identified in terms of their retinotopic properties. In particular, it is unclear whether this region overlaps with V7/IPS0 and/or IPS1. This is particularly important since prior work has shown that OPA often overlaps with V7/IPS0 (Silson et al, 2016, Journal of Vision). The findings would be much stronger if the authors could show how the location of PIGS relates to retinotopic areas (other than V6, which they do currently consider). I wonder if the authors have retinotopic mapping data for any of the participants included in this study. If not, the authors could always show atlas-based definitions of these areas (e.g. Wang et al, 2015, Cerebral Cortex).

      Response 2.1: We thank the reviewer for reminding us to delineate more clearly the possible overlap, including the information provided by Silson et al, 2016. The possible overlap between area TOS/OPA and the retinotopic visual areas, in both humans and non-human primates, was also clarified by our team in 2011 (Nasr et al., 2011). As shown in Figure 6 (newly generated), and consistent with those previous studies, TOS/OPA overlaps with visual areas V3A/B and V7, whereas PIGS is located more dorsally, close to IPS3-4. As shown here, PIGS overlaps neither with TOS/OPA nor with areas V3A/B and V7.

      To more directly address the reviewer’s concern, in this revision, we have added a new experiment (Experiment 3b) in which we have shown the relative position of PIGS and the retinotopic areas in two individual subjects (Figure 6). All the relevant points are also discussed in section 4.3.

      Comment 2.2: Second, recent studies have reported a region anterior to OPA that seems to be involved in scene memory (Steel et al, 2021, Nature Communications; Steel et al, 2023, The Journal of Neuroscience; Steel et al, 2023, bioRxiv). Is this region distinct from PIGS? Based on the figures in those papers, the scene memory-related region is inferior to V7/IPS0, so characterizing the location of PIGS relative to V7/IPS0 as suggested above would be very helpful here as well. If PIGS overlaps with either V7/IPS0 or the scene memory-related area described by Steel and colleagues, then arguably it is not a newly defined region (although the characterization provided here still provides new information).

      Response 2.2: The lateral place memory area (LPMA) is located on the lateral brain surface, anterior to the IPS (see Figure 1 from Steel et al., 2021 and Figure 3 from Steel et al., 2023). In contrast, PIGS is located on the posterior brain surface, posterior to the IPS. In other words, they lie on opposite sides of a major brain sulcus. In this revision, we have clarified this point in section 4.3, citing the work of Steel and colleagues.

      Comment 2.3: Another reason that it would be helpful to relate PIGS to this scene memory area is that this scene memory area has been shown to have activity related to the amount of visuospatial context (Steel et al, 2023, The Journal of Neuroscience). The conditions used to show the sensitivity of PIGS to ego-motion also differ in the visuospatial context that can be accessed from the stimuli. Even if PIGS appears distinct from the scene memory area, the degree of visuospatial context is an alternative account of what might be represented in PIGS.

      Response 2.3: The reviewer raises an interesting point. One minor confusion is that we may be inadvertently referring to two slightly different types of “visuospatial context”. Specifically, the stimuli used in the ego-motion experiment here (i.e. coherently vs. incoherently changing scenes) depict the same scenes, and the only difference between the two conditions is the sequence of images across the experimental blocks. In that sense, the two experimental conditions may be considered to have the same visuospatial “context”. However, it could also be argued that the coherently changing scenes provide more information about the environmental layout. In that case, considering previous reports that PPA/TPA and RSC/MPA may also be involved in layout encoding (Epstein and Kanwisher 1998; Wolbers et al. 2011), we would expect more activity within those regions in response to coherently compared to incoherently changing scenes. These issues are now discussed more explicitly in the revised article (section 4.6).

      Reviewer 3:

      Comment 3.1: There are few weaknesses in this work. If pressed, I might say that the stimuli depicting ego-motion do not, strictly speaking, depict motion, but only apparent motion between photographs taken 2 s apart. However, this choice was made to equate frame rates and motion contrast between the 'ego-motion' and a control condition, which is a useful and valid approach to the problem. Some choices for visualization of the results might be made differently; for example, outlines of the regions might be shown in more plots for easier comparison of activation locations, but this is a minor issue.

      Response 3.1: We thank the reviewer for these constructive suggestions, and we agree that the ego-motion stimuli are not smooth, even though they were refreshed every 100 ms. Nevertheless, the stimuli were coherent enough to activate areas V6 and MT, two major areas known to respond preferentially to coherent compared to incoherent motion.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading this article. I have a few suggestions for improvement:

      (1) Delineation from OPA: The OPA has been described in quite similar terms as PIGS, with its involvement in ego-motion (e.g., crawling, walking) and navigation in general (e.g., Dilks' recent work; Bonner and Epstein). The authors address the distinction in section 4.4. Unlike Kamps et al. (2016) and Jones et al. (2023), the authors found weak or no evidence for ego-motion in OPA. They explain this discrepancy with differences in refresh rates and different levels of spatial smoothing of the fMRI data. It is not clear why these fairly small methodological differences would lead to different findings of ego-motion in the OPA. Arguably, the OPA is the closest of the "established" scene areas to PIGS, both in anatomical location and in function. I would therefore appreciate a more detailed discussion of the differences between these two areas.

      Response: Jones et al. showed ego-motion-related TOS/OPA activity relative to scrambled scenes. This is fundamentally different from what we have shown here, which compared coherently vs. incoherently changing scenes (i.e. not a small difference). Also, Kamps et al. used static scenes as a control, which, considering the motion selectivity of TOS/OPA, would have a large impact on the TOS/OPA response.

      (2) Random effects analysis: The authors mention using a "random effects analysis" for several of their experiments. I would ask them to provide more details on what statistical models were used here. Were they purely random-effects models or actually mixed-effects models? What were the factors that entered into the analysis? Providing more detail would make the analysis techniques more transparent.

      Response: This point is now clarified in the Methods section.

      (3) Data and code availability: The authors write that data and code "are ready to be shared upon request." (section 2.5) In the spirit of transparency and openness, I strongly encourage the authors to make the data publicly available, e.g., on OSF or OpenNeuro. In particular, having probabilistic maps of PIGS available will allow other researchers to include PIGS in their analysis pipelines, making the current work more impactful.

      Response: We have made the probabilistic labels available to the public. This point is now highlighted in section 2.5.

      (4) Minor comments on the writing that caught my eye while reading the article:

      • Line 27: "in the human brain".

      Response: Done.

      • Line 30: I don't agree with the characterization of the previous model of scene perception as "simplistic." Adding one additional ROI makes it no less simplistic. Perhaps the authors can rephrase to make this slightly less antagonistic?

      Response: Done.

      • Line 71: it is not clear why NHPs are relevant here.

      Response: We decided to keep the text intact.

      • Line 138: "were randomized".

      Response: Done.

      • Line 152: "consisting".

      Response: Done.

      • Line 155: "sets" (plural).

      Response: Done.

      • Lines 253-255: Why were the 3T spatially smoothed but not the 7T data? This seems odd.

      Response: We kept the text intact.

      • Line 481: "we found strong motion selectivity" (remove "a").

      Response: Done.

      • Line 564: a word is missing, probably: "a stronger effect of ego-motion".

      Response: Done.

      • Line 591: "controlling spatial attention" (remove "the").

      Response: Done.

      • Line 591 and 594: Both sentences start with "However". I think the first of these should not because it is setting up the contrast for the second sentence.

      Response: Done.

      • Line 607: "higher-level" (hyphen).

      Response: Done.

      • Throughout the manuscript: adverbial phrases such as "(in)coherently changing" or "probabilistically localized" do not get a hyphen.

      Response: Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "All data, codes and stimuli are ready to be shared upon request". Ideally, these materials should be deposited in appropriate repositories (e.g. OpenMRI, GitHub) and not require readers to contact the authors to obtain such materials.

      Other Comments:

      (a) The title ("A previously undescribed scene-selective site is the key to encoding ego-motion in natural environments") is potentially misleading - the work was not conducted in a natural environment. At best, you could say they are 'naturalistic stimuli'. Also, in what sense is PIGS "key" to encoding ego-motion - the study just shows sensitivity to this factor.

      Response: We changed the title to “naturalistic environments”.

      (b) Figure 1 - I'm not sure what point the authors are trying to make with Figure 1. The comparison is between a highly smoothed, group fixed-effects analysis and a less-smoothed individual subject analysis. The differences between the two could reflect group vs. individual, highly-smoothed (5 mm) versus less-smoothed (2 mm), or differences in thresholding. If the thresholding were lower for the group analysis, it would probably start to look more similar to the individual subject. As it stands, this figure isn't particularly informative, it seems redundant with Figure 2, and Figure 1A is not even referenced in the main text. Further, fixed effects analyses are relatively uncommon in the recent literature, so their inclusion is unusual.

      Response: Figure 1A replicates the data/method used in Nasr et al., 2011, and it will help readers see the difference between the “traditional” scene-selectivity maps generated by group averaging and the data from individual subjects. We therefore decided not to change the figure.

      (c) Figure 3 - why are the two sets of maps shown at different thresholds? For 3B given the larger sample size, it is expected that the extent of the significant activations will increase. Currently the higher threshold for 3B and the smaller range for 3A is making the sets of maps look more comparable.

      Response: As the reviewer noticed, the number of subjects is larger in Figure 3B compared to 3A. The main point of this figure is to show that the PIGS activity center does not vary across populations. Considering this point, we decided not to change this figure.

      (d) Figure 10 - why is the threshold lower than used for other figures? It would be helpful if there was consistent thresholding across figures.

      Response: Experiment 6 and Experiment 1 are based on different stimuli (see Methods). Also, among those subjects who participated in Experiment 1, two subjects did not participate in Experiment 6. These points are already highlighted in the text.

      (e) Figures - how about the AFNI approach of thresholding and showing sub-threshold data at the same time? (Taylor et al, 2023, Neuroimage).

      Response: We highly appreciate the methodology suggested by Taylor and colleagues. However, our main point here is to show the center of PIGS activity. In this context, showing an unthresholded activity map has no advantage over the current maps. Considering these points, we decided not to change the figures.

      (f) Coherent versus incoherent scenes - there are many differences between the coherent and incoherent scenes. Arguing that it must be ego-motion seems a little premature without further investigation. Activity anterior to OPA has been associated with the construction of an internal representation of a spatial environment (Steel et al., 2023, The Journal of Neuroscience). Could it be that this is the key effect, not really the ego-motion?

      Response: In this revision, we discussed the study by Steel et al., 2021 and 2023 in section 4.3.

      Reviewer #3 (Recommendations For The Authors):

      Overall, I think this is already an excellent contribution. The suggestions I have are minor and may help with the clarity of the results.

      (1) My main request of the authors would be to provide more points of reference in some of the figures with cortical maps. In many cases, the authors use arrows to point to the locations of activations of interest. However, the arrows in adjacent figures are often not placed in exactly the same places on maps that are meant to be compared. It would very much help the viewer to compare activations if the arrows pointing to activations or regions of interest were placed in identical locations for the same brains appearing in different sub-panels (e.g. in panels A and B of Figure 1). The underlying folds of the cortical surface provide some points of reference, but these are often occluded to different extents by data in figures that are meant to be compared.

      Response: To address the reviewer’s concern, we regenerated Figure 8 (Figure 7 in the previous submission) and we tried to put arrowheads in identical locations, as much as possible. Especially for PIGS, this point was also considered in Figures 2 and 3.

      (2) Outlines (such as those in Figure 5) are also very useful, and I would encourage broader use of them in other figures (e.g. Figures 7, 10, and 12). Figures 10 and 12 are on the fsaverage surface, so the same outlines could be used for them as for Figure 5.

      To be clear, it's possible to apprehend the results with the figures as they are, but I think a few small changes could help a lot.

      Response: In this revision, we added outlines to Figures 11 and 13 (Figure 10 and 12 in the previous submission). We did not add the outline to Figure 8 because it made it hard to see PIGS. Rather we used arrows (see the previous comment).

      Other minor points:

      In the method for Experiment 4, the authors write: "Other details of the experiment were similar to those in Experiment 1.". Similar or the same? The authors should clarify this statement, e.g. "the number of images per block, the number of blocks, the number of runs were the same as Experiment 1" - with any differences noted.

      Response: This point is now addressed in the Methods section.

      In Figure 8, it would be better to have the panel labels (A, B, C, D) in the upper left of each panel rather than the lower left.

      Response: We tried to keep the panel arrangement consistent across the figures. That is why the letters are positioned this way.

      A final gentle suggestion: pycortex (http://github.com/gallantlab/pycortex) provides a means to visualize the flattened fsaverage surface with outlines for localized regions of interest and overlaid lines for major sulci. Though it is by no means necessary for publication, it would be lovely to see these results on that surface, which is freely available and downloadable via a pycortex command (surface here: https://figshare.com/articles/dataset/fsaverage_subject_for_pycortex/9916166)

      Response: We thank the reviewer for bringing pycortex to our attention. We will consider using it in our future studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings characterising the genomic features of E. coli isolated from neonatal meningitis from seven countries, and documents bacterial persistence and reinfection in two case studies. The genomic analyses are solid, although the inclusion of a larger number of isolates from more diverse geographies would have strengthened the generalisability of findings. The work will be of interest to people involved in the management of neonatal meningitis patients, and those studying E. coli epidemiology, diversity, and pathogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses whole genome sequencing to characterise the population structure and genetic diversity of a collection of 58 isolates of E. coli associated with neonatal meningitis (NMEC) from seven countries, including 52 isolates that the authors sequenced themselves and a further 6 publicly available genome sequences. Additionally, the study used sequencing to investigate three case studies of apparent relapse. The data show that in all three cases, the relapse was caused by the same NMEC strain as the initial infection. In two cases they also found evidence for gut persistence of the NMEC strain, which may act as a reservoir for persistence and reinfection in neonates. This finding is of clinical importance as it suggests that decolonisation of the gut could be helpful in preventing relapse of meningitis in NMEC patients.

      Strengths:

      The study presents complete genome sequences for n=18 diverse isolates, which will serve as useful references for future studies of NMEC. The genomic analyses are high quality, the population genomic analyses are comprehensive and the case study investigations are convincing.

      We agree.

      Weaknesses:

      The NMEC collection described in the study includes isolates from just seven countries. The majority (n=51/58, 88%) are from high-income countries in Europe, Australia, or North America; the rest are from Cambodia (n=7, 12%). Therefore it is not clear how well the results reflect the global diversity of NMEC, nor the populations of NMEC affecting the most populous regions.

      The virulence factors section highlights several potentially interesting genes that are present at apparently high frequency in the NMEC genomes; however, without knowing their frequency in the broader E. coli population it is hard to know the significance of this.

      We acknowledged the limitations of our NMEC collection in the Discussion. We agree the prevalence of virulence factors in our collection is interesting. The limited size of our collection prevented further evaluation of the prevalence of these virulence factors in a broader E. coli population.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a robust genomic dataset profiling 58 isolates of neonatal meningitis-causing E. coli (NMEC), the largest such cohort to be profiled to date. The authors provide genomic information on virulence and antibiotic resistance genomic markers, as well as serotype and capsule information. They go on to probe three cases in which infants presented with recurrent febrile infection and meningitis and provide evidence indicating that the original isolate is likely causing the second infection and that an asymptomatic reservoir exists in the gut. Accompanying these results, the authors demonstrate that gut dysbiosis coincides with the meningitis.

      Strengths:

      The genomics work is meticulously done, utilizing long-read sequencing.

      The cohort of isolates is the largest to be sampled to date.

      The findings are significant, illuminating the presence of a gut reservoir in infants with repeating infection.

      We agree.

      Weaknesses:

      Although the cohort of isolates is large, there is no global representation, entirely omitting Africa and the Americas. This is acknowledged by the group in the discussion, however, it would make the study much more compelling if there was global representation.

      We agree. In the Discussion we state this is likely a reflection of the difficulty in acquiring isolates causing neonatal meningitis, in particular from countries with limited microbiology and pathology resources.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Schembri et al performed a molecular analysis by WGS of 52 E. coli strains identified as "causing neonatal meningitis" from several countries and isolated from 1974 to 2020. Sequence types, virulence genes content as well as antibiotic-resistant genes are depicted. In the second part, they also described three cases of relapse and analysed their respective strains as well as the microbiome of three neonates during their relapse. For one patient the same E. coli strain was found in blood and stool (this patient had no meningitis). For two patients microbiome analysis revealed a severe dysbiosis.

      Major comments:

      Although the authors announce in their title that they study E. coli that cause neonatal meningitis and in the methods stipulate that they had a collection of 52 NMEC, we found in Supplementary Table 1 29 strains (therefore most of the strains) isolated from blood and not CSF. This is a major limitation since only strains isolated from CSF can be designated with certainty as NMEC, even if a pleocytosis is observed in the CSF. A very troubling point is the description of patient two with a relapse infection. As stated in the text at line 225, CSF microscopy was normal and culture was negative for this patient! Therefore, it is clear that a patient without meningitis has been included in this study.

      We have reviewed the clinical data for our 52 NMEC isolates, noting that for some of the older Finnish isolates we relied on previous publications. These data are shown in Table S1. To address the Reviewer’s comment, we have added the following text to the methods section (new text underlined).

      ‘The collection comprised 42 isolates from confirmed meningitis cases (29 cultured from CSF and 13 cultured from blood) and 10 isolates from clinically diagnosed meningitis cases (all cultured from blood).’

      Patient 2 was initially diagnosed with meningitis based on a positive blood culture in the presence of CSF pleocytosis (>300 WBCs, >95% polymorphs). We understand there may be some confusion with reference to a relapsed infection, which we now more accurately describe as recrudescent invasive infection in the revised manuscript.

      Another major limitation (not stated in the discussion) is the absence of clinical information on neonates especially the weeks of gestation. It is well known that the risk of infection is dramatically increased in preterm neonates due to their immature immunity. Therefore E. coli causing infection in preterm neonates are not comparable to those causing infection in term neonates notably in their virulence gene content. Indeed, it is mentioned that at least eight strains did not possess a capsule, we can speculate that neonates were preterm, but this information is lacking. The ages of neonates are also lacking. The possible source of infection is not mentioned, notably urinary tract infection. This may have also an impact on the content of VF.

      We agree. In the Discussion we now note the following (new text underlined):

      ‘… we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Submission to medRxiv, a requirement for review of our manuscript at eLife, necessitated the removal of some patient-identifying information, including precise age and detailed medical history.

      Sequence analysis reveals the predominance of ST95 and ST1193 in this collection. The high incidence of ST95 is not surprising and has been well described previously; therefore, the concluding sentence on line 132, indicating that ST95 E. coli should exhibit specific virulence features associated with their capacity to cause NM, does not add anything. On the contrary, the high incidence of ST1193 is of interest and should have been discussed in more detail. Which specific virulence factors do they harbor? Is there any hypothesis explaining their emergence in neonates?

      We compared the virulence factors of ST95 and ST1193 and summarized this information in Figure 4. We also discussed how the K1 polysialic acid capsule in ST95 and ST1193 could contribute to the emergence of these STs in NM. Specifically, we stated the following: ‘We speculate this is due to the prevailing K1 polysialic acid capsule serotype found in ST95 and the newly emerged ST1193 clone [22, 37] in combination with other virulence factors [15, 28, 29] (Figure 4) and the immature immune system of preterm infants.’

      In the paragraph describing the VFs, it is only stated that ST95 contained significantly more VFs than the ST1193 strains. So what? Moreover, "significantly" is not documented: n = ?, p = ?

      We compared the prevalence of known virulence factors between ST95 and ST1193, and showed that ST95 isolates in our collection contained significantly more virulence factors than the ST1193 isolates. The P-value and the statistical test used were included in Supplementary Figure 3. To address the reviewer’s concern, we have now also added this to the main manuscript text as follows (new text underlined):

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n = 9); P-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’
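      For readers less familiar with the test named above, the following is a minimal sketch of how a Mann-Whitney U statistic and a two-tailed P-value can be computed. It is illustrative only: the inputs are made-up toy vectors, not the study's virulence factor counts, and the tie-corrected normal approximation without continuity correction is an assumption. Statistical packages may instead use exact or continuity-corrected P-values, which can differ slightly.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-tailed P-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_two_tailed


# Toy example (not study data): every value in x exceeds every value in y
u, p = mann_whitney_u([5, 6, 7, 8], [1, 2, 3])
print(u, round(p, 4))
```

      Ranking the pooled values, rather than comparing means, is what makes the test suitable for skewed count data such as the number of virulence factors per genome.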

      It is not clear which 18 strains were completely sequenced. Results of Supplementary Table 2 are presented in the text and are not discussed.

      NMEC isolates that were completely sequenced in this study are indicated in bold and marked with an asterisk in Figure 1. This information is indicated in the figure legend and was provided in the original submission. All information regarding genomic island composition and location, virulence genes and plasmid and prophage diversity is included in Supplementary Table 2. This information is highly descriptive and thus we elected not to include it as text in the main manuscript.

      A period of 46 years is a very long time for such a small number of strains, making it difficult to put forward epidemiological or evolutionary theories. In the analysis of antibiotic resistance, there are no ESBLs. However, Ding's article (reference 34) and other authors have shown that ESBLs are emerging in neonatal E. coli infection. These strains are a major threat that should be studied; unfortunately, the authors have not had the opportunity to characterize such strains in their manuscript.

      We agree that 46 years is a long time span. Ding et al. examined 56 isolates comprising 25 different STs isolated in China from 2009 to 2015, with ST1193 (n = 12) and ST95 (n = 10) the most common. Our study examined 58 isolates comprising 22 different STs isolated in seven different geographic regions from 1974 to 2020, with ST95 (n = 20) and ST1193 (n = 9) the most common. Thus, despite differences in the geographic regions from which isolates in the two studies were sourced, the most common STs identified were similar. The lower level of antibiotic resistance we observed in ST1193, including the lack of ESBL genes, is likely due to the different regions from which the isolates were sourced. We acknowledged and discussed the potential for ST1193 to harbour multidrug resistance, including ESBLs, in our manuscript as follows:

      ‘Concerningly, the ST1193 strains examined here carry genes encoding several aminoglycoside-modifying enzymes, generating a resistance profile that may lead to the clinical failure of empiric regimens such as ampicillin and gentamicin, a therapeutic combination used in many settings to treat NM and early-onset sepsis [35, 36]. This, in combination with reports of co-resistance to third-generation cephalosporins for some ST1193 strains [22, 34], would limit the choice of antibiotic treatment.’

      Second part of the manuscript:

      The three patients who relapsed had late-onset neonatal infection (>3 days), at ages of 6 days, 7 weeks, and 3 weeks, respectively. We do not know whether they were born preterm (no gestational term is specified) or whether they received antibiotics in the meantime.

      As noted above, patient ages were not disclosed to comply with the requirements for submission to medRxiv, a requirement for review of our manuscript at eLife.

      Patient 1: Although this patient had CSF pleocytosis, the CSF culture was negative, which is surprising, and no explanation is provided. Therefore, the diagnosis of meningitis is not certain. Pleocytosis without meningitis has been previously described in neonates with severe sepsis. Line 215: "no immunological abnormalities were identified" (no details are given).

      We respectfully disagree with the reviewer. The diagnosis of meningitis is made unequivocally by the presence of a clearly abnormal CSF microscopy (2430 WBCs) and an invasive E. coli from blood culture. This does not seem controversial to the authors. We had believed it unnecessary to include this corroborative evidence, but have added the following to support our assertion:

      ‘The child was diagnosed with meningitis based on a cerebrospinal fluid (CSF) pleocytosis (>2000 white blood cells; WBCs, low glucose, elevated protein), positive CSF E. coli PCR and a positive blood culture for E. coli (MS21522).’

      On the contrary, the authors are surprised by the statement that CSF pleocytosis occurs in neonatal sepsis ‘without meningitis’ and do not know of any definitions of neonatal meningitis that are not tied to the presence of a CSF pleocytosis. Furthermore, the later isolation of E. coli from the CSF during the relapsed infection reinforces the initial diagnosis.

      Patient 2: This patient had a recurrence of bacteremia without meningitis (line 225: CSF microscopy was normal and culture negative!). This case should be deleted.

      In a similar vein to the previous comment, we respectfully assert that this patient has clear evidence of meningitis (330 WBCs in the CSF, taken 24h after initiation of antibiotic treatment). In this case, molecular testing was not performed as, under the principle of diagnostic stewardship, it was not considered necessary by the clinical microbiologists and treating clinicians following the culture of E. coli in the bloodstream. We agree that this is not a case of recurrent meningitis, but our intention was to highlight the recrudescence of an invasive infection (urinary sepsis requiring admission to hospital and intravenous antibiotics) which we hypothesise has arisen from the intestinal reservoir. We did not state that all patients suffered from relapsed meningitis.

      Despite this, to address the reviewer’s concern, we have changed all references to ‘relapsed infection’ to ‘recrudescent invasive infection’ in the revised manuscript.

      Patient 3: This patient had two relapses, which is exceptional and may suggest the existence of a congenital malformation or a neurological complication such as an abscess or empyema; therefore, the "imaging studies" should be detailed.

      This patient underwent extensive imaging investigation to rule out a hidden source. This included repeated MRI imaging of the head and spine, CT imaging of the head and chest, ultrasound imaging of the abdomen and pelvis, and nuclear medicine imaging to detect a subtle meningeal defect and CSF leak. All tests were normal, and no abscess or empyema was found.

      We have modified the text to include this information:

      Text in original submission: ‘Imaging studies and immunological work-up were normal.’

      New text in revised manuscript (underlined): ‘Extensive imaging studies including repeated MRI imaging of the head and spine, CT imaging of the head and chest, ultrasound imaging of abdomen and pelvis, and nuclear medicine imaging did not show a congenital malformation or abscess. Immunological work-up did not show a known primary immunodeficiency. At two years of age, speech delay is reported but no other developmental abnormality.’

      The authors suggest a link between intestinal dysbiosis and relapse in three patients. However, the fecal microbiomes of patients without relapse were not analysed, so no comparison is possible. Moreover, dysbiosis after several weeks of antibiotic treatment in a patient hospitalized for a long time is not unexpected. Therefore, it is impossible to make any assumption or draw any conclusion; this part of the manuscript is purely descriptive. Furthermore, the authors should be more cautious when they state on line 289 "we also provide direct evidence to implicate the gut as a reservoir [...] antibiotic treatment". Indeed, gut colonization of the mothers with the same strain may also be a reservoir (as stated in the discussion on line 336). Finally, the authors do not discuss the potential role of ceftriaxone versus cefotaxime in the observed dysbiosis. Ceftriaxone may have a major impact on the microbiota because of its elimination via the digestive tract.

      We addressed the limitations of our study in the Discussion, including that we did not have access to urine or stool samples from the mothers of the infants who suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. We have now added that we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants. The limitations of our study are summarised as follows in the Discussion (new text underlined):

      ‘This study had several limitations. First, our NMEC strain collection was restricted to seven geographic regions, a reflection of the difficulty in acquiring strains causing this disease. Second, we did not have access to a complete set of stool samples spanning pre- and post-treatment from the patients who suffered NM and recrudescent invasive infection. This impacted our capacity to monitor E. coli persistence and evaluate the effect of antibiotic treatment on changes in the microbiome over time. Third, we did not have access to urine or stool samples from the mothers of the infants who suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. Finally, we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It would be useful to mention the sample size (number of genomes analysed, n=58) in the abstract to give readers a sense of the scale of the analysis.

      We have added the number of genomes in the abstract as suggested (new text underlined).

      ‘Here we investigated the genomic relatedness of a collection of 58 NMEC strains spanning 1974-2020 and isolated from seven different geographic regions.’

      The term 'strain' is used throughout; it would be clearer to use 'isolates' to describe the biological material and 'genomes' when the unit being referred to is genome sequences. For example, lines 108-111 use 'strain' to mean the collection of 52 isolates but also use 'strain' to mean the collection of 58 genomes, which includes those of the 52 isolates that the authors sequenced plus a further 6 genomes of isolates that they do not have in their isolate collection.

      We have changed the term ‘strain’ to ‘isolate’ or ‘genome’ as suggested.

      Figure 1 (annotated phylogeny) is hard to read and interpret, as so much data is presented. It would assist readers if the authors could provide an interactive form of the phylogeny and metadata/genomic feature data discussed in the text, e.g. using microreact.org, so that details can be explored more easily.

      This is an excellent suggestion, and we created a project on microreact.org. This information has been added to the Figure 1 legend.

      https://microreact.org/project/oNfA4v16h3tQbqREoYtCXj-high-risk-escherichia-coli-clones-that-cause-neonatal-meningitis-and-association-with-recrudescent-infection.

      It would be useful to provide information on the frequency and/or distribution of the virulence factors in the broader E. coli population, to provide context for readers and to better understand the importance/significance of the high frequency of the reported virulence factors within NMEC.

      As noted above, we agree the prevalence of virulence factors in our collection is interesting. We discussed the prevalence of these virulence factors in our collection, and the detailed data is presented in Table S1. However, we also note a limitation in our study is the number of isolates, and thus we would prefer to avoid evaluation of the prevalence of these virulence factors in the context of a broader E. coli population. There are other studies that have examined NMEC virulence factors in the past; some examples are noted below, and we have now referenced these in our manuscript (note Ref 15 was suggested by Reviewer 3 in a comment below; PMID: 11920295).

      Ref 15: Johnson JR, Oswald E, O'Bryan TT, Kuskowski MA, Spanjaard L. Phylogenetic distribution of virulence-associated genes among Escherichia coli isolates associated with neonatal bacterial meningitis in the Netherlands. J Infect Dis 2002; 185(6): 774-84.

      Ref 28: Wijetunge DS, Gongati S, DebRoy C, et al. Characterizing the pathotype of neonatal meningitis causing Escherichia coli (NMEC). BMC Microbiol 2015; 15: 211.

      Ref 29: Bidet P, Mahjoub-Messai F, Blanco J, et al. Combined Multilocus Sequence Typing and O Serogrouping Distinguishes Escherichia coli Subtypes Associated with Infant Urosepsis and/or Meningitis. J Infect Dis 2007; 196(2): 297-303.

      I suggest avoiding the term 'global' to describe the collection, given that there are only seven countries included in the collection and two of the most populous continents (Africa and South America) are not represented at all.

      We agree, and now refer to our collection as ‘an NMEC strain collection from geographically diverse locations.’

      Reviewer #2 (Recommendations For The Authors):

      This is a suggestion regarding discussion/food for thought: This study sheds light on genomic features and indicates the presence of a reservoir in the infected infant. Previous studies have demonstrated the presence of a reservoir in the vaginas of women with recurrent UTIs. Is there any information as to whether the mothers of these infants, especially the three with recrudescent infection, had a UTI or recurrent UTIs in their lives? It may be worthwhile discussing the potential of testing for E. coli in expectant mothers who have a history of UTI.

      We do not have such data, and as indicated above we note this as a limitation of our study.

      As written, the main text does not make clear whether all three cases of recrudescent infection come from the same geographical location. It would be easier to have this information in the corresponding main text, in addition to the supplement.

      The three cases of recrudescent invasive infection were from three different locations. We have added this information as follows (new text underlined):

      ‘These patients were from different regions in Australia.’

      Reviewer #3 (Recommendations For The Authors):

      Line 48 and 67 change the word "devasting".

      Changed as suggested.

      Line 49 second most in full-term infants.

      Changed as suggested.

      Line 56 delete the sentence "antibiotic resistance genes occurred infrequently".

      We changed the sentence, which now reads (new text underlined):

      ‘Antibiotic resistance genes occurred infrequently in our collection’.

      Line 76 reference 10 is inappropriate.

      Reference 10 reported that 5/24 infants treated for neonatal Gram-negative bacillary meningitis over a 10-year period had a relapse of meningitis after the initial course of treatment. Four of the isolates that caused these relapsed infections were E. coli.

      To address the reviewer’s concern, we have altered the text as follows (new text underlined):

      ‘Moreover, NMEC is an important cause of relapsed infections in neonates [10]’.

      Line 83 several references related to serotypes are missing, notably doi.org/10.1086/339343.

      We have added this reference.

      Line 171 significantly? n=?, p=?

      The numbers and P-value were provided in the Supplementary Figure 3 legend. We have now added these to the text as follows:

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n = 9); P-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’

      Figure 4 is not necessary.

      We respectfully disagree. Figure 4 provides an illustrative comparison of virulence factors between the two most dominant NMEC sequence types, ST95 and ST1193. We believe this will be informative for many readers.

      Line 311 "We speculate....of preterm infants" This sentence does not add anything to the discussion.

      We respectfully disagree and have kept the sentence. This reflects our opinion.

      Line 320 "clear clinical risk factors to explain...". The gestational term of the neonates is missing.

      Updated as follows (new text underlined):

      ‘Although reported rarely, recrudescent invasive E. coli infection in NM patients, including several infants born pre-term, has been documented in single study reports [39, 40]. In these reports, infants received appropriate antibiotic treatment based on antibiogram profiling and no clear clinical risk factors to explain recrudescence were identified, highlighting our limited understanding of NM aetiology.’

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of catalytic self-replication of polymers is an important question in the context of the origin of life. Tkachenko and Maslov present a model in which such a catalytic polymer sequence emerges from a random pool of replicating polymers.

      Strengths:

      The model is part of a theme from many previous papers from the same authors and their colleagues. The model is interesting, technically correct, and demonstrates qualitatively new phenomena. It is good that the paper also makes a connection with possible experimental scenarios; specifically, concrete proposals are made for testing the core ideas of the model. It would be an exciting demonstration if such an experiment does materialize.

      Weaknesses:

      Unlike the rest of the paper, which is very tight in its arguments, I find that the discussion section is not. Specifically, sentences such as "In fact, this can be seen as a special case of the classical error catastrophe" are a bit loose and not well substantiated. Although these are in the discussion section, I find this to be a weakness of an otherwise good paper. Tightening some of the arguments here will make it an excellent paper in my opinion.

      We followed the reviewer's recommendations by streamlining the discussion and removing the potentially confusing comparison to the classic error catastrophe.

      Reviewer #2 (Public Review):

      Summary:

      The replication of information-coding polymers and the emergence of catalytic ribozymes pose significant challenges, both experimentally and theoretically, in the study of the RNA world hypothesis. In this context, Tkachenko et al. put forth a novel hypothesis regarding an oligomer replication system based on a cleavage ribozyme. They initially highlighted that the breakage of oligomers could contribute to self-replication, provided that these fragments function as primers for subsequent replications. Next, they proposed a self-replicating system of oligomers founded on a hammerhead structure that catalyzes cleavage. Using a simple dynamical model, they demonstrated that such a system is self-sustainable in certain parameter regimes. Furthermore, they delved into discussions regarding the potential emergence of such a system and the evolution toward further optimized ribozymes.

      Strengths:

      Although the cleavage (hammerhead) ribozyme has been discussed in the context of the origins of life, to the best of my knowledge the authors are the first to use a mathematical model to discuss how such ribozymes could be selected. The idea is simple: ribozyme activity creates fragments by breakage of an oligomer, which work as primers for the ribozyme itself, resulting in a positive feedback system (i.e., an autocatalytic set in a broader sense). This potentially enables the simultaneous resolution of two problems: (i) the supply of new primers (but note the major concern on this point described under 'Weaknesses'), and (ii) the sustaining of the cleavage ribozyme.

      Weaknesses:

      The major weakness of their theory is that the ends of the new primers, formed through the breakage/cleavage of polymers, must be chemically active (as the authors have already emphasized in the last paragraph of their discussion) to enable further elongation. Reactivating the ends of preexisting oligomers without enzymes, to the best of our current knowledge, could be a challenging task. Although their model heavily relies on this aspect, the authors do not elaborate on it.

      We have added a discussion of the need for chemical activation: "It is important to note that in the context of RNA, such bidirectional elongation requires chemical activation of the phosphate group at the 5' end of the primer to provide free energy for the newly formed covalent bond. Like the polymerization process itself, achieving this without enzymes is biochemically challenging. One might speculate that prebiotic evolution relied on inorganic catalysis, such as on mineral surfaces, or involved polymers other than today's RNA."

      We also included in the discussion a comment on a possible combination of our mechanism and the Virtual Circle Genome model that would avoid the need for bidirectional growth: "It may be possible to incorporate the selection mechanism proposed in this paper into the Virtual Circle Genome model. Such a hybrid approach would avoid the need for the biochemically problematic bidirectional growth while explaining the emergence of early catalytic activity unaffected by sequence scrambling."

      Another weakness is in the setup of their discussion on evolutionary dynamics. While they claim that their model is robust against replication errors, their approach to evolutionary dynamics appears unconventional, and it remains unclear under what conditions their assumptions hold. They treat a whole set of oligos as the unit of evolution, rather than each individual oligo. This may necessitate more complex assumptions, such as the encapsulation of sets of oligos inside a protocell, to be adequately rationalized. Thus, it remains uncertain whether the system is indeed robust against replication errors in a more natural context. For example, if a mutant oligo, denoted b', arises due to an error in the replication of oligo b, and if b' has lower catalytic activity but replicates more rapidly than b, it may ultimately come to dominate the system.

      We agree with the reviewer that the evolutionary dynamics in multi-species ecosystems are somewhat complicated and potentially confusing. To this end, we have added the following text and citations to our discussion: "Note that this fitness is defined at the level of the ecosystem, comprising all sequences in the chemostat, and is not necessarily attributable to individual members of that population. Over time, similar to microbial ecosystems, this population changes according to the laws of competitive exclusion [34, 35]". However, we would like to point out that we assume that our model operates in a chemostat-like environment, which can be realized, for example, in a prebiotic pool supplied with a constant flux of monomers. Thus, the evolutionary dynamics described by our equations do not require encapsulation of sets of oligos in a protocell followed by selection of these protocells.

      Reviewer #3 (Public Review):

      Summary:

      Non-enzymatic replication of RNA or a similar polymer is likely to be important for the origin of life. The authors present a model of how a functional catalytic sequence could emerge from a mixture of sequences undergoing non-enzymatic replication.

      Strengths:

      Interesting model describing details of the proposed replication mechanism.

      Weaknesses:

      A discussion of the virtual circular genome idea proposed in [33] is included in the discussion section together with the problem of sequence scrambling faced by this mechanism that was raised in [34]. However, the authors state that sequence scrambling is a special case of the classical error catastrophe. This should be reworded, because these phenomena are completely different. The error catastrophe occurs due to single-point mutational errors in a model that assumes that a complete template is being copied in one cycle. Sequence scrambling arises in models that assume cycles of melting and reannealing, in which case only part of a template is copied in one cycle. Scrambling is due to the many alternative ways in which pairs of sequences can reanneal. Many of these alternatives are incorrect and this leads to the disappearance of the original sequence. This problem exists even in the limit where there is zero mutational error rate. Therefore, it cannot be called a special case of the error catastrophe problem.

      We followed the reviewer's recommendations and removed the potentially confusing comparison to the classic error catastrophe.

      The authors seem to believe that their model avoids the scrambling problem. If this is the case, a clear explanation should be added about why this problem is avoided. Two possible points are mentioned.

      (i) Replication is bidirectional in this model. This seems like a small detail to me. I don't think it makes any difference to whether scrambling occurs.

      (ii) The functional activity is located in a short sequence region. I can imagine that if the length of a strand that is synthesized in a single cycle is long enough to cover the complete functional region, then sometimes the complete functional sequence can be copied in one cycle. Is this what is being argued? If so, this depends strongly on the rates of primer extension, the lengths of melting cycles, etc., and some comment on this should be made.

      As we now explain in the text, while the scrambling problem itself is not completely avoided in our model, it does not affect the replication of the functionally relevant regions of the oligomers. Our key observation is that, due to the simplicity of the cleaving enzymes, the length of the functionally relevant region is much smaller than the scrambling-free length. This can be seen from a back-of-the-envelope estimate of the scrambling-free length added to the text: "...assuming the minimal hybridization length $l_0 = 6$ and random statistics of the master sequence, one gets the scrambling-free length $\sqrt{2 \times 4^{l_0}} + l_0 \approx 100$. This is an order of magnitude larger than both $l_0$ and the length of the core region of the hammerhead ribozyme."
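      The back-of-the-envelope estimate quoted above can be checked step by step. The arithmetic below only substitutes $l_0 = 6$ into the stated expression; the symbol $L_{\text{scr}}$ for the scrambling-free length is our own shorthand, not notation from the manuscript.

```latex
L_{\text{scr}} = \sqrt{2 \times 4^{l_0}} + l_0
  = \sqrt{2 \times 4^{6}} + 6
  = \sqrt{8192} + 6
  \approx 90.5 + 6
  \approx 96.5 \sim 100
```

      Dividing by $l_0 = 6$ gives a ratio of roughly 16, consistent with the statement that $L_{\text{scr}}$ is about an order of magnitude larger than $l_0$.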

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In my evaluation, the authors have proposed a novel mechanism potentially relevant to the origins of life and have explained it with a sufficiently simple model. However, I recommend that they address the following issues, including those I raised in the public review:

      • Title: I believe that the title "Emergence of catalytic activity in ..." is rather broad. Could it be more specific to accurately represent the system described in the paper? For instance, "Selective advantage (or selection) of the hammerhead cleavage ribozyme in..." may better encapsulate the paper's focus.

      We thank the reviewer for this suggestion. However, our mechanism is not unique to hammerhead ribozymes, so we decided to keep the original title.

      • One theoretically non-trivial aspect is the stability of the cooperative structure. Could the authors provide a more detailed explanation of what drives the instability of the system and what mechanisms restore its stability? For example, in a similar self-reproducing oligomer system with ribozymes and their fragments (Kamimura et al. PLoS Comp. 2019), the symmetry of fragments breaks because they effectively suppress each other's replication. It would also be beneficial to clarify the assumptions necessary for stability (for instance, the authors assumed that a_L can serve as a primer only for a, while a_R can serve for both a and b).

      We thank the reviewer for bringing this interesting paper to our attention. The cooperative fixed point in our model is intrinsically dynamically stable. It is an interesting point why the replicase in Kamimura et al can be dynamically unstable, while the ligase in our model is always stable. However, it goes beyond the scope of our study. We added the following discussion to the manuscript: "Note that the stability of our cooperative fixed point is a non-trivial result. For example, in a related model by Kamimura et al. [34], the fixed point corresponding to a viable composite replicase is dynamically unstable and requires additional stabilization, e.g., by cell-like compartments."

      • As mentioned in the public review, a critical aspect of the practical applicability of the theory is whether cleaved oligos can be reactivated and further elongated, especially through non-enzymatic pathways. Alternatively, is it possible with the presence of enzymes? While I appreciate the conceptual beauty of their model, I recommend that they at least address the difficulty or feasibility of achieving this.

      We addressed this point in our response to the public review.

      • As also mentioned, in the section on evolutionary dynamics, it's essential to clarify the unit of evolution and the assumptions made. For a system-level evolution (i.e., all the sets of oligos, a and b can be the unit of evolution), more detailed assumptions are required, such as the presence of compartments whose growth is coupled with the replication of oligos inside, and the competition between these compartments. I recommend the authors clarify these points.

      We addressed this point in our response to the public review.

      Reviewer #3 (Recommendations For The Authors):

      Assuming that the above points can be addressed, this reviewer would support publication with minor modifications.

      We addressed all points in our response to the public review.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1

      Leanza et al. investigated the regulation of Wnt signaling factors in bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are downregulated, while Wnt5a and sclerostin mRNA are upregulated in diabetic bone tissue. Further, Wnt5a and sclerostin were associated with AGE content, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      • A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.

      • The measurement of AGEs and their correlation with the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.

      • The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      • A weakness may include the rather limited number of samples. Especially for some sub-analyses (e.g. RNA analyses), only a subset of samples was used.

      • How was the sample size determined? It seems like more samples might have been necessary to obtain significant results for methods with a higher standard deviation (e.g. histomorphometry).

      We apologize for the oversight in the description of the statistical analysis, and we thank the reviewer for the careful reading. For the sample size calculation for bone histomorphometry, we used the cohort of the only paper analyzing trabecular bone in T2D postmenopausal women by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test, difference between two independent groups setting. The analysis showed that, given an effect size of 2.2776769, a total of 12 patients (6 per group) was needed to reach a power of 0.978. Gene expression analysis was performed not in a subset of patients but in all subjects recruited for this study. Based on the gene expression results for our main outcome (Wnt signaling), the effect size for the SOST gene was 1.2733824, with a power of 0.9490065, confirming that the sample size was sufficient to achieve adequate statistical power.

      • Why is the number of samples different for the mRNA measurements? In most cases, there were 9, but in some 8 and in some 10?

      We sincerely thank the reviewer for the opportunity to clarify such important aspects. The number of samples used for mRNA quantification may differ between the analyzed genes for two reasons. First, for real-time PCR we used only samples with a high-quality 260/280 ratio (between 1.8 and 2.0), as stated in the Methods section of the manuscript (Page 8, lines 163-164). Second, we excluded undetermined values, i.e., those still undetectable at the end of the amplification run (40 cycles in total), as specified in the Methods section (Page 8, line 167).
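      The two exclusion rules above act at different levels, which is why n varies between genes: the 260/280 criterion removes a sample for every gene, whereas an undetermined Ct removes it only for the gene concerned. A minimal sketch of that bookkeeping follows; the record layout and the example Ct values are hypothetical, not data from the study.

```python
from typing import Dict, List, Optional, Tuple

MAX_CYCLES = 40  # total amplification cycles stated in the Methods

# One record per sample: (260/280 ratio, {gene: Ct value, or None if "Undetermined"})
SampleRecord = Tuple[float, Dict[str, Optional[float]]]


def usable_n_per_gene(samples: List[SampleRecord],
                      lo: float = 1.8, hi: float = 2.0) -> Dict[str, int]:
    """Count, per gene, the samples that survive both exclusion rules."""
    counts: Dict[str, int] = {}
    for ratio, cts in samples:
        if not lo <= ratio <= hi:
            continue  # RNA purity outside 1.8-2.0: dropped for every gene
        for gene, ct in cts.items():
            counts.setdefault(gene, 0)
            if ct is not None and ct <= MAX_CYCLES:
                counts[gene] += 1  # amplified within 40 cycles: usable for this gene only
    return counts


if __name__ == "__main__":
    demo = [
        (1.90, {"SOST": 28.1, "WNT10B": None}),  # WNT10B undetermined
        (1.70, {"SOST": 27.5, "WNT10B": 33.0}),  # fails purity criterion
        (1.95, {"SOST": 29.0, "WNT10B": 34.2}),
    ]
    print(usable_n_per_gene(demo))
```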

      Overall, this study validates findings from the group that reported similar findings in 2020. This validates their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

      We thank the reviewer for the positive comment, we really value her/his opinion.

      COMMENTS:

      (1) The authors could provide more details on how much of the bone was analyzed for bone histomorphometry (what area?).

      We thank the reviewer for allowing us to explain our methodology in more depth. First, a biopsy containing trabecular bone from the femoral head was fixed in 10% neutral buffered formalin for 24 h prior to storage in 70% ethanol. Tissues were embedded in methylmethacrylate and sectioned sagittally by the Washington University Musculoskeletal Histology and Morphometry Core. Sections were stained with Goldner’s trichrome. Then, a rectangular region of interest (ROI) containing trabecular bone was chosen below the cartilage-lined joint surface and primary spongiosa. This region had an average area of 45 mm². Tissue processing artifacts, such as folding and edges, were excluded from the ROI. A threshold was chosen using the BIOQUANT software to automatically select trabeculae and measure bone volume. Finally, osteoid was highlighted in the software and quantified semi-automatically using a threshold and corrected with the brush tool (as shown in the image below).

      We now specify this in the Methods section (Page 7, lines 146-152).

      Author response image 1.
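      As a rough illustration of the thresholding step described above (not of BIOQUANT itself, whose internals are not described here), the sketch below computes a bone area fraction within an ROI from a hypothetical grayscale image in which bone pixels are at or above the threshold, skipping pixels flagged as processing artifacts. The image values, threshold and mask are invented for illustration.

```python
from typing import List


def bone_area_fraction(image: List[List[int]], threshold: int,
                       artifact_mask: List[List[bool]]) -> float:
    """Fraction of usable ROI pixels classified as bone by a global threshold."""
    bone = tissue = 0
    for pixel_row, mask_row in zip(image, artifact_mask):
        for value, is_artifact in zip(pixel_row, mask_row):
            if is_artifact:
                continue  # folds, edges and other processing artifacts are excluded
            tissue += 1
            if value >= threshold:
                bone += 1  # pixel counted as trabecular bone
    if tissue == 0:
        raise ValueError("ROI contains no usable pixels")
    return bone / tissue


if __name__ == "__main__":
    roi = [[10, 200], [220, 30]]           # hypothetical 2x2 intensity patch
    mask = [[False, False], [False, True]]  # bottom-right pixel is an artifact
    print(bone_area_fraction(roi, 128, mask))
```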

      (2) Could the number of samples used for histomorphometry be increased? That may also lead to more significant results.

      We sincerely appreciate this suggestion from the reviewer, but unfortunately all available samples for histomorphometry have been analyzed, and we are not able to increase the number of recruited participants at this time. Recruitment of people with T2D undergoing hip replacement is extremely difficult given the limited number of those approved for elective surgery and compliant with our inclusion criteria. Moreover, given the long time needed to process bone samples for gene expression and histology analysis, a meaningful increase in recruited subjects would require several months. However, we had previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test of two independent groups. The analysis showed that, given an effect size of 2.2776769, a total of 12 patients (6 per group) was needed to reach a power of 0.978.

      (3) It would have been interesting to assess the biomechanical behavior of the bone specimens. While it is known that BMD is often higher in patients with T2D, the resistance to fractures is lower. Ideally, bone strength measures could be correlated with Wnt molecule expression and AGEs.

      We agree with the reviewer that the assessment of biomechanical parameters in our cohort would increase the importance of this study, giving more insight into the effect of Wnt signaling downregulation on bone strength. Following the reviewer's suggestion, we performed compression tests on trabecular bone cores. We found a significant decrease in bone stiffness in T2D compared to controls [Young’s modulus 21.6 (13.46-30.10) MPa vs. 76.24 (26.81-132.9) MPa; p=0.0025]. We added the results of the bone compression test in a new paragraph (Page 8, lines 191-194). To assess the validity of our results, we performed a post-hoc power calculation using G*Power 3.1.9.7: the effect size was 1.4716626, with a power of 0.9730784, confirming that the sample size was sufficient to achieve adequate statistical power. We added the methods to the related section and the biomechanical data to Table 3, and modified the manuscript accordingly (modifications are shown in track changes). Moreover, we performed correlation analysis between Wnt target genes, AGEs and biomechanical parameters, showing significant correlations, as reported in the paragraph added to the Results section (Page 11, Lines 225-233).
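      Young's modulus from a compression test is conventionally the slope of the linear part of the stress-strain curve, with stress = force / cross-sectional area and strain = displacement / initial specimen length. The response does not describe the test protocol, so the specimen dimensions, the choice of linear window and the synthetic data below are hypothetical; this is a sketch of the general calculation, not the authors' pipeline.

```python
from typing import Sequence, Tuple


def youngs_modulus_mpa(forces_n: Sequence[float], displacements_mm: Sequence[float],
                       length_mm: float, area_mm2: float,
                       linear_window: Tuple[int, int]) -> float:
    """Least-squares slope of stress vs strain over an assumed linear region."""
    stress = [f / area_mm2 for f in forces_n]           # N/mm^2 is numerically MPa
    strain = [d / length_mm for d in displacements_mm]  # dimensionless
    start, stop = linear_window  # hypothetical indices bounding the linear (elastic) region
    xs, ys = strain[start:stop], stress[start:stop]
    if len(xs) < 2:
        raise ValueError("linear window needs at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic, perfectly linear test: 10 mm long core, 20 mm^2 cross-section
    disp = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
    force = [100.0 * d for d in disp]
    print(round(youngs_modulus_mpa(force, disp, 10.0, 20.0, (0, len(disp))), 6))
```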

      REVIEWER #2

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They found higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples, and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

      We thank the reviewer for the comment. To address these concerns, we have increased the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript accordingly with this new analysis and updated the Figure 1 panel (Page 10, lines 210-213). Unfortunately, in this paper we were not able to perform experiments on cellular or physiological properties. However, to analyze the biological effect of the analyzed genes on the phenotype, we measured bone strength by performing compression tests on trabecular bone cores (Page 10, lines 201-203 and Table 3) and used the biomechanical parameters for correlation analysis with the targeted genes, showing significant correlations between bone strength and Wnt genes. We added a new paragraph to the Results section and a new figure panel to the main manuscript (Page 11, lines 225-233 and Figure 4).

      COMMENTS:

      (1) The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. Given the author's success in obtaining good-quality RNA from trabecular bone, a more comprehensive exploration would greatly improve the quality of the study.

      We agree with the reviewer that expanding the transcriptional landscape related to Wnt signaling would be of interest for this work, and we thank the reviewer for this opportunity. We were able to increase the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway, using the same cohort of patients in which we performed the other analyses. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript accordingly with this new analysis and updated the figure panels (Page 10, lines 210-213 and Figure 1).

      (2) The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations. Can the authors perform immunohistochemistry to associate the changes in gene expression with protein expression?

      We sincerely thank the reviewer for drawing attention to such an important aspect. We have partially replied to this comment in the previous paragraph. Regarding immunohistochemistry, it is not possible to further use the available samples. This is mainly because non-decalcified bones were embedded in plastic to allow separate analysis of newly formed osteoid and mineralized bone. This process leads to poor antigen preservation and unsuitable detection of most targets. Moreover, antibodies for Wnt are also unreliable due to the secreted nature of the protein. Overall, this approach is unlikely to work efficiently. Similarly, RNAscope is not possible due to the resin. Optimization and validation of these analyses will need to be saved for a future study with fresh specimens.

      REVIEWER #3

      The manuscript by Leanza and colleagues explores the regulation of Wnt signaling and its association with advanced glycation end products (AGEs) accumulation in postmenopausal women with type 2 diabetes (T2D). The paper provides valuable insights into the potential mechanisms underlying bone fragility in individuals with T2D. Overall, the manuscript is well-structured, and the methodology is sound. I would suggest some minor revisions to improve clarity.

      Strengths:

      The study addresses an important and clinically relevant question concerning the mechanisms underlying bone fragility in postmenopausal women with T2D.

      The study's methodology appears sound, and the inclusion of postmenopausal women with and without T2D undergoing hip arthroplasty adds to the clinical relevance of the findings. Additionally, measuring gene expression and AGEs in bone samples provides direct insights into the study's objectives.

      The manuscript presents data clearly, and the results are well-organized.

      Weaknesses:

      Title. The title could be more specific to better reflect the content of the study. Also, the abstract should concisely summarize the study's main findings, providing some figures.

      We thank the reviewer for this suggestion, and we modified the title to give specific information on the main findings of this study. The new title is “Bone canonical Wnt signaling is downregulated in type 2 diabetes and associates with higher Advanced Glycation End-products (AGEs) content and reduced bone strength”. Moreover, as suggested, we added a graphical abstract summarizing our study results.

      Introduction: the introduction would benefit from the addition of a clearer, more focused statement of the research questions or hypotheses guiding this study.

      We thank the reviewer for this opportunity, and we reformulated the hypothesis of this study based on our data and new findings as follows: “we hypothesized that T2D and AGEs accumulation downregulate Wnt canonical signaling and negatively affect bone strength” (page 6, lines 116-117).

      Methods: more information is needed on the histomorphometry analysis. Surgical samples from 8 T2D and 9 non-diabetic subjects were used for histomorphometry analysis. How did these subjects compare with the other subjects in the T2D and control groups? Were they representative? How were they selected?

      We thank the reviewer for the opportunity to clarify this important point. The number of subjects included in the different analyses of the paper differs for multiple reasons. In particular, we used only bone specimens with enough trabecular bone material to perform histomorphometry analysis. Therefore, the samples used in the histomorphometry analysis belong to the same subjects enrolled in the study and analyzed in the other experiments of this paper. Moreover, we had previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test of two independent groups. The analysis showed that, given an effect size of 2.2776769, a total of 12 patients (6 per group) was needed to reach a power of 0.978.
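      The a priori calculation quoted in these responses (two independent groups, d = 2.2776769, 6 per group) can be approximated with the noncentral t distribution. The sketch below uses only the Python standard library; the significance level (α = 0.05) and equal group sizes are assumptions, since the responses do not state them, and the numerical integration is a simple stand-in for G*Power's own routines. Under this sketch, a one-sided test gives a power close to the reported 0.978, whereas a two-sided test gives a lower value (roughly 0.94); the responses do not say which tail setting was used.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi2_pdf(v: float, df: int) -> float:
    """Chi-square density; assumes df > 2 so the density vanishes at v = 0."""
    if v <= 0.0:
        return 0.0
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(v) - v / 2.0 - k * math.log(2.0) - math.lgamma(k))


def prob_t_exceeds(c: float, df: int, ncp: float, steps: int = 1000) -> float:
    """P(T > c) for T = (Z + ncp) / sqrt(V / df), V ~ chi-square(df).

    Conditions on V and integrates with Simpson's rule (steps must be even).
    """
    upper = df + 40.0 * math.sqrt(2.0 * df)  # far into the chi-square tail
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi2_pdf(v, df) * (1.0 - normal_cdf(c * math.sqrt(v / df) - ncp))
    return total * h / 3.0


def t_test_power(d: float, n_per_group: int, alpha: float = 0.05,
                 two_sided: bool = True) -> float:
    """Power of a two-sample t-test with equal group sizes and effect size d."""
    df = 2 * n_per_group - 2
    if df <= 2:
        raise ValueError("sketch assumes at least 3 subjects per group")
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality for equal n
    tail = alpha / 2.0 if two_sided else alpha
    lo, hi = 0.0, 50.0  # bisection for the central-t critical value
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if prob_t_exceeds(mid, df, 0.0) > tail:
            lo = mid
        else:
            hi = mid
    crit = (lo + hi) / 2.0
    power = prob_t_exceeds(crit, df, ncp)
    if two_sided:
        power += prob_t_exceeds(crit, df, -ncp)  # rejection in the opposite tail
    return power


if __name__ == "__main__":
    d = 2.2776769  # effect size reported in the response
    print(f"one-sided: {t_test_power(d, 6, two_sided=False):.3f}")
    print(f"two-sided: {t_test_power(d, 6, two_sided=True):.3f}")
```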

      COMMENTS:

      (1) In the Abstract, values and p-values for comparisons, and Spearman's rho and p-values for correlations should be provided. Most adverbs (thus, accordingly, importantly) could be omitted to improve conciseness and clarity.

      We thank the reviewer for this precise and careful comment. We changed the Abstract accordingly. Following the abstract style of the journal, we initially reported only the main findings; we have now revised it to provide values and p-values as requested. We defer to the wishes of the editor as to the format in which the abstract should be reported.
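      For the correlations requested in the Abstract, Spearman's rho is the Pearson correlation computed on ranks, with tied values given the average of the ranks they occupy. A stdlib-only sketch follows; it computes rho only, and for p-values a library routine such as `scipy.stats.spearmanr` (which returns both rho and a p-value) would be the usual choice. The example inputs are illustrative, not study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotone, so rho is 1
```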

      (2) Result presentation: 25th and 75th percentile should be provided rather than the interquartile range, to better reflect data distribution.

      We thank the reviewer for the opportunity to better clarify this part of the results section. We changed the manuscript accordingly.
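      Reporting the 25th and 75th percentiles alongside the median, as the reviewer requested, could look like the sketch below. Statistical packages interpolate percentiles differently (Python's `statistics.quantiles` defaults to the "exclusive" method), so values computed this way may differ slightly from those of the software the authors used; the input values are illustrative.

```python
import statistics
from typing import Sequence, Tuple


def median_with_quartiles(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, 25th percentile, 75th percentile)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return median, q1, q3


def format_summary(values: Sequence[float]) -> str:
    """Format as 'median (25th-75th percentile)' with two decimals."""
    median, q1, q3 = median_with_quartiles(values)
    return f"{median:.2f} ({q1:.2f}-{q3:.2f})"


if __name__ == "__main__":
    print(format_summary(list(range(1, 10))))
```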

      (3) Estimated glomerular filtration rate should be calculated and provided as a marker of renal function, rather than serum creatinine values.

      We thank the reviewer for the comment and have modified the manuscript accordingly, adding eGFR values to Table 1 and to the Results section.
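      The response does not say which eGFR equation was applied. As one possibility, the sketch below implements the 2021 race-free CKD-EPI creatinine equation; the choice of equation is an assumption, and serum creatinine must be supplied in mg/dL (divide values in µmol/L by 88.4).

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR in mL/min/1.73 m^2 (CKD-EPI 2021 creatinine equation, assumed here)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (142.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if female else egfr  # sex coefficient applies to women only


if __name__ == "__main__":
    # Hypothetical example: a 60-year-old woman with serum creatinine 0.7 mg/dL
    print(round(egfr_ckd_epi_2021(0.7, 60, female=True), 1))
```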

      (4) The manuscript should include a statement confirming compliance with the Declaration of Helsinki, considering that human subjects were involved in the study.

      We thank the reviewer for the comment. The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Campus Bio-Medico University approved the present study. Informed consent was obtained from all subjects involved in the study (Page 6, lines 134-137).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The paper addresses the important question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. However, the logic of the channel concept as employed here, as well as the claims regarding a sensorimotor basis for these channels, is incomplete and thus requires clarification and/or modification.

      Reviewer #1 Public Review

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the Weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at distinct lower and upper ranges and further well-fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis. Some questions do remain in the data, and there are aspects of the presentation that could be adjusted.

      • The use of a diverging colormap for the correlation matrix seems unnecessary. Diverging colormaps between two opposing colors (with white in the middle) are best for results spanning positive and negative values (say, correlation values between -1 and +1), but the correlations here are all positive, so a sequential colormap should be applied. I can appreciate that the authors were trying to emphasize that a 2+ channel system would lead to lower correlations at larger ratios, but that's emphasized better in the numerical ratio line plots.

      We agree and now changed the colour maps accordingly (Fig 1 and 3, p. 4 and 11). Thank you.

      • The correlation matrices in Figure 1 appear blurred. I am not sure if this was intentional but suspect it was not, and so they should appear like those presented in Figure 3.

      Sorry about that, it was a rendering problem. Now fixed.

      • It's notable that the authors also collected data on a timing task to rule out a duration-based strategy in the numerosity task. If possible, it would be great to have the author also conduct the rest of the analyses on the duration task as well; that is, to look at WF correlation matrices/ratios as well as PCA. There is evidence that duration processing is also distinctly sensorimotor, and may also rely on similar channels. Evidence either for or against this would likely be of great interest.

      We agree that investigating the existence of temporal channels would be of great interest, but it goes beyond the scope of the current study. Out of curiosity, however, we analysed the duration data. Interestingly, signatures of sensorimotor channels (a correlation gradient as a function of duration distance) emerge. This does not hold, however, when correlating number data against duration data. These results (if confirmed) would indicate the existence of independent mechanisms for time and numerosity perception. Our research agenda is now proceeding in this direction.

      • For the duration task, there was no fast-tapping condition. Why not? Was this to keep the overall task length short?

      Yes, this was the main reason.

      • The number of subjects/trials seems a bit odd. Why did some subjects perform both and not others? The targets say they were presented "between 25 and 30 times", but why was this variable at all?

      The two experimental conditions were demanding, each lasting around 2 hours. Some participants, unfortunately, were available for just one session. To make the two conditions similarly powered, we added some extra participants who were not shared between conditions. Trials were divided into blocks of 55 trials (5 repetitions for each of the 11 targets). Most participants performed 6 blocks in both conditions (30 repetitions per target); a few of them (again because of availability limits) performed 5 blocks (25 repetitions per target), hence the range of "between 25 and 30" presentations.

      • For the PCA analysis, my read of the methods and results is that this was done on all the data, across subjects. If the data were run on individual subjects and the resulting PCA components averaged, would the same results be found?

      We thank the reviewer for giving us the opportunity to clarify the technique.

      In brief: we measured precision (Weber Fraction) in translating digits (target numbers) into corresponding action sequences. This creates an m by n matrix, each column (n) representing a participant and each row (m) a target number. This matrix was then submitted to PCA, which yielded two components. Each target number was assigned two loading scores: one on the 1st and one on the 2nd component. These loadings were then displayed as a function of target number to describe the tunings. This analysis is, by its nature, across participants and cannot be performed on individual data.

      • For the data presented in Figure 2, it would be helpful to also see individual subject data underlaid on the plots to get a sense of individual differences. For the reproduced number, these will likely be clustered together given how small the error bars are, but for the WF data it may show how consistently "flat" the data are. Indeed, in other magnitude reproduction tasks, it is not uncommon to see the WF decrease as a function of target magnitude (or even increase). It may be possible that the reason for the observed findings is that some subjects get more variable (higher WFs) with larger target numbers and others get less variable (lower WFs).

      We agree and now added individual data, confirming flat WF distributions (Fig 2 B&D).

      • Regarding the two-channel model, I wonder how much the results would translate to different ranges of numerosities? For example, are the two channels supported here specific to these ranges of low and high numbers, or would there be a re-mapping to a higher range (say, 32 to 64 dots) or to a narrower range (say 16 to 32 dots). It would be helpful to know if there is any evidence for this kind of remapping.

      This is the first study measuring sensorimotor channels for the transformation of numbers into action sequences. Whether these channels are modulated by the numerical context is an interesting open question that we are exploring through specific experimental conditions (now discussed at p. 17, lines 451-460).

      Reviewer #2 Public Review

      The authors wish to apply established psychophysical methods to the study of number. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another for encoding larger numbers (around 27).
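      The log-Gaussian tuning curves mentioned in the first review are Gaussians on a log-numerosity axis, so a channel's bandwidth is constant in ratio terms rather than in absolute number. The sketch below evaluates two such channels at the tested targets, with peaks at the approximate centres quoted in this review (about 10 and 27); the bandwidth `WIDTH` is a hypothetical value, not a parameter fitted in the study. With equal widths, the two curves cross at the geometric mean of the peaks.

```python
import math


def log_gaussian(n: float, peak: float, width: float, amplitude: float = 1.0) -> float:
    """Gaussian tuning on a log-numerosity axis: width is constant in ratio terms."""
    return amplitude * math.exp(-((math.log(n) - math.log(peak)) ** 2) / (2.0 * width ** 2))


TARGETS = [8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32]
LOW_PEAK, HIGH_PEAK = 10.0, 27.0  # approximate centres quoted in the review
WIDTH = 0.4                       # hypothetical bandwidth in log units

if __name__ == "__main__":
    for n in TARGETS:
        low = log_gaussian(n, LOW_PEAK, WIDTH)
        high = log_gaussian(n, HIGH_PEAK, WIDTH)
        print(f"{n:>2}: low={low:.2f} high={high:.2f}")
```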

      Strengths

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously, and for exploring new avenues in the study of numerical cognition.

      Weaknesses

      Inter-subject-correlation

      The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggled to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." As I understood it, the correlations are performed "between participants, for all targets values" - meaning that they are measuring the extent to which different participants' WFs vary together. But why is this a good measure of channels? This analysis seems to assume that if people have channels for numerical estimation, they will have the same channels, tuned to the same numerical ranges. But this is an empirical question - individual participants could have wildly different channels, and perhaps different numbers of channels (even in the tested range). If they do, then this between-subject analysis would mask these individual differences (despite the subtitle).

      Yes, the technique assumes that different individuals have similar channels, and the results confirm this. If everyone had different channels, or different numbers of channels, we would not have found this pattern of results: an ordered scaling of correlations as a function of numerical distance. As specified in the ms, however, this technique (at least as we used it) is not sensitive enough to identify the exact number of channels, so it may have smoothed the results, 'masking' the existence of more than two channels. To avoid possible confounds related to accuracy (reproduction biases), we used Weber Fraction, a standard index of normalized sensory precision (p. 7, lines 182-183).

      Different channels

      I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. However, as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." So I believe this technique does not provide more evidence for the existence of 2 channels than for the existence of 4 or 8 or 11 channels, the upper bound for a task testing 11 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      We recognise that the technique is not particularly intuitive, and we apologize for the lack of clarity.

      To clarify: we measured the precision in translating digit numbers into action sequences. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we calculated the reproduction precision (Weber Fraction). The dataset comprised a matrix where each column represents a participant and each row a target number; each cell contains the corresponding Weber Fraction value. This dataset was then analysed with a simple correlation across participants. For example, the WFs provided by the N participants when tested at the target number "8" were correlated with those obtained with the target numbers 10, 11, 13...32. The results show that the correlation between "8 and 10" (low numerical distance) was higher than that obtained correlating "8 with 32" (higher numerical distance). This pattern implies that the shared variance between numbers scales with numerical distance across participants, implying the existence of channels aggregating similar numbers (i.e., tuning selectivity). On the same dataset we then ran a PCA, which provides two main components. Each target number is assigned a loading score on each component: one for the 1st and one for the 2nd. These loadings were plotted as a function of target number to describe the tuning shape (i.e., the channels).
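      The PCA step described above can be sketched with NumPy: treat each target number as a variable observed across participants, take the eigenvectors of the target-by-target correlation matrix, and scale them to loadings. The responses do not state the extraction or rotation settings. The varimax rotation included here is an assumption: when all inter-target correlations are positive, the first unrotated component loads positively on every target, and a rotation is one common way to obtain components that each dominate a different range of targets. The synthetic data in the usage lines are invented for illustration.

```python
import numpy as np


def pca_loadings(wf: np.ndarray, n_components: int = 2):
    """wf has shape (n_targets, n_participants); returns (loadings, explained ratio)."""
    corr = np.corrcoef(wf)                   # rows are variables: target x target
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return loadings, eigvals / eigvals.sum()


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Standard varimax rotation (gamma = 1), applied here as an assumption."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, objective = objective, s.sum()
        if previous and objective < previous * (1 + tol):
            break  # criterion no longer improving
    return loadings @ rotation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_low = np.linspace(1.0, 0.0, 11)  # synthetic weight of a "low" factor per target
    latent = rng.normal(size=(2, 300))
    wf = (np.outer(w_low, latent[0]) + np.outer(1 - w_low, latent[1])
          + 0.3 * rng.normal(size=(11, 300)))
    loadings, ratio = pca_loadings(wf)
    print(np.round(varimax(loadings), 2), np.round(ratio[:2], 2))
```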

      As stated above, we cannot say exactly how many channels exist. These results should be interpreted as evidence for the existence of at least two channels for the transformation of numerical symbols into action sequences. This is not an obvious result at all. There is no evidence in the literature for the existence of such a mechanism in humans. In animals (crows), as many channels were found as numbers tested. This does not contrast with our two-channel results but (very likely) arises from the different resolution of the techniques: single-cell recording certainly has higher resolution than our interindividual covariance approach. In short, we believe that the channels revealed here are likely a coarse summary representation of several underlying channels.

      We now tried to make these points clearer (p. 7 lines 186-196; p. 15 lines 382-384; p. 16 lines 401-402):

      Several other questions arose for me when thinking through this technique. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? More broadly, I was unsure what advantages channels would have - that is - how in principle would having distinct channels for processing similar stimuli improve (rather than impede) discrimination abilities?

      This field of study is completely new, with many questions still open, including whether these channels are modulated by the numerical context, such as the tested range and its extremes. The channels appear broad because, as stated above, they likely represent a coarse summary of several (probably sharper) underlying channels. We are now exploring the effect of numerical range and trying to modulate the tuning widths through ad-hoc experimental conditions. (p. 16 lines 401-402; p. 17 lines 450-459)

      No number perception

      I was uncertain about the analogy to studies of other continuous dimensions like spatial frequency, motion, and color. In those studies, participants view images with different spatial frequency, motion, or color - the analogy would be to see dot arrays containing different numbers of dots. Instead, here participants read written numerals (like "19"), symbols which themselves do not have any numerical properties to perceive. How does that difference change the interpretation of the effects? One disadvantage of using numerals is that they introduce a clear discontinuity: Our base-10 numerical system artificially chunks integers into decades, potentially causing category-boundary effects in people's reproductions.

      We used these sensory analogies to give a flavour of the technique. The focus of the current study was on individual differences in the process that transforms numbers into actions. To this end, we chose to reduce the noise associated with encoding the sensory stimulus per se: digit encoding, at least in educated adults, is indeed noiseless, eliminating this source of variability. However, we agree that looking at non-symbolic formats would be interesting. We are now collecting data with dot and flash estimation. The results so far are largely in line with those found here, arguing against chunking strategies, and confirm previous literature showing sensory numerosity-selective channels in humans and animals. (p. 14 lines 351-355)

      Sensorimotor

      The authors wished to test for "sensorimotor mechanisms selective to numerosity" but it's not clear what makes their effects sensorimotor (or selective to numerosity, see below). It's true they found effects using a tapping task (which like all behaviour is sensorimotor), but it's not clear that this effect is specific to sensorimotor number reproduction. They might find similar effects for numerical comparison or estimation tasks. Such findings would suggest the effect may be a general feature of numerical cognition across modalities.

      Related to the above comment, the task here was to transform noiseless symbols (digits) into (noisy) numerical action sequences. Because the variability is thus mainly driven by the sensory-to-action process, we believe that the task can safely be considered sensorimotor in nature. (p. 14 lines 351-355)

      Yes, the same pattern of results might be found for numerical comparison or estimation tasks, but only with non-symbolic formats (dots/flashes). Educated adults make no errors in naming or comparing such simple digits, which makes it impossible to perform this covariance analysis with verbal estimation or comparison of digits. However, to anticipate future results, we have preliminary data from verbal estimation tasks with dots and flashes (“how many?”). These data suggest similar results, consolidating the technique and confirming the large literature showing sensory channels for purely visual numerosity. (p. 17 lines 453-455)

      Specific to numbers

      The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. (Given this, I am not sure what we stand to learn by comparing the two tapping speeds.) The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect. If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for numbers to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

      The fast and slow conditions were not meant to control for duration strategies but to test for the generalizability of the results over different tapping temporal dynamics (temporal frequency in this case). The results confirmed this.

      The control for duration strategies is the comparison between the precision of reproducing durations and that of reproducing numbers. In the number-to-action task, participants were free to use any cue, including response duration. However, it is safe to assume that performance is dominated by the most precise feature, number in this case. In other words, if in the number task participants were reproducing the time required to make a certain number of presses, then in the timing task, where they explicitly reproduce the same durations, their precision should be no lower. The results show the opposite. (p. 16 lines 418-420)

      Theories of numerical cognition.

      An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets (but see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      The numbers used in this work were well above the subitizing limit (N > 7). Indeed, the WFs showed no signs of subitizing discontinuities. We believe that discussing the subitizing literature here would go too far beyond the scope of the current work.

      Additional public comments from the Reviewing Editor:

      (1) What, in the present work, makes the case that the operative mechanism is sensorimotor? The authors frame the discussion around a sensorimotor number system but the evidence here could be seen as using a sensorimotor task as one way to get at an amodal number channel. For example, the authors could do the same experiment but have people watch a circle that flashes on and off for n times, with participants reporting the number of flashes (or shown a number and asked to say more or less). They could then apply the same analyses as used here. If they got the same results, it would seem that this would be an argument against the channels being sensorimotor. I suppose if they did NOT get results in the perceptual task, then they would have (much) stronger evidence that the channels are somehow sensorimotor in nature. Either way, an experiment along these lines would be essential for addressing the nature of the channels (tied to sensorimotor or not).

      We chose this task because the perception of simple digits (like those used here), at least in educated adults, is noiseless. This ensures that the remaining inter-individual variability is related to the motor transformation process. For this reason, we believe that the task can safely be considered sensorimotor (see also Kirschhock & Nieder, Number selective sensorimotor neurons in the crow translate perceived numerosity into number of actions, Nature Communications, 2022). (p. 14 lines 351-355)

      This is not true for verbal numerosity estimation of non-symbolic stimuli (such as dots and/or series of events). It is well known that the estimation of such stimuli is noisy, and the task would involve no sensorimotor transformation. The inter-individual variability in estimation precision, and thus the measurable channels, would then reflect sensory numerosity tunings. These have been revealed with various techniques in both humans and animals. Following this idea, we now have preliminary data showing that sensory channels are also detectable with the technique used in the current study. This is not in contrast with the sensorimotor nature of the channels found here; instead, it indicates the existence of both sensory and sensorimotor number channels.

      The authors may argue that results from other studies such as the 2016 target article make the case about a sensorimotor basis of these channels. While I don't have a great grasp of this literature, my take on the 2016 target article is that the point was not about sensorimotor channels but about interactions between action and vision. This seems more in line with the idea of amodal number channels and indeed, they speak about a "generalized number sense" in that paper.

      The 2016 paper showed that a short period of hand tapping (adaptation) can distort visual numerosity perception. The results implied the existence of sensorimotor number channels integrating non-symbolic numerosity (dots/flashes) and actions. The current study goes beyond this, describing (for the first time) sensorimotor channels that transform symbolic numbers into action sequences. Whether these channels also encode non-symbolic numerosity is an interesting open question that we are currently investigating with cross-task analyses. If the same channels respond to non-symbolic numerosity (across space and time: dots and sequences of visual/auditory events) and also translate digits into actions, we could then speak of a generalized sensorimotor number sense. At present, this remains a possibility to be tested. (p. 17 lines 450-459)

      (2) There is a need for clarification on the method for creating the correlation matrices. The authors write that they look at correlations between Weber fractions between participants. By "between" do they mean "across"? That is, they calculate the Weber fraction for each individual for each cell. Then for a given cell, you correlate its Weber fraction with every other cell, using the pairs for each individual. I would call this "across" not "between." Is this just a semantic thing or have I misunderstood the process?

      To make this concrete, consider the correlation for cell 10/11. I assume it is something like

              10     11
      Subj1   .25    .31
      Subj2   .13    .09
      Subj3   .22    .16
      Etc

      The correlation across participants is then the data point for the 10/11 cell in the matrix.

      This was a semantic error on our part; this is exactly what we did: correlation across participants.

      To clarify further: we measured the precision in transforming numbers into sequences of actions. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) with N participants. For each target number, and independently for each participant, we then calculated the reproduction precision (Weber Fraction). The dataset thus consists of a matrix in which each column represents a participant and each row a target number; each cell contains the corresponding Weber Fraction. This dataset was then analysed with a simple correlation across participants. For example, the WFs of the N participants obtained for the target number "8" were correlated with those obtained for the target numbers "10, 11, 13...32". The results show that the correlation between "8 and 10" (small numerical distance) was higher than that between "8 and 32" (large numerical distance). This pattern implies that the variance shared between numbers (across participants) scales with numerical distance, in line with the existence of channels that aggregate similar numbers (tunings).

      (p. 7 lines 186-196)

      (3) The duration data should be analysed. While n is small, can't the authors correlate WFs across tasks? Suppose a similar pattern is observed, suggestive of >1 channel in this between-task correlation.

      One of the strengths of this technique is that it is very general: it can be applied to virtually any stimulus feature. We are currently collecting data to test the existence of generalised sensorimotor channels for continuous magnitudes: space, time, and numerosity. The logic is exactly as suggested. These correlational analyses, however, require (relatively) large samples and ad-hoc experimental conditions. We do not feel confident drawing conclusions on this with 9 participants. Out of curiosity, however, we analysed the data as requested and the results are interesting: signatures of sensorimotor channels emerge in both the number and duration tasks but NOT when the tasks are analysed in conjunction (cross-task). If confirmed, these results would indicate the existence of separate mechanisms for encoding time and numerosity (and perhaps also space).

      (4) The finding of similar results for fast and slow is quite interesting. And provides good motivation to do the duration control experiment. But two issues related to the control experiment:

      (4a) Why not look at the correlation matrix for the duration task? Was this not done because there were only 9 participants? If so, why the small n here?

      Yes, that is the reason. The aim of this work is not to investigate the existence of duration channels. This experimental condition was designed as a control for the use of non-numerical strategies in the number task, and it worked well. The results were already clear with 9 individuals (confirming Kirschhock & Nieder, Nature Communications, 2022), so we did not consider it necessary to continue in this direction. However, related to the previous point, we ran a preliminary analysis on this small dataset and (as mentioned above) signatures of sensorimotor channels (correlation gradients) emerge in both the number and duration tasks but NOT when analysed in conjunction (cross-task), indicating different mechanisms. We are now pursuing this issue using different number and duration tasks.

      (4b) I don't follow why greater precision on the tapping task compared to the duration task makes a strong case against the duration hypothesis. Is the argument that, if based on duration, there should be greater precision on the duration task since the tapping task would exhibit the variability from duration PLUS added noise from tapping? If this is the argument, this should be spelled out.

      Yes. The more precise feature dominates behaviour. In other words, if in the number task participants were reproducing the time required to make a certain number of presses, then in the timing task, where they explicitly reproduce the same durations, their precision should be no lower. The results show the opposite. (p. 18 lines 418-420)
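      The argument can be written out under one simplifying assumption that the response does not state: that a duration-based strategy in the number task would add independent tapping-rate noise to the duration-reproduction noise, with both expressed as Weber fractions (squared Weber fractions of independent sources adding approximately). Then

```latex
\[
\mathrm{WF}_{\text{number}}^{2} \;\approx\; \mathrm{WF}_{\text{duration}}^{2} + \mathrm{WF}_{\text{tap}}^{2}
\quad\Longrightarrow\quad
\mathrm{WF}_{\text{number}} \;\gtrsim\; \mathrm{WF}_{\text{duration}},
\]
```

      so finding \(\mathrm{WF}_{\text{number}} < \mathrm{WF}_{\text{duration}}\), as reported, is inconsistent with a pure duration strategy under this assumption.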

      (4c) Related to point 3 above, one would expect based on things like Rammsayer's study that duration judgments would also engage channels. Is the idea that these are different channels in the tapping task? There seems a good case to have participants do both tapping and duration tasks and then do the correlation matrices, comparing within and between tasks.

      Please see response to 3 and 4a.

      Recommendations for the authors:

      (1) On the logic of the channel concept as applied in the current context:

      While the authors present the numerical channel idea by analogy to how this concept is used for other features such as spatial frequency or orientation, there is no input to activate the channels-just a written numeral. The channel concept would mean that to respond to say, "16", you get output from multiple channels, with each weighted by its "tuning" to 16 such that the aggregate results in approximately 16 taps. This seems a bit odd: it would be like saying to draw, I use the output from my spatial frequency channels to create an image with a particular power spectrum. The logic of the channel concept in the current experimental context needs to be reviewed and clarified.

      The channel here (probably) reflects the activity of noisy neurons responsible for translating sensory information into a numerical motor output, such as those shown by Kirschhock & Nieder (Nature Communications, 2022) in crows. We used digits because their encoding (at least for such simple digits and educated adults) has no associated noise. The remaining inter-individual variability, which we analysed, is thus mainly associated with the motor transformation process, revealing sensorimotor channels.

      (2) A more thorough analysis of the duration task would strengthen the paper. The n is small for this interesting control condition and the analyses presented in the current version of the paper are limited. It is recommended to make this a fully powered test with complete analyses. Consider making this a new experiment in which participants do both the tapping and duration tasks to allow cross-modal analyses.

      We ran some exploratory analyses on this, described in our responses to comments 3 and 4a. We prefer to leave this issue to dedicated future experiments (which have just started).

      (3) Expanded discussion of the limitations of the current study. The authors are clear that the methods don't provide a strong test of whether there are two or more than two channels. It would be useful to also comment on whether the estimated locations of the peaks are robust or if there is some sort of statistical bias for them to be at more extreme values. More generally, use the comments on the reviews to elaborate on various issues related to the channel concept.

      We addressed these issues in the ms (p. 17 lines 450-459).

      (4) Clarify the methods used to calculate the correlation matrix (see reviews).

      We have now specified the correlation analyses more clearly (p. 7 lines 186-196).

      (5) What is the basis for arguing that the mechanism under consideration is a "sensorimotor number system?" The data in this paper do not appear to provide evidence that the effects are linked to sensorimotor processes rather than reflect an amodal number system that is being accessed in their task through the motor system. At a minimum, present arguments for what motivates/justifies the sensorimotor claim or modify the paper to be neutral on this point.

      We have now specified the sensorimotor nature of the task used here more clearly (p. 14 lines 351-355; see also comment 1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Lujan et al make a significant contribution to the field by elucidating the essential role of TGN46 in cargo sorting and soluble protein secretion. TGN46 is a prominent TGN protein that cycles to the plasma membrane and it has been used as a TGN marker for many years, but its function has been a fundamental mystery.

      In parallel, it remains unclear how most secreted proteins are targeted from the Golgi to the cell surface. Unlike lysosomal hydrolases, these molecules do not contain conserved sequence motifs or post-translational modifications. Cargo receptors for these secreted proteins have remained elusive.

      Therefore, these investigations are likely to have a significant influence on the field.

      To gain insight into the molecular role of TGN46 in sorting, they systematically test the impact of its luminal, transmembrane, and cytosolic domains. Importantly, and against current thinking, they demonstrate that the luminal domain of TGN46 facilitates sorting. Interestingly, neither the cytosolic domain nor the length of the transmembrane domain of TGN46 plays a role in cargo export. The effects of TGN46 depletion are specific, as membrane-associated VSVG remains unaffected.

      Interestingly, the TGN46 luminal domain also plays an important role in the intracellular and intra-Golgi localization of TGN46, and it contains a positive signal for Golgi export in CARTS. These conclusions are supported by rigorous, well-performed experiments.

      A speculative part of the manuscript, with some accompanying experimental data, proposes that the luminal domain of TGN46 forms biomolecular condensates that help to capture cargo proteins for export.

      One important point to discuss is that the effects of TGN46 KO are partial, suggesting that TGN46 stimulates the Golgi export of PAUF but is not essential for this process. The incomplete block is apparent in Fig 1 and in Fig 5D.

      We thank the reviewers and the editorial team for their assessment and valuable feedback on our manuscript. Their supporting comments reinforce the significance of our findings.

      Regarding the specific point raised about the partial effects observed in the TGN46 KO cell line, we acknowledge the importance of this issue, and we have addressed it in more detail in the revised version of our manuscript. The partial effects observed when using the TGN46 KO cell line are likely caused by several factors:

      (1) It is important to consider the phenomenon of cell adaptation/compensation, which is documented to occur in gene knockout cell lines. Cells often respond to genetic perturbations by adapting to compensate for the loss of a specific gene. These compensatory effects could potentially mitigate the full impact of TGN46 depletion and might explain the partial effects observed.

      (2) Our data indicate that the absence of TGN46 reduces PAUF secretion but does not completely block its export. These results align with our proposed role of TGN46 in cargo sorting. In its absence, secretory proteins likely exit the TGN via alternative routes/mechanisms, such as "bulk flow" or by entering other transport carriers in an uncontrolled manner. The partial redistribution of the TGN46-∆lum mutant into VSVG carriers (Figure 4D) supports this possibility. Importantly, similar situations are observed when unrelated sorting factors are depleted from the Golgi membranes. For example, when the cofilin/SPCA1/Cab45 sorting pathway is genetically disrupted, the secretion of this pathway's clients is inhibited but not completely halted (e.g., von Blume et al. Dev. Cell 2011; J. Cell Biol. 2012).

      (3) As suggested by the reviewers, it remains possible that TGN46 is not the sole player for cargo sorting. The existence of redundant or alternative mechanisms cannot be ruled out.

      In our revised manuscript, we have now provided a more in-depth discussion of these factors and their potential contributions to the observed partial effects in TGN46 KO cells (lines 447-463). We believe that a comprehensive exploration of these possibilities will improve our understanding of the role(s) of TGN46 in cargo sorting and TGN export.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      The reviewers were unanimously enthusiastic about your work. They felt that the manuscript could be significantly improved mostly through careful re-wording, additional explanations and some figure modifications.

      We thank the reviewers and the editorial team for their enthusiastic assessment of our findings. Their positive feedback is reassuring.

      We have now addressed the reviewers' suggestions to improve the clarity of our manuscript. Specifically, we have improved various aspects of the text that may have lacked clarity in the initial submission. This includes a thorough re-writing of respective sections to ensure that the content is more accessible and reader-friendly (see detailed answers to the additional points below). Furthermore, we have carefully followed the recommendations related to figure modifications.

      Please mention the species (human) in the title.

      We have changed the title according to the suggestion. The revised title now is: "Sorting of secretory proteins at the trans-Golgi network by human TGN46". In addition, we have also added the word "human" in the abstract ("... we identified the human transmembrane protein TGN46 as a receptor for the export of secretory cargo protein PAUF in CARTS ...").

      Additional points:

      The main Figures only show quantifications that are challenging to understand without fluorescence micrographs. We suggest putting the micrographs of the fluorescence images (Figures S2A and B) into the main Figure 2 (before 2B and 2C)-the same in Figures 3 and 4.

      Following the reviewers' suggestion, we have incorporated the fluorescence micrographs (included as figure supplements in the initial submission) into main Figures 2–5. Because these additions introduced a significant number of extra figure panels, we have carefully redesigned the figure layout to accommodate all the necessary information. As a result, the FLIP data from old Figs. 2–4 are now included in a new Fig. 3, and old Fig. 4 has been split into new Figs. 5 and 6. The supporting figures have been rearranged accordingly. In addition, we have changed the color palette of the micrographs: dual-color images are now presented in color-blind-friendly green and magenta instead of green and red. We believe that in this revised manuscript all data and micrographs are clearly presented.

      For figures such as Fig. 1B, the mean and SD positions are hard to see for the data plotted as solid black dots. Maybe hollow circles would be better.

      The reviewers are right and we apologize for any difficulty in discerning the mean and SD positions from the figure. In our revised version, we have made the necessary modifications to all the figures where data points were plotted as solid black circles by converting them into empty black circles, as suggested by the reviewers.

      In the right side of Fig. 1A, is the difference in PAUF secretion between WT and KO cells truly significant? The meaning of the number of asterisks should be given in the legend. Only one asterisk is shown, suggesting that the significance is low.

      In our revised manuscript, we have included comprehensive information about the statistical significance, such as the statistical test used, p-values/asterisk meaning, and any other relevant details. In addition, we have included the lines connecting the individual data points corresponding to the different replicates of the secretion assays (WT vs KO).

      Experiments such as the one in Fig. 1C may be better described as iFRAP rather than FLIP.

      We appreciate the reviewers' attention to the experimental methods used, e.g., in Figure 1C. We actually performed FLIP experiments rather than iFRAP, and we acknowledge that this might not have been stated clearly in our initial submission. The distinction between iFRAP and FLIP lies in the frequency of photobleaching. In iFRAP, photobleaching occurs only once at the beginning of the experiment, whereas FLIP involves repeated photobleaching (FLIP is sometimes also referred to as "repeated iFRAP"), which was conducted in our experiments. Specifically, in our experiments we performed repeated photobleaching at a relatively slow rate (approximately once per minute; every two imaging frames).

      We understand the potential source of confusion, which may have arisen from the references we provided to introduce our FLIP experiments (Hirschberg et al. 1998; Patterson et al. 2008). In those papers, almost all results were obtained using iFRAP and not FLIP. In light of this feedback, we have made significant efforts in our revised manuscript to clarify the terminology and procedure used in our experiments (lines 148-154). These revisions have improved the understanding of our findings and we appreciate the reviewers' suggestions.

      When using iFRAP to measure the Golgi residence time of a TGN46 construct that has a cytosolic tail, shouldn't recycling from the plasma membrane be taken into account? Unlike a secreted protein, TGN46 will never show complete loss of signal from the Golgi.

      The reviewers are right: for a TGN46 construct that can efficiently recycle back to the TGN from the cell surface, an iFRAP experiment would not report solely the protein residence time at the Golgi. We concur with the reviewers, and we'd like to clarify that the reason we performed FLIP experiments, as opposed to iFRAP, was precisely to address this concern. In an iFRAP experiment, where photobleaching occurs only once at the beginning, the fluorescence decay within the Golgi area would indeed consist of two components: a decay due to the export of the protein and an increase in fluorescence due to the protein that had been exported (after the initial photobleaching) and then recycled back to the Golgi area. In contrast, our choice of conducting FLIP experiments, with repeated photobleaching of the pool of fluorescent protein outside the Golgi area (approximately once per minute), minimizes the influence of recycling. Consequently, the loss of fluorescence in the Golgi area in our FLIP experiments predominantly reflects the protein's export. We acknowledge that this distinction was not adequately communicated in our initial submission and we have emphasized these points in the revised version of the manuscript (lines 230-234).
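      The difference between the two protocols can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: a two-compartment model with first-order export from the Golgi (rate `k_export`) and recycling back from the periphery (rate `k_recycle`), integrated with a simple Euler scheme. The function name and the rate values are hypothetical, chosen only for illustration.

```python
def golgi_trace(k_export, k_recycle, minutes, repeated_bleach,
                dt=0.01, bleach_every_min=1):
    """Golgi fluorescence (1.0 at t = 0), sampled once per minute.

    Assumed two-compartment model: fluorescent protein leaves the Golgi at rate
    k_export and returns from the periphery at rate k_recycle (both per minute).
    The non-Golgi pool is bleached at t = 0 in both protocols; with
    repeated_bleach=True (FLIP) it is bleached again every bleach_every_min minutes.
    """
    golgi, periphery = 1.0, 0.0
    steps_per_min = round(1 / dt)
    steps_per_bleach = bleach_every_min * steps_per_min
    trace = [golgi]
    for step in range(1, minutes * steps_per_min + 1):
        out_flux = k_export * golgi * dt         # export from the Golgi
        back_flux = k_recycle * periphery * dt   # recycling back to the Golgi
        golgi += back_flux - out_flux
        periphery += out_flux - back_flux
        if repeated_bleach and step % steps_per_bleach == 0:
            periphery = 0.0                      # FLIP: re-bleach the non-Golgi pool
        if step % steps_per_min == 0:
            trace.append(golgi)
    return trace


# Hypothetical rates, for illustration only.
ifrap = golgi_trace(k_export=0.05, k_recycle=0.10, minutes=60, repeated_bleach=False)
flip = golgi_trace(k_export=0.05, k_recycle=0.10, minutes=60, repeated_bleach=True)
```

      With a single bleach, the Golgi signal in this model levels off near `k_recycle / (k_export + k_recycle)` because recycled protein is still fluorescent; with repeated bleaching it keeps falling at a rate close to `k_export`, so the decay mainly reports export.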

      Lines 274 to 285 are confusing and controversial. The author argues that the transmembrane domain does not impact TGN localisation and cargo packaging. Later, they state, "These data further support the idea that the slower Golgi export rate of TGN46 mutants with short TMDs is a consequence of their compromised selective sorting into CARTS".

      We appreciate the reviewers' attention to the potential confusion regarding the impact of the TMD on TGN localization and cargo packaging. Our results indicate that the length of the TMD does not seem to affect intra-Golgi protein localization (Fig. 4B,C) but does play a role in incorporation into CARTS (Fig. 4D,E). We have now clarified this in the text (lines 283-284; 296-297).

      That said, these results were initially surprising to us as well. However, upon closer examination of the amino acid sequence of the cytosolic domain of TGN46, we noticed a possible side effect of shortening its TMD: it could lead to the partial burial of highly charged residues of the TGN46 cytosolic tail (HHNKRK...) in the membrane, potentially affecting its behavior. For that reason, we constructed the TGN46 ∆cyt ST-TMD mutant, which features a short TMD (ST TMD) and lacks the potential interference from the cytosolic tail (see also lines 307-320). Notably, this mutant showed a phenotype similar to those of TGN46-Δcyt and full-length TGN46, particularly in terms of intra-Golgi localization and CARTS specificity. We acknowledge that the interpretation of these results can be debated, and we have ensured that the revised manuscript captures these nuances. Additionally, we realized that the organization and presentation of these results may have caused confusion, particularly the placement of the results from the GFP-TGN46 ∆cyt ST-TMD mutant. To address this, we have reorganized old Figures 2 and 3 so that the results of the GFP-TGN46 ∆cyt ST-TMD mutant are presented with the short-TMD mutants. These adjustments have greatly improved the overall flow of our manuscript. We thank the reviewers for their valuable feedback.

      In lines 444-446 in the Discussion the argumentation is confusing. The experiment shows that the cytosolic domain of TGN46 has no impact on TGN46 localisation or cargo packaging into a nascent vesicle. At the same time, the authors mention that a cytosolic complex composed of Rab6 and p62 is required to generate CARTS.

      We are grateful for the reviewers' feedback regarding our argumentation in lines 444-446. Indeed, our results indicate that the cytosolic tail of TGN46 does not play a major role in packaging of TGN46 in CARTS and in PAUF secretion. However, it is important to acknowledge that our findings do not rule out the possibility that TGN46 might have a dual function at the TGN. It could potentially play a role in mediating or controlling the export of other cargo proteins by alternative mechanisms/routes, which could, in part, depend on its cytosolic domain.

      This complexity is consistent with the open question regarding the role of the cytosolic Rab6-p62 complex in CARTS biogenesis. Interestingly, in experiments reported in Jones et al. (1993), a Golgi budding assay was used to test the involvement of the cytosolic domains of TGN38 and TGN41 in the budding of Golgi-derived carriers that contain the transmembrane cargo protein pIgA-R (polymeric IgA receptor). The authors showed that the budding of these carriers was blocked upon incubation of the Golgi membranes with peptides against the cytosolic tail of TGN38/41 but not with peptides against their lumenal domain. However, in the latter experiment, they used a peptide formed by the 15 N-terminal residues of TGN46, which might not functionally block the entire lumenal domain (>400 residues). Our results, considered alongside these earlier findings in the field, will serve as a basis for further exploring the role(s) of TGN46 in cargo export beyond the scope of the present study.

      In summary, these are all very important points (we thank again the reviewers for highlighting them), which we have now carefully addressed in the revised version of our manuscript (lines 476-485).

      The phase separation experiments are exciting. However, they are not necessary. They may be more confusing than helpful for the following reasons:

      • The authors use very high protein concentrations and crowding reagents. Any protein would condense under these conditions.

      • The protein was produced in bacteria, so it lacks post-translational modifications, especially glycosylation, possibly the most critical drivers of phase separation.

      • There was no test of direct binding of PAUF to TGN46.

      We appreciate that the reviewers share our excitement about our preliminary phase separation experiments. Likewise, while we initially included these experiments in the "Ideas and speculation" section due to their exciting nature, we concur with the reviewers that their preliminary nature and the experimental conditions used to obtain them raise valid concerns.

      In light of these considerations, and to prevent any potential confusion for readers, we have decided to follow the advice of the reviewers. We have removed the phase separation experiments and data from the revised manuscript. Instead, we have retained a simplified and concise "Ideas and speculation" section, in which we propose condensate formation as a potential mechanism by which TGN46 functions as a cargo sorter at the TGN (lines 580-620).

      The authors cite Fig. S5A for the localisation images of TGN46ΔLUM; however, we believe they are referring to Fig. S7.

      We apologize for the oversight in referencing the figure and thank the reviewers for bringing this to our attention. We have amended this in the revised version.

      The authors write "remarkably, the amino acid sequence of rat TGN38 is largely conserved amongst other species, including humans (>80% amino acid identity between rat TGN38 and human TGN46)". To understand if this is remarkable, the authors should use the average identity between rat and human proteins.

      We are grateful for the reviewer's insightful comment. Indeed, as the reviewer hints, the average identity between the rat and human proteomes is of the same order of magnitude as the identity reported between rat TGN38 and human TGN46. We therefore acknowledge that the term "remarkable" may not be suitable in this context and could lead to potential misinterpretation. In the revised version, we have removed the term "remarkably".
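      To make the comparison concrete, the sketch below shows one common way a pairwise identity figure such as ">80%" can be computed from an existing alignment. It assumes the two sequences are already aligned (with "-" as the gap character) and that identity is counted only over columns where neither sequence has a gap; other definitions (for example, dividing by the length of the shorter sequence) give different numbers. The sequences are toy, hypothetical fragments, not real TGN38 or TGN46 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    This is one assumed definition among several in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded under this definition
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy, hypothetical fragments: 5 identical out of 7 ungapped columns.
    print(percent_identity("MKLV-ERT", "MKIVQERS"))
```

      Whether a given identity is noteworthy depends on the background distribution, which is why the same calculation applied across many orthologous rat/human pairs (the proteome average the reviewer mentions) is the appropriate point of comparison.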

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This paper describes the role of the WRNIP1 AAA+ ATPase, particularly its ubiquitin-binding UBZ domain but not its ATPase activity, in preventing R-loop formation when DNA replication is mildly perturbed. By combining cytological analysis of DNA damage, R-loops, and chromosome aberrations with the proximity ligation assay for colocalization of various proteins involved in DNA replication and transcription, the authors provide solid evidence to support the claim. The authors also revealed a role of WRNIP1, distinct from that of FANCD2, in the prevention of R-loop-induced DNA damage, which is inconsistent with the known relationship between WRNIP1 and FANCD2 in the repair of crosslinks.

      One concern is the relationship between WRNIP1 and FANCD2 (Figure 6) in the suppression of R-loop-induced DNA damage. This differs from their relationship in interstrand crosslink (ICL) repair (Socha et al. 2020), which shows an epistatic relationship between WRNIP1, as well as its UBZ domain, and FANCD2. The authors need to re-evaluate the role of FANCD2 in R-loop suppression under mild replication stress (MRS) by analyzing R-loop formation in FANCD2 knockdown (KD) cells, the colocalization of FANCD2 with PCNA and RNA polymerase II by PLA, and fork restart by DNA combing.

      Along these lines, it is important to show whether the PLA signal between FANCD2 and R-loops depends on WRNIP1, since WRNIP1 recruits FANCD2 in ICL repair (Socha et al. 2020).

      In the study referenced by the reviewer, the authors implicated WRNIP1 in repairing interstrand crosslinks (ICLs) induced by agents such as TMP/UVA, MMC, and cisplatin (Socha et al., 2020). For the repair of ICLs, the FANCD2/FANCI complex, the central component of the FA pathway, must be recruited to DNA. The study suggests a potential role for WRNIP1 in loading the FANCD2/FANCI complex onto DNA immediately after ICL formation. However, even in the absence of WRNIP1, residual recruitment of the FANCD2/FANCI complex to DNA was observed, possibly through alternative mechanisms, as proposed by the authors. Interestingly, the study did not establish a similar relationship between WRNIP1 and FANCD2 after treatments that do not induce ICLs, suggesting that WRNIP1 and FANCD2 may also play independent roles. Hence, our data demonstrating a role of WRNIP1 distinct from the FA pathway in response to R-loop-associated replication stress are not inconsistent with prior findings. Additionally, given the ability of UBZ domains to interact with ubiquitin both in its free form and when conjugated to other proteins, thereby regulating protein function, it is not surprising that the UBZ domain of WRNIP1 may also play a role in the response to R-loop accumulation.

      Therefore, to address the reviewer's request for a more in-depth exploration of the role of FANCD2 in the regulation of R-loops, we chose to examine the impact of FANCD2 loss on the accumulation of R-loops in WRNIP1-deficient and WRNIP1 UBZ mutant cells, as well as on the dynamics of stalled forks following aphidicolin-induced MRS. Additionally, we investigated the colocalization between FANCD2 and R-loops in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells. Details are provided below.

      In agreement with our observations, the analysis of R-loop formation upon MRS in WRNIP1-deficient cells depleted of FANCD2 revealed a significantly higher accumulation of R-loops in cells with concomitant loss of both WRNIP1 and FANCD2 than in those with a single deficiency (see Fig. 6D of the revised manuscript). Similar results were observed in WRNIP1 UBZ mutant cells in which FANCD2 was abrogated (see Fig. 6D of the revised manuscript). It is important to note that, to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the binding of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine the proximity between FANCD2 and R-loops more accurately, cells were treated with RNase III, following established protocols (Crossley et al., 2020).

      Furthermore, we examined the interaction of FANCD2 with R-loops using a proximity ligation assay (PLA). Our findings revealed significant colocalization between FANCD2 and R-loops in the absence of WRNIP1 and in WRNIP1 UBZ mutant cells following low-dose aphidicolin treatment and RNase III exposure, showing a significant increase compared to the control counterpart (shWRNIP1WT cells; see Fig. 6B of the revised manuscript). Consequently, we conclude that neither WRNIP1 nor its UBZ domain is necessary for FANCD2 recruitment under conditions of MRS.

      We also performed a DNA fiber assay to evaluate restarting replication forks in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells in which FANCD2 was abrogated. Our results show that FANCD2 depletion slightly decreased the ability of the cells to restart forks from MRS (see Fig. 6E of the revised manuscript).

      Given a low number (2-4) of PLA foci for WRNIP1-RNA polymerase II or WRNIP1 and R-loop (Figure 4B and 4D), how does this colocalization reflect the functional significance?

      The data from the PLA of Figures 4B and 4D are reported as the mean of three independent experiments. It is important to note that we have introduced a new Figure 4D. To selectively assess R-loop structures, cells were treated with RNase III, a double-stranded RNA-specific endoribonuclease, following established protocols (Crossley et al., 2020). Our PLA analysis confirms the localization of WRNIP1 at/near R-loops in shWRNIP1 and shWRNIP1D37A cells, and this phenomenon is more evident in WRNIP1 UBZ mutant cells (see Fig. 4D of the revised manuscript). Specifically, the new protocol allows us to visualize a higher number of PLA foci, and we observed that Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment.

      Regarding Fig. 4B, it is not uncommon for a low number of PLA spots per nucleus to correspond to a phenotypic effect. For instance, a similarly low average colocalization of PCNA or RNA pol II with FANCD2 has been observed in a prior paper, suggesting that transcription-replication collisions occur upon Aph-induced MRS (Okamoto et al., 2019). Also, not all R-loops may be “targeted” by WRNIP1.

      It would be helpful to readers if the authors were to provide a summary figure of this paper.

      As suggested by the reviewer, we have developed a model to summarize the findings obtained in our study (see Fig. 6F of the revised manuscript).

      Minor points:

      (1) Most of the cytological images in the paper show only colocalized ones, which makes it hard to see a signal. Please show a single-color image.

      For a better visualization of nuclei signals in the figures, single-color images have been provided for Figs. 2A; 3B; 4A, B, C, D and E; 6B and D; Suppl. Fig. 2A and B of the revised manuscript.

      (2) In Figure 2A, only one or two S9.6 foci can be seen. Why only one or two? Does this focus mark a specific chromosomal locus, such as the centromere or telomere?

      We agree with the reviewer that the observed foci in nuclei may indicate a specific chromosomal locus, such as telomeres or centromeres.

      (3) Figure 3A, graph: Why does this graph not use a dot plot like Figure 1B and Figure 3C?

      The graph in Figure 3A has been represented as a dot plot, as requested.

      (4) Figure 1C: P values between unperturbed conditions should be provided.

      In Figure 1C, P values comparing unperturbed conditions were already included. The results showed no significant difference between shWRNIP1 or shWRNIP1D37A cells and MRC5SV cells, or shWRNIP1T294A cells, as indicated in the corresponding legend.

      (5) Figure 2B: Please provide the quantification or show the reproducibility of the data.

      The quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as its specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to demonstrate the reproducibility of the findings in Fig. 2B, we repeated the dot-blot experiment. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
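      The normalization described above (S9.6 signal divided by the anti-dsDNA signal of the same dot, then expressed relative to wild-type untreated cells) can be sketched as follows. This is a minimal illustration assuming background-subtracted densitometry values per dot; the sample names and numbers are hypothetical, and the response does not state whether the RNase H-treated dots are subtracted, so they are not used here.

```python
from typing import Dict, Tuple


def s96_fold_change(
    densitometry: Dict[str, Tuple[float, float]], reference: str
) -> Dict[str, float]:
    """Express S9.6/dsDNA ratios relative to a reference sample.

    densitometry maps a (hypothetical) sample name to a tuple of
    (S9.6 signal, dsDNA signal) measured on the same dot.
    """
    ratios = {}
    for sample, (s96, dsdna) in densitometry.items():
        if dsdna <= 0:
            raise ValueError(f"non-positive dsDNA signal for {sample}")
        # Dividing by dsDNA corrects for the amount of nucleic acid loaded per dot.
        ratios[sample] = s96 / dsdna
    ref = ratios[reference]
    return {sample: ratio / ref for sample, ratio in ratios.items()}


if __name__ == "__main__":
    example = {
        "shWRNIP1WT_untreated": (1000.0, 2000.0),
        "shWRNIP1WT_Aph": (1500.0, 2000.0),
        "shWRNIP1_Aph": (3000.0, 1500.0),
    }
    print(s96_fold_change(example, "shWRNIP1WT_untreated"))
    # {'shWRNIP1WT_untreated': 1.0, 'shWRNIP1WT_Aph': 1.5, 'shWRNIP1_Aph': 4.0}
```

      Normalizing per dot before taking the fold change means that unequal loading between samples does not masquerade as a difference in hybrid levels.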

      (6) Figure 4A: the expression of RNaseH under aphidicolin addition increased colocalization of PCNA and RNA pol II. It is important to mention the result and provide an explanation of why it is increasing in the main text.

      Although the result may appear unexpected, and we lack experiments that explain the nature of this phenotype, a previous study reported that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, resulting in a significant accumulation of DNA damage (Shen et al., 2017). Consequently, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.

      Reviewer #2:

      This paper aims at establishing the role of WRN-interacting protein 1 (WRNIP1) and its UBZ domain (an N-terminal ubiquitin-binding zinc finger domain) on genome instability caused by mild inhibition of DNA synthesis by aphidicolin. The authors used human MRC5 fibroblasts investigated with standard methods in the field. The results clearly showed that WRNIP1 silencing and UBZ-mutation (D37A) increased DNA damage, chromosome aberrations, and transcription-replication conflicts caused by aphidicolin. The conclusions of the paper are overall well supported by results, however, aspects of some data analyses would need to be clarified and/or extended.

      (1) The methods (immunofluorescence microscopy and dot-blots) to determine R-loop levels can lack sensitivity and specificity. In particular, since the S9.6 antibody can bind to other structures besides heteroduplex, dot-blot analyses only grossly assess R-loop levels in cellular samples of purified nucleic acids, which are constituted by many different types of DNA/RNA structures.

      To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 2B; 3B; 4D and E; 6B of the revised manuscript). Our analysis of immunofluorescence experiments performed using a dsRNA ribonuclease (RNase III) confirms higher R-loop accumulation in WRNIP1-deficient or WRNIP1 UBZ mutant cells compared to control cells (Fig 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript). Finally, we performed a new dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms a significant accumulation of the S9.6 signal in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      (2) The experimental plan analyzed the impact of WRNIP1 loss or mutations under steady-state conditions. Thus, establishing a possible role of WRNIP1 at an early step of the mechanism would require some kind of kinetic analysis of the molecular process, i.e., not under steady-state conditions. The findings of colocalization of R-loops and WRNIP1 have been obtained with the S9.6 antibody, which recognizes DNA-RNA heteroduplexes. Since WRNIP1 is known to be recruited to stalled forks and DNA cleavage sites, it is not surprising that WRNIP1 is very close to heteroduplexes, which are abundant at replication forks and cleavage sites. Similar interpretations may also apply to the Rad51/S9.6 colocalization findings.

      Investigating the potential role of WRNIP1 at an early step in the mechanism is undoubtedly very interesting and requires separate investigation. Our decision to explore the relevance of the loss of WRNIP1 or WRNIP1 mutations under steady-state conditions is based on a preliminary alkaline comet assay (provided below). The comet assay, performed at various exposure times of aphidicolin at a concentration of 0.4 micromolar, clearly indicates that the most significant effect on DNA damage accumulation in WRNIP1-deficient cells occurs after 24 hours of treatment. Therefore, we have chosen to study the transcription-associated genomic instability in our cells by treating them with a low-dose of aphidicolin for 24 hours to maximize the effect.

      Author response image 1.

      We agree that the presence of WRNIP1 or RAD51 in proximity to R-loops is consistent with their roles and may not be surprising. However, these experiments formally demonstrate their proximity to R-loops under our conditions. Notably, the new graphs, obtained from experiments repeated with RNase III treatment to reduce the amount of dsRNA and improve the specificity of the S9.6 antibody, show increased interaction of the UBZ-mutated form of WRNIP1 with R-loops compared to the wild-type form. Additionally, it is more evident that the presence of RAD51 at/near R-loops is reduced in WRNIP1 UBZ mutant cells, both in untreated conditions and after MRS (see Figs 4D and E of the revised manuscript).

      (3) Determination of DNA damage, chromosome aberration, and co-localization data are reported as means of measurements with appropriate statistics. However, the fold-change values relative to corresponding untreated samples are not reported. In some instances, it seems that WRNIP1 silencing or mutations actually reduce or do not affect aphidicolin effects. That leaves open the interpretation of specific results.

      To better evaluate the significance of the data presented in the study, we have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer. This allowed us to conclude that neither the loss of WRNIP1 nor the expression of the UBZ mutant form of WRNIP1 reduces the effects of aphidicolin-induced mild replication stress in any case.

      I would suggest some additional experiments or analyses to get more convincing results:

      (1) DNA damage should be verified also with other methods, such as DNA damage markers pH2AX and 53BP1.

      The quantification of DNA damage was also corroborated by determining the percentage of gammaH2AX-positive cells, as reported in Supplementary Figure 1B. This result is consistent with the findings from the comet assay, confirming transcription-dependent DNA damage accumulation in shWRNIP1 and shWRNIP1D37A cells. Regarding the 53BP1 marker, we believe that the existing data sufficiently demonstrate DNA damage accumulation in the absence of WRNIP1 or when its UBZ domain is mutated, providing comprehensive support for the study without necessitating additional results.

      (2) Repair foci may also be detected with Rad51 foci. That will also provide evidence for increased DNA damage levels under the tested conditions.

      Our prior study identified WRNIP1 as a crucial factor for RAD51 function (Leuzzi et al., 2016). Loss of WRNIP1 indeed results in defective relocalization of RAD51 to chromatin. Consequently, the analysis of RAD51 foci may not be a useful readout for evaluating DNA damage levels under our conditions.

      (3) WRNIP1 effects should be presented as FC (fold-changes) of DNA damage, PLA results, chromosomal errors, etc, to provide evidence of the level of effects on the tested phenotypes.

      We have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer, for a more comprehensive analysis in the graph of Figs. 1B, C and D; 2A and B; 3A, B and C; 4A, B, C, D and E; 6B, C and D.

      (4) R-loop detection ideally should be performed by one of the several types of immunoprecipitation techniques. Alternatively, dot-blot assays should be performed with a 1:2 dilution series of each sample. Then, heteroduplexes should be detected with S9.6 along with a general aspecific dye for DNA quantity in each spot. Next, densitometric analyses of S9.6 signal should be normalized over DNA quantity.

      We acknowledge that the quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as the specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to overcome this issue, we repeated the experiment shown in Fig. 2B. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      (5) A major focus on WRNIP1 D37A and T294A mutations may also make the paper overall more convincing. For instance: do the mutations affect protein recruitment at damaged chromatin? Do they increase repair foci? Do they affect the recruitment of WRN or BLM helicases or specific nucleases at chromatin under the tested conditions of MRS?

      To address this point raised by the reviewer, we performed a chromatin fractionation experiment to assess the ability of WRNIP1 and its mutated forms to translocate to chromatin upon MRS. Our analysis shows that the mutated forms of WRNIP1 do not exhibit any defects in recruitment to chromatin, although the levels of the WRNIP1 ATPase mutant appear lower than the others (see Western blot provided below for the reviewer’s use only, Author response image 2A). Additionally, we tested the presence of WRN helicase, which does not show any difference between cell lines (see Western blot provided below, Author response image 2B).

      Author response image 2.

      (6) I suggest revising the text for spelling errors.

      The manuscript has been carefully revised to identify and correct any spelling errors that may have occurred.

      Reviewer #3:

      In the manuscript by Valenzisi et al., the authors report on the role of WRNIP1 to prevent R-loop and TRC-associated DNA damage. The authors claim WRNIP1 localizes to TRCs in response to replication stress and prevents R-loop accumulation, TRC formation, replication fork stalling, and subsequent DNA damage. While the findings are of potential significance to the field, the strength of evidence in support of the conclusions is lacking.

      Weaknesses:

      (1) The authors fail to utilize the proper controls throughout the manuscript in regard to the shWRNIP1, WT, and mutant cell lines. It is unclear why the authors failed to use the shWRNIP1WT line in the comet assay, DNA fiber assay, and the FANCD2 assays. This is a key control for i) the use of only a single shRNA (most studies will use at least 2 different shRNAs) and ii) the use of the mutant WRNIP1 lines. In several figures, the authors only show the effect of the UBZ mutant, but don't include the ATPase mutant or WT for comparison. Including these is essential.

      We agree with the reviewer's criticism that the use of shWRNIP1WT cells as a control is more appropriate. Therefore, all the new experiments presented in the revised version of the manuscript have been performed using shWRNIP1WT cells. Notably, the new results are in line with those obtained using MRC5SV cells, making us confident that our findings are reliable overall. By contrast, we do not feel that including the WRNIP1 ATPase mutant cells is always essential, since our data clearly demonstrate that the loss of WRNIP1 ATPase activity does not affect transcription-associated genome instability.

      (2) The authors use the S9.6 antibody to conclude the loss of WRNIP1 causes more R-loops; however, it has been shown that this antibody detects dsRNA in addition to RNA-DNA hybrids. Accordingly, it cannot be ruled out that the increased S9.6 signal is due to increased dsRNA.

      To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 3B; 4D and E; 6B, D and E of the revised manuscript). Our analysis of immunofluorescence experiments performed using a dsRNA ribonuclease confirms higher R-loop accumulation in WRNIP1-deficient or WRNIP1 UBZ mutant cells compared to control cells (Fig. 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript).

      (3) Multiple pieces of data do not support the conclusions. For example, Figure 1D shows shWRNIP1 to reduce damage in Aph+DRB cells compared to MRC5SV cells with Aph+DRB. This result suggests that WRNIP1 actually increases DNA damage in stressed cells with transcription blocked. Another result is seen in Figure 4a, where the number of PLA spots (presumably TRCs) increases in the shWRNIP1WT cells with Aph+RNH1 compared to Aph alone. If R-loops are required for TRC accumulation, then the RNH1 should decrease the PLA foci. This result instead suggests that WRNIP leads to increased TRCs in stressed cells with R-loops cleared by RNH1.

      Regarding Figure 1D, in MRC5SV cells, DRB does not significantly increase DNA damage upon Aph treatment. Therefore, it is not correct to conclude that WRNIP1 exacerbates DNA damage in stressed cells with transcription blocked.

      Regarding Figure 4A, while the outcome may appear unexpected, and we do not provide data that explain the nature of this phenotype, a previous study demonstrated that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, leading to a significant accumulation of DNA damage (Shen et al., 2017). Accordingly, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.

      (4) The data are mostly phenomenological and fail to yield mechanistic insight. For example, the authors state that "it remains unclear whether WRNIP1 is directly involved in the mechanisms of R-loop removal/resolution". Unfortunately, the data presented in this manuscript do not provide new insights into this unresolved question.

      We agree with the reviewer that elucidating the mechanism by which WRNIP1 contributes to R-loop suppression would be of interest. Nevertheless, the findings presented here provide compelling evidence of a novel role for WRNIP1 in preventing R-loop accumulation. Investigating how WRNIP1 accomplishes this function will require significant effort, which we are committed to undertaking.

      (5) The authors only show merged images making it impossible to visualize differences in PLA foci.

      For better visualization of nuclear signals in the PLA panels of Figs 4A, B, C, D and E; 6B, single-color images have been provided.

      In addition to including the controls I mentioned in the public review, I recommend investigating the mechanism of how WRNIP1 prevents R-loop accumulation. If it is indeed related to its UBZ domain, then does that mean ubiquitination is an important step in R-loop removal? I believe elucidating this would be a novel and significant contribution. If it's not related to ubiquitination, then how does the UBZ domain regulate R-loops?

      We agree with the reviewer that investigating the precise role of the UBZ domain of WRNIP1 in R-loop prevention would be of interest, and several experiments are required to adequately address this issue. However, as discussed, we hypothesize that the UBZ domain might contribute to directing WRNIP1 to DNA at TRC sites through RAD18.

      I recommend using purified RNH1-dead-GFP to detect R-loops as opposed to the S9.6 antibody. The Cimprich lab has published this recently as a tool for detecting R-loops in fixed cells.

      As explained in point 2), to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we used treatment with RNase III, following established protocols (Crossley et al., 2020). New experiments on R-loops in all cell lines are reported in the revised version of the manuscript (see Fig. 3B of the revised manuscript).

      Additionally, colocalization by PLA of WRNIP1/R-loops, RAD51/R-loops, FANCD2/R-loops, and R-loop accumulation by anti-S9.6 antibody in cells depleted of FANCD2 are presented (see Figs. 4D and E; 6B and D of the revised manuscript).

      Furthermore, we repeated the dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      Importantly, overall, our findings suggest that treatment with RNase III does not substantially change the results obtained without it, but in some cases, such as in Fig. 4D, it makes them more readily interpretable. Specifically, the new protocol allows us to visualize a higher number of PLA foci, and Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment (see Fig. 4D of the revised manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Zhao et al. aimed at investigating the relationships between type 2 diabetes, bone mineral density (BMD) and fracture risk using Mendelian Randomization (MR) approach.

      The authors found that genetically predicted T2D was associated with higher BMD and lower risk of fracture, and suggested a mediated effect of RSPO3 level. Moreover, when stratified by the risk factors secondary to T2D, they observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased.

      Strengths:

      • Important question

      • Manuscript is overall clear and well-written

      • MR analyses have been conducted properly, including the use of various MR methods and sensitivity analyses, and likely meet the criteria of the STROBE-MR checklist for reporting MR results.

      Response: Thanks.

      Weaknesses:

      • Previous MR studies on that topic have not been discussed

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018, which assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. We have now also included the two studies that took BMD as the exposure (Mitchell et al., Diabetologia, 2021; Ahmad et al., JBMR, 2016) in the paragraph discussing the effects on BMD.

      • Multivariable MR could have been used to better assess the mediating effect of BMI or RSPO3 on the relationship between T2D and fracture risk.

      Response: In the revision, we used the inverse-variance weighted multivariable MR model, implemented in the ‘MVMR’ R package (https://github.com/WSpiller/MVMR), to estimate the direct effect of T2D on fracture and BMD adjusted for BMI. Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. The independent significant SNPs (P<5×10−8 and r²<0.1) for either T2D or BMI were then pooled as instruments, and we performed SNP harmonization to correct the orientation of alleles. The results showed that increased risk of T2D has a direct effect of decreasing fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI), and BMI mediated 9.03% of the protective effect. The multivariable MR analysis suggested that T2D also had a direct effect of increasing BMD after adjusting for BMI (β=0.042, 95%CI=0.026-0.057, P=1.92×10−7). We did not observe a direct effect of MRI-derived visceral (β=0.02, P=0.831) or abdominal subcutaneous (β=0.03, P=0.57) adiposity on fracture risk adjusted for RSPO3 expression. We have updated the Methods and Results accordingly.
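      For readers less familiar with multivariable MR, the inverse-variance weighted estimator described above amounts to a weighted regression of the SNP-outcome effects on the SNP-exposure effects for all exposures jointly, with no intercept and weights 1/se². The Python sketch below illustrates this for two exposures (for example T2D and BMI). It is an illustration under that assumption, not the code of the ‘MVMR’ R package; the function name mvmr_ivw is hypothetical, and it returns point estimates only (no standard errors or instrument-strength statistics).

```python
from typing import Sequence, Tuple


def mvmr_ivw(
    beta_x1: Sequence[float],
    beta_x2: Sequence[float],
    beta_y: Sequence[float],
    se_y: Sequence[float],
) -> Tuple[float, float]:
    """Hypothetical helper: multivariable IVW point estimates for two exposures.

    Fits beta_y ~ theta1 * beta_x1 + theta2 * beta_x2 (no intercept) by weighted
    least squares with weights 1 / se_y**2, solving the 2x2 normal equations.
    """
    n = len(beta_y)
    if not (len(beta_x1) == len(beta_x2) == len(se_y) == n):
        raise ValueError("all inputs must have one entry per SNP")
    w = [1.0 / s ** 2 for s in se_y]
    # Weighted cross-products that form the normal equations.
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    t1 = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    t2 = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    # Relative check: exposure effects that are (near-)collinear are not identified.
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("exposure effects are collinear; estimates not identified")
    theta1 = (s22 * t1 - s12 * t2) / det  # direct effect of exposure 1
    theta2 = (s11 * t2 - s12 * t1) / det  # direct effect of exposure 2
    return theta1, theta2
```

      If the SNP-outcome effects are log odds ratios, math.exp(theta1) would give the BMI-adjusted OR per unit of the first exposure's genetic liability.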

      Reviewer #2 (Public Review):

      The authors employed the Mendelian Randomization method to analyze the association between type 2 diabetes (T2D) and fracture using the UK Biobank data. They found that "genetically predicted T2D was associated with higher BMD and lower risk of fracture". Additionally, they identified 10 loci that were associated with both T2D and fracture risk, with the SNP rs4580892 showing the highest signal. While the negative relationship between T2D and fracture has been previously observed, the discovery of these 10 loci adds an intriguing dimension to the findings, although the clinical implications remain uncertain.

      Response: We appreciate the reviewer's thoughtful evaluation of our study. The hypothesis of this study is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, even though such an association can be observed in observational data. When we stratified by the risk factors secondary to the disease, the effect of T2D on the risk of fracture decreased as the number of secondary risk factors decreased, and the association became non-significant in T2D patients who carried none of these risk factors. These results suggest that the risk factors secondary to type 2 diabetes might contribute more to fracture risk. Therefore, the clinical implications of our study may lie in the health management of type 2 diabetes patients: managing the complications of type 2 diabetes is important for preventing fractures.

      Reviewer #1 (Recommendations For The Authors):

      • Introduction/discussion: findings from MR previously published on that topic have not been discussed in this manuscript (eg, Mitchell et al, Diabetologia, 2021; Ahmad et al JBMR, 2016);

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018, which assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. We apologize for missing the studies you mentioned. Because these two studies took BMD as the exposure, we have now included them in the paragraph discussing the effect of T2D on BMD (Page 14, Line 320-322).

      • In the one-sample MR analysis: I would suggest looking at whether the association between T2D GRS and fracture risk differs across fracture sites; under the hypothesis that BMI might be protective, performing the analysis separately for weight-bearing vs non-weight-bearing bones would be interesting.

      Response: Following your suggestion, we further categorized fractures into weight-bearing bones (neck, vertebrae, pelvis, femur, tibia) and other bones (detailed codes have been added to Supplementary Table 16). When we regressed the observed fracture on the wGRS, we found a trend toward a protective association between T2D wGRS and fracture of both weight-bearing bones (OR=0.9772, 95%CI=0.9552-0.9997, P=0.04737, N of fractures=8,992) and other bones (OR=0.9838, 95%CI=0.9688-0.9991, P=0.0386, N of fractures=20,317) (Figure 1). We have updated the Methods and Results accordingly (Page 6, line 129-134 and Page 18, line 408-412).

      In this analysis, I would also suggest verifying the absence of a sex interaction with T2D PRS on BMD and fracture risk.

      Response: Thanks for your suggestion. We further estimated the sex interaction on BMD and fracture risk by including a T2D wGRS × sex interaction term in the regression model. As you expected, we found no interaction between sex and T2D wGRS on fracture risk (P=0.5576) or BMD (P=0.66). Moreover, we conducted analyses stratified by sex. When we regressed the observed fracture on the wGRS in males, genetically determined type 2 diabetes was also associated with a lower risk of fracture (OR=0.977, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). In females, the association was in the same direction but not significant (OR=0.986, P=0.139). A test of heterogeneity between males and females showed no significant difference (Pheterogeneity=0.457). Similarly, genetically determined type 2 diabetes was associated with higher BMD in both males (β=0.023, P=8.23×10−14) and females (β=0.022, P<2.0×10−16), with Pheterogeneity=0.6306 (Supplementary Figure 2). We have updated the Methods and Results accordingly (Page 6, line 134-139 and Page 19, line 425-429).
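      The response does not state how Pheterogeneity was computed. One common option, shown here only as an assumed illustration, is a two-sided z-test for the difference between two independent stratum-specific estimates (for odds ratios, on the log scale): z = (b1 − b2)/√(se1² + se2²). The helper name het_test is hypothetical.

```python
import math


def het_test(b1: float, se1: float, b2: float, se2: float) -> float:
    """Two-sided P value for H0: b1 == b2, assuming independent estimates."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    if se_diff == 0:
        raise ValueError("at least one standard error must be positive")
    z = (b1 - b2) / se_diff
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

      With sex-specific log-ORs and their standard errors as inputs, a large P value indicates no detectable difference between strata; fitting a wGRS × sex interaction term in a pooled regression, as described above, is the other common route.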

      • In the two-sample MR analysis: I would suggest performing a multivariable MR to look at the effect of T2D adjusted for BMI on BMD and fracture risk (see Burgess et al, AJE, 2016)

      Response: Thanks for your suggestion. In the revision, we used the inverse-variance weighted multivariable MR model, implemented in the ‘MVMR’ R package (https://github.com/WSpiller/MVMR), to estimate the direct effect of T2D on fracture and BMD adjusted for BMI. Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. The independent significant SNPs (P<5×10−8 and r²<0.1) for either T2D or BMI were then pooled as instruments, and we performed SNP harmonization to correct the orientation of alleles. The final IVs used in MVMR are presented in Supplementary Table 17. The results showed that increased risk of T2D has a direct effect of decreasing fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI) and increasing BMD (β=0.042, 95%CI=0.026-0.057, P=1.92×10−7, adjusted for BMI). We have updated the Methods and Results accordingly (Page 7, line 155-158, 162-164, and Page 20, line 456-465).
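      SNP harmonization, mentioned above, means expressing the SNP-exposure and SNP-outcome effects relative to the same effect allele. A simplified Python sketch follows. It is an illustration, not the procedure used by the ‘MVMR’ package, and the function name harmonize is hypothetical. It flips the sign of the outcome effect when the alleles are swapped, also tries the complementary strand, and, as a simplifying assumption, drops palindromic A/T and C/G SNPs, which in practice are often resolved using allele frequencies.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize(exp_ea: str, exp_oa: str, out_ea: str, out_oa: str,
              out_beta: float) -> Optional[float]:
    """Return the outcome beta aligned to the exposure effect allele, or None
    if the SNP cannot be aligned unambiguously (hypothetical helper)."""
    exp_ea, exp_oa = exp_ea.upper(), exp_oa.upper()
    out_ea, out_oa = out_ea.upper(), out_oa.upper()
    # Palindromic SNPs (A/T, C/G) look identical on both strands; drop them here.
    if COMPLEMENT.get(exp_ea) == exp_oa:
        return None
    same_strand = (out_ea, out_oa)
    other_strand = (COMPLEMENT.get(out_ea), COMPLEMENT.get(out_oa))
    for ea, oa in (same_strand, other_strand):
        if (ea, oa) == (exp_ea, exp_oa):
            return out_beta   # already aligned
        if (ea, oa) == (exp_oa, exp_ea):
            return -out_beta  # alleles swapped: flip the sign
    return None               # incompatible alleles
```

      For example, an outcome effect reported for allele G when the exposure effect allele is A (alleles A/G) is returned with its sign flipped.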

      • In the section "infer the shared genetics": in addition to using waist circumference and waist-hip ratio, it would have been interesting to use GWAS summary statistics for subcutaneous and visceral adiposity (Agrawal, Nat Comm, 2022), and to examine through multivariable MR whether RSPO3 mediates the effect of subcutaneous fat on fracture risk.

      Response: Thanks for your suggestion. We downloaded the genetic summary data from Agrawal, Nat Comm, 2022, and performed the same SMR analysis as before. We found that higher expression of RSPO3 was associated with higher MRI-derived visceral adiposity (β=0.199, P=4.36×10−5). We have updated the Methods and Results accordingly (Page 9, line 206-208 and Page 22, line 494-495).
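      Assuming that SMR here refers to the summary-data-based MR method of Zhu et al. (Nature Genetics, 2016), its core statistic combines the z-scores of a top cis-eQTL on gene expression (z_zx) and on the trait (z_zy): the effect estimate is b_xy = b_zy / b_zx, and T_SMR = z_zx²·z_zy² / (z_zx² + z_zy²) is approximately χ² with 1 degree of freedom. The Python sketch below (hypothetical function smr_test) computes only this statistic; the HEIDI heterogeneity test that usually accompanies SMR is omitted.

```python
import math
from typing import Tuple


def smr_test(b_zx: float, se_zx: float,
             b_zy: float, se_zy: float) -> Tuple[float, float, float]:
    """Return (b_xy, T_SMR, P) for one eQTL SNP z, expression x and trait y."""
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # effect of expression on the trait
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of chi-square(1): P(chi2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p
```

      Because T_SMR is never larger than the smaller of the two squared z-scores, a strong SMR signal requires both a strong eQTL and a strong trait association at the same SNP.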

      We did not observe a direct effect of MRI-derived visceral (β=0.02, P=0.831) or abdominal subcutaneous (β=0.03, P=0.57) adiposity on fracture risk adjusted for RSPO3 expression.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      Several concerns regarding the study's concept and methodology should be addressed before accepting the findings as credible. I would like to invite the authors to comment on the following points.

      (1) I find the authors' assertion that individuals with type 2 diabetes (T2D) exhibit both higher BMD and an increased risk of fracture to be unconvincing. The BMD measurement they refer to is based on areal BMD, which fails to account for the three-dimensional aspect of bone density. Existing evidence suggests that patients with T2D actually have lower trabecular bone scores (a predictor of fracture risk) compared to those without the condition. Furthermore, there is a lack of a clearly stated hypothesis underlying the study.

      Response: Yes, in this study the bone mineral density measurement is based on areal BMD, and we have made this clear in the Abstract. We agree that other measurements, such as trabecular bone score and chest CT texture analysis, could provide additional valuable information for evaluating fracture risk, especially in type 2 diabetes patients, and we have discussed this in the manuscript (Page 13, line 295-300). Epidemiologic studies from the past decade have provided evidence that increased fracture risk is one of the complications of type 2 diabetes, but areal BMD in type 2 diabetes patients can be normal or even higher (Botella Martinez et al., 2016; Romero-Diaz et al., 2021).

      In this study, we employed the Mendelian randomization approach to investigate the relationship between type 2 diabetes and fracture/BMD. This method uses genetic data as instrumental variables to reduce bias from unknown confounding factors. We found that genetically predicted type 2 diabetes was associated with higher BMD and a lower risk of fracture. That is, once the bias from unknown confounding factors was reduced through MR analysis, genetically predicted type 2 diabetes did not show the bone paradox.

      We then performed an observational analysis in UK Biobank and found that type 2 diabetes was associated with a higher risk of fracture and increased BMD. Further, we stratified the T2D patients by five secondary risk factors (BMI≤25 kg/m2, no physical activity, falls in the last year, HbA1c≥47.5 mmol/mol and antidiabetic medication treatment), and found that the effect of type 2 diabetes on the risk of fracture decreased as the number of risk factors secondary to type 2 diabetes decreased, and the association became non-significant if the type 2 diabetes patients carried none of the risk factors. That is, the diabetic bone paradox might not exist if the secondary risk factors of type 2 diabetes were eliminated.

      The hypothesis and message we wish to convey is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, even though such an association can be observed in observational analyses. When stratified by the risk factors secondary to the disease, the effect of T2D on the risk of fracture decreased as the number of secondary risk factors decreased, and the association became non-significant if the T2D patients carried none of the risk factors. These results suggest that the risk factors secondary to type 2 diabetes might contribute more to fracture risk. Therefore, it is important to manage the complications of type 2 diabetes to prevent fractures.

      In addition, although type 2 diabetes was observed to be associated with a higher risk of fracture, BMI mediated 30.2% of the protective effect. The shared genetic architecture between type 2 diabetes and fracture also revealed a top signal near the RSPO3 gene, and higher expression of RSPO3 was associated with higher waist circumference and waist-hip ratio. These results suggest that the relatively higher BMI of type 2 diabetes patients might contribute to their higher BMD, consistent with our previous study suggesting that maintaining a moderately high BMI (overweight) might benefit older people in terms of fracture risk (Zhu et al., 2022).
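      The response reports the share of the effect mediated by BMI (30.2%) without giving the formula. One common choice in MR mediation analyses, assumed here purely for illustration, is the difference method on the log-odds scale: (total effect − direct effect) / total effect, where the direct effect is the BMI-adjusted (multivariable MR) estimate. The function name proportion_mediated is hypothetical, and this is not necessarily how the reported figure was obtained.

```python
import math


def proportion_mediated(total_or: float, direct_or: float) -> float:
    """Difference-method proportion mediated, computed on the log-OR scale."""
    log_total = math.log(total_or)
    if log_total == 0.0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return (log_total - math.log(direct_or)) / log_total
```

      For a protective total effect (OR below 1), a direct OR closer to 1 yields a positive proportion, meaning that part of the protection runs through the mediator.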

      (2) It is not a good idea to solely concentrate on overall fracture risk as it may obscure the potential relationship between T2D and specific fracture sites, such as hip and vertebral fractures. By solely considering total fracture incidence, important associations at individual fracture sites could be overlooked. I would like to propose that the authors expand their analysis to include the examination of hip and vertebral fractures. By incorporating these specific fracture types into their study, a more comprehensive understanding of the association between T2D and fractures can be achieved.

      Response: This is a good suggestion. Incorporating the comments from the other reviewer and considering the sample size, we classified fractures into weight-bearing bone fractures (neck, vertebrae, pelvis, femur, tibia) and other bone fractures (skull and facial bones, ribs, sternum, forearm, wrist and hand, foot and other unspecified body regions). We identified 6,582 (1.87%) participants with weight-bearing bone fractures and 9,586 (2.72%) participants with other bone fractures among the 352,879 UK Biobank participants. In the Cox proportional hazards regression adjusted for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments, we observed a higher risk of fracture in type 2 diabetes patients (weight-bearing bone fracture: HR=1.792, 95%CI 1.555-2.065, P=8.25×10−16; other bone fracture: HR=1.337, 95%CI 1.167-1.531, P=2.85×10−5), which persisted after additionally controlling for BMD (weight-bearing bone fracture: HR=1.850, 95%CI 1.602-2.136, P<2×10−16; other bone fracture: HR=1.377, 95%CI 1.199-1.580, P=5.54×10−6). We have updated the Results, Methods and Figures accordingly (Page 11, line 245-250; Page 24, line 540-547; Figure 4A).

      (3) I consider that there is an issue with combining data from both males and females in the analysis. It is widely recognized that women generally have a higher risk of fracture compared to men. Moreover, the association between BMD and fracture may vary between genders, and the risk of T2D is typically lower in women than in men. Therefore, I strongly recommend that the analysis be stratified by gender to account for these differences and provide a more accurate understanding of the relationships involved.

      Response: Thanks for your suggestion. We have now added results stratified by sex to each analysis. Briefly, in the wGRS analysis, genetically determined type 2 diabetes was associated with a lower risk of fracture in males (OR=0.977, 95%CI=0.958-0.995, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). The association in females was not significant but was in the same direction as in males (OR=0.986, 95%CI=0.969-1.004, P=0.139). Meanwhile, genetically determined type 2 diabetes was associated with higher BMD in both males (β=0.023, 95%CI=0.017-0.030, P=8.23×10−14) and females (β=0.022, 95%CI=0.017-0.026, P<2×10−16). In the observational analysis, the Cox proportional hazards regression adjusted for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments showed a higher risk of fracture in type 2 diabetes patients in both males (HR=1.587, 95%CI 1.379-1.828, P=1.26×10−10) and females (HR=1.530, 95%CI 1.334-1.756, P=1.27×10−9). When we additionally controlled for BMD (HR=1.607, 95%CI 1.393-1.853, P=7.21×10−11 in males; HR=1.601, 95%CI 1.393-1.841, P=3.59×10−11 in females), we still observed an increased risk of fracture in type 2 diabetes (Page 6, line 136-139; Page 11, line 241-243).

      (4) My understanding is that "BMD" in UK Biobank refers to estimated BMD derived from ultrasound measurements, rather than being directly measured using dual-energy X-ray absorptiometry (DXA). It would be helpful to clarify whether the BMD mentioned in the manuscript refers to estimated BMD or DXA-based BMD to ensure accurate interpretation of the results.

      Response: Yes, we used BMD estimated from quantitative ultrasound measurement at the heel as the outcome. The device generates two variables: speed of sound (SOS) and broadband ultrasound attenuation (BUA; the slope between the attenuation of the sound signal and its frequency as it travels through the bone and soft tissue). Heel BMD was calculated with the following formula: BMD = 0.002592 × (BUA + SOS) − 3.687. We have made this clear in the Methods (Page 23, line 526-530).
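      The heel BMD formula above can be written as a one-line function. This Python sketch uses the coefficients quoted in the response; the function name is hypothetical, the units (BUA in dB/MHz, SOS in m/s) are an assumption, and the example inputs are illustrative rather than taken from UK Biobank.

```python
def heel_bmd(bua: float, sos: float) -> float:
    """Estimated heel BMD from broadband ultrasound attenuation (BUA) and
    speed of sound (SOS), using BMD = 0.002592 * (BUA + SOS) - 3.687."""
    return 0.002592 * (bua + sos) - 3.687


# Illustrative inputs only: 0.002592 * (80 + 1550) - 3.687 = 0.53796
example = heel_bmd(80.0, 1550.0)
```

      Because BMD is a linear function of BUA + SOS, an increase of 1 unit in either input raises the estimate by 0.002592.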

      (5) The clarification regarding the nature of the 13,817 individuals with T2D mentioned in Supplementary Table 9 is needed. It is unclear whether this figure represents incidence or prevalence. If it refers to incidence, it would be informative to specify the duration of the follow-up period for these individuals.

      Response: The UK Biobank data (application #41376) were analyzed in our study under a prospective design. We excluded participants who were 1) ethnically identified as non-European (n=30,481); 2) diagnosed with type 1 diabetes (n=4,455); 3) diagnosed with diseases associated with bone loss (n=21,560); or 4) diagnosed with fracture with a known primary disease (n=7,222) (Supplementary Table 15). Among the 439,982 UK Biobank participants, we focused on those diagnosed with T2D within the 10-year period from 1 January 2006 to 31 December 2015, leaving 425,772 participants (including 14,860 type 2 diabetes patients). Each type 2 diabetes patient had a diagnosis date (the reference date), from which we calculated the onset age; then, among participants free of T2D, we selected up to 27 referents (whenever possible) whose age at the reference date matched the onset age (± 3 years). In total, 363,884 non-T2D referents were individually matched within a 6-year age band at the reference date. We prospectively followed the type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first (mean duration of type 2 diabetes, 8.34 years). Survival time depended on whether the participant had a fracture. For individuals with a fracture, survival time was the time from the reference date to the first diagnosis of fracture. For individuals without a fracture, it was the time from the reference date to the earliest of 19 April 2021, death, or emigration. We excluded 25,865 participants whose fracture diagnosis, death, or emigration occurred before the reference date, leaving 352,879 participants in the final analysis (13,817 type 2 diabetes patients and 339,062 referents). We identified 16,147 (4.6%) participants with fracture among the 352,879 UK Biobank participants. We have made this clear in the Methods and Results (Page 18, line 400-406; Page 22-23, line 506-523; Page 10, line 231-233).
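      The follow-up rule described above (start at the reference date; stop at the first of fracture, death, emigration or 19 April 2021; exclude anyone with one of these events before the reference date) can be sketched as follows. This Python function is a hypothetical illustration of the stated rule, not the authors' code, and expressing time in years with 365.25 days per year is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2021, 4, 19)


def follow_up(reference: date,
              fracture: Optional[date] = None,
              death: Optional[date] = None,
              emigration: Optional[date] = None,
              end: date = END_OF_FOLLOW_UP) -> Optional[Tuple[float, int]]:
    """Return (years of follow-up, event flag), or None if the participant
    should be excluded because an event preceded the reference date."""
    for d in (fracture, death, emigration):
        if d is not None and d < reference:
            return None
    # Censoring time: earliest of death, emigration and end of follow-up.
    censor = min(d for d in (death, emigration, end) if d is not None)
    if fracture is not None and fracture <= censor:
        return (fracture - reference).days / 365.25, 1  # fracture observed
    return (censor - reference).days / 365.25, 0       # censored
```

      The (time, event) pairs produced this way are the standard input to a Cox proportional hazards model.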

      (6) I find the selection of participants for the analysis to be highly problematic. Supplementary Figure 1 suggests that individuals with a history of fracture were excluded from the study. However, it is well established that prior fracture history is a significant predictor of future fractures. Therefore, the exclusion of participants with prior fractures likely introduced selection bias into the analysis, potentially compromising the study's findings.

      Response: We apologize for using the misleading term “secondary fracture” in the manuscript and figure. What we meant was “participants diagnosed with fracture with known primary diseases” (n=7,222): because we aimed to investigate the effect of diabetes on fracture, fractures with other known causes had to be excluded. We have changed the term in the manuscript and figure accordingly (Page 18, line 405-406; Supplementary Figure 1).

      Because this study has a prospective design, none of the participants had a fracture at the reference date. We prospectively followed the type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first. Therefore, each study subject had either one fracture or no fracture.

      (7) It is unclear what exactly is meant by "genetically predicted T2D." Could it possibly refer to the polygenic risk score derived from the variants associated with T2D? Clarification is needed regarding the methodology used to determine this "genetically predicted T2D" and its relation to the construction of a polygenic risk score based on T2D-associated variants.

      Response: In this study, we used the weighted genetic risk score (wGRS) method and the two-sample Mendelian randomization (MR) method to estimate the effect of genetically predicted T2D on fracture. We constructed the wGRS for individuals in the UK Biobank (294,571 samples with genotypes) as a linear combination of the selected SNPs weighted by their β coefficients on type 2 diabetes: wGRS = β1·SNP1 + β2·SNP2 + … + βn·SNPn, where n is the number of instrumental variables. To validate the wGRS results, we also performed two-sample MR analyses that are independent of UK Biobank samples, using three approaches: inverse variance weighting (IVW), simple median and MR-PRESSO. Both methods took genetically predicted type 2 diabetes as the exposure (See Methods Page 18, line 419-422; Page 19, line 439-440).
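      The wGRS formula above, and the point estimates of two of the named two-sample MR methods, can be sketched in a few lines of Python. These helpers are hypothetical illustrations: SNPi is taken here as the count (0, 1 or 2) of the T2D effect allele, the IVW estimate is the first-order weighted average of per-SNP Wald ratios (weights bx²/se_y²), and the simple median is the unweighted median of those ratios. Standard errors and the MR-PRESSO outlier test are not shown.

```python
import statistics
from typing import Sequence


def weighted_grs(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """wGRS = sum_i beta_i * SNP_i for one individual."""
    if len(dosages) != len(betas):
        raise ValueError("one dosage per SNP weight is required")
    return sum(b * d for b, d in zip(betas, dosages))


def ivw_estimate(bx: Sequence[float], by: Sequence[float],
                 se_y: Sequence[float]) -> float:
    """First-order IVW: sum(bx*by/se^2) / sum(bx^2/se^2)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den


def simple_median_estimate(bx: Sequence[float], by: Sequence[float]) -> float:
    """Unweighted median of the per-SNP Wald ratios by / bx."""
    return statistics.median(y / x for x, y in zip(bx, by))
```

      The median estimator remains consistent when up to half of the instruments are invalid, which is why it complements IVW as a sensitivity analysis.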

      (8) My understanding is that the Mendelian Randomization analysis relies on, among others, 2 assumptions: (1) the genetic marker is linked to the exposure (e.g., T2D), and (2) the genetic marker remains independent of the outcome (e.g., fracture) when considering the exposure and all confounding factors. In the authors' study, they identified 10 loci that exhibited associations with both T2D and fracture risk. This finding raises questions about whether the assumptions underlying Mendelian Randomization have been violated?

      Response: You're absolutely right. Because the presence of horizontal pleiotropy could bias the MR estimates, we additionally used the MR pleiotropy residual sum and outlier (MR-PRESSO) method. When we excluded pleiotropic variants using restrictive MR-PRESSO method, the causal association was still detected between type 2 diabetes and fracture (OR=0.967, 95%CI=0.945-0.989, P=0.004) (Page 6, line 146-149).

      (9) The analysis provided in Supplementary Table 10 appears to have certain limitations. From my understanding, the analysis treated fracture and BMD as outcome variables, with T2D regarded as the predictor variable. However, what is of interest is whether the association between T2D and fracture remains significant even after accounting for well-established risk factors for fractures, including BMD. It is crucial to determine whether the association between T2D and fracture is independent of these established risk factors. Therefore, I suggest the authors consider the following 3 models:

      Model 1: fracture ~ age + T2D

      Model 2: fracture ~ age + T2D + BMD

      Model 3: fracture ~ age + T2D + BMD + fracture history + falls

      Response: In our previous analysis, we had already adjusted for 7 covariates (including fall history) in the basic model for fracture, i.e.

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history (Model 0)

      Because “fall history” was already included in the basic model, following your suggestion we considered an additional model for fracture that includes BMD as a covariate:

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history + BMD (Model 1)

      We cannot include fracture history as the covariate because each study subject either had one fracture or no fracture, as we also answered in Question 6.

      In model 0, the Cox proportional hazards regression adjusted for the clinical risk factors, including reference age, sex, BMI, physical activity, HbA1c, medication treatments and fall history, showed a higher risk of fracture in type 2 diabetes patients (HR=1.527, 95%CI=1.385-1.685, P<2.0×10−16). When we additionally controlled for BMD (model 1), we still observed an increased risk of fracture in type 2 diabetes (HR=1.574, 95%CI=1.425-1.739, P<2.0×10−16) (Supplementary Table 11).

      We thank you for your suggestion and have updated the Methods, Results, and Figures accordingly (Page 11, line 243-245; Page 24, line 539-540; Figure 4A).

      (11) The dichotomization of data presented in Figure 4 is not considered ideal, as this approach often leads to a loss of valuable information. It is strongly recommended that the authors reconsider their data analysis strategy and reanalyze the data using continuous variables, such as BMI and HbA1c, to capture a more nuanced understanding of the relationships involved.

      Response: We agree that dichotomization of data leads to a loss of valuable information. In model 0 and model 1, we used continuous variables: we adjusted for reference age, sex, BMI (as a continuous variable), physical activity, fall history, HbA1c (as a continuous variable) and medication treatments when analyzing the relationship between type 2 diabetes and fracture in the Cox proportional hazards regression. We have updated Figure 4 accordingly.

      In the stratified analyses, we used 5 clinical factors secondary to the disease to classify individuals at risk. For example, an individual with BMI≤25 kg/m2, no physical activity, falls in the last year, HbA1c≥47.5 mmol/mol and antidiabetic medication treatment was classified as having 5 risk factors, and so forth. In total, 2,303 patients carried none of the risk factors, 4,128 patients had one of the risk factors, and 4,252 patients carried at least two risk factors. We found that the effect of type 2 diabetes on the risk of fracture decreased as the number of risk factors secondary to type 2 diabetes decreased. We have made this clearer in the Methods and Results (Page 11, line 255-257; Page 24, line 548-552).
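      The counting rule above can be expressed as a small Python function. The thresholds come from the response (BMI≤25 kg/m2, HbA1c≥47.5 mmol/mol); the function and argument names and the three stratum labels are hypothetical, and handling of missing data is not shown.

```python
def count_secondary_risk_factors(bmi: float, physically_active: bool,
                                 fell_last_year: bool, hba1c_mmol_mol: float,
                                 on_antidiabetic_medication: bool) -> int:
    """Number of the 5 secondary risk factors present (0 to 5)."""
    return sum([
        bmi <= 25.0,                 # BMI <= 25 kg/m2
        not physically_active,       # no physical activity
        fell_last_year,              # falls in the last year
        hba1c_mmol_mol >= 47.5,      # HbA1c >= 47.5 mmol/mol
        on_antidiabetic_medication,  # antidiabetic medication treatment
    ])


def stratum(n_factors: int) -> str:
    """Map a risk-factor count to the three strata used for reporting."""
    if n_factors == 0:
        return "none"
    return "one" if n_factors == 1 else "two or more"
```

      Python treats True as 1 when summing booleans, so the count is simply the number of conditions that hold.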

      (12) The conclusion of the study appears to be somewhat confusing. In the Abstract, the authors initially state that "genetically predicted T2D was associated with higher BMD and lower risk of fracture." However, later on, they write that "the genetically determined T2D might not be associated with a higher risk of fracture." This discrepancy raises uncertainty about the clear take-home message of the study.

      Response: The two statements were intended to convey the same message in different words, to avoid repetition. The take-home message is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, even though such an association can be observed in observational analyses, suggesting that the risk factors secondary to type 2 diabetes might contribute more to fracture risk. Therefore, it is important to manage the complications of type 2 diabetes to prevent fractures, especially the 5 factors investigated in this study.

      (13) (Apologies if I offend.) It seems that the authors lack comprehensive knowledge of the osteoporosis literature. In the Introduction, their definition of osteoporosis as "an age-related common disease characterized by low bone mass" is inadequate. It would be advisable for the authors to provide a more widely accepted and standard definition of osteoporosis to ensure accuracy and alignment with established definitions in the field.

      Response: Thanks for your suggestion. We have changed the statement as follows: “Osteoporosis is a common chronic disease characterized by low bone mass and disruption of bone microarchitecture. Fragility fracture is the ultimate outcome of poor bone health.”

      (14) There are several instances in which the authors use non-standard terminologies. For example, the use of the word 'effects' (in "the observed effect of T2D on fracture risk") is inappropriate since this study is observational in nature.

      Response: In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population. Where appropriate, we have changed “effect” to “effect size” to refer to the hazard ratio of T2D for fracture.

      (15) Please provide a reference for "diabetic bone paradox".

      Response: We have cited Botella Martínez et al, Endocrinol Nutr. 2016 and Romero-Díaz et al, Diabetes Ther. 2021 in both Introduction and Discussion (Page 3, line 76-77; Page 13, line 295-297).

      References

      Botella Martinez S, Varo Cenarruzabeitia N, Escalada San Martin J, Calleja Canelas A. The diabetic paradox: Bone mineral density and fracture in type 2 diabetes. Endocrinol Nutr. 2016, 63: 495-501.

      Romero-Diaz C, Duarte-Montero D, Gutierrez-Romero SA, Mendivil CO. Diabetes and bone fragility. Diabetes Ther. 2021, 12: 71-86.

      Zhu XW, Liu KQ, Yuan CD et al. General and abdominal obesity operate differently as influencing factors of fracture risk in old adults. iScience. 2022, 25: 104466.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presented convincing single-cell transcriptomic data of hematopoietic cells and immunocytes in zebrafish kidney marrow and showed that these cells have distinctive responses to viral infection. The findings in this study suggest that zebrafish kidney is a secondary lymphatic organ and hematopoietic stem cells in zebrafish may exhibit trained immunity. This represents a valuable discovery of the unique features of the fish immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hu et al. performed sc-RNA-seq analyses of kidney cells with or without virus infection, vaccines, and vaccines+virus infections from pooled adult zebrafish. They compared within these experimental groups as well as kidney vs spleen. Their analyses identified expected populations but also revealed new hematopoietic stem/progenitor cell (HSPC) populations, even in the spleen. Their analyses show that HSPCs in the kidney can respond to virus infection differentially and can be trained to recognize the same infection, and they argue that the zebrafish kidney can serve as a secondary immune organ. The findings are important and interesting. The manuscript is well written and a pleasure to read. However, there are several issues with the figure presentation and figure quality, as well as a lack of clarity in some of the figure legends. Some of the data presentation can be improved for better clarity. It is also important to outline what is conserved and what is unique to fish.

      Major concerns:

      (1) The visualization for several figure panels is very poor. Please provide high-resolution images and larger font sizes for gene lists and X and Y axis labels. This includes Figure 1B, Figure 1-figure supplement 2, Figure 2B-2C, 3A-3D, 4F, 5B, 6G, Figure 6-figure supplement 1B, Figure 6-figure supplement 2, Figure 7B, 8C-8E, Figure 8-figure supplement 1, 10F, 10G-10J, and Figure 10-figure supplement 1.

      Response: We apologize for the inadequate visualization of the graphic panels that you pointed out. It is likely that the formatting of the inserted images was altered during the manuscript upload process, reducing their resolution. However, the figures uploaded as separate image files, formatted as vector PDF files, preserve their high resolution even when zoomed in. Therefore, we kindly request the reviewer to consult the figures in the submission folder for a more detailed examination. We sincerely apologize for any inconvenience caused.

      (2) What are the figures at the end of the manuscript without any figure legends?

      Response: Thank you for bringing this issue to our attention. The last few figures that lack figure legends are actually supplementary figures included in the text. It is possible that they were automatically and repeatedly generated by the submission system. In the revised manuscript, we will take measures to ensure that this issue is avoided.

      (3) It would be better to use a Table to organize the gene signatures that define each unique population of immune cells such as T, B, NK, etc.

      Response: We greatly appreciate the valuable advice provided by the reviewer. As per the reviewer's recommendation, we have included a comprehensive display of all cell types and corresponding gene signatures in Supplementary File 1 of the revised manuscript.

      (4) What are the similarities for HSPC and immune cell populations between fish and man based on this research? It is better to form a table to compare and discuss.

      Response: Following the valuable suggestion of the reviewer, we have included an additional comparative analysis of HSPC and immune cell populations between zebrafish and humans. This information can be found in Supplementary file 8 and in the "Discussion" section (lines 684-685).

      (5) It is highly likely that sex and age could be the biological variation for how HSPC responds to virus infections and vaccination. The author should clearly state the fish sex and age from their samples and discuss their results taking into consideration of these variations.

      Response: We are grateful for the reviewer's insightful comments. To reduce inter-individual variations, zebrafish samples were selected randomly, with an equal distribution of males and females, during their prime youth period spanning from 3 to 12 months of age. We have included supplementary instructions regarding this selection process in the "Materials and Methods" section (lines 798-799).

      (6) The authors claim that the spleen and kidney share HSPCs. However, their data did not demonstrate this result clearly in Figure 4A. Perhaps they should use a different color to make the overlay more obvious? Or include a table to show which HSPCs are shared between the kidney and spleen? Are they sure these are not just HSPCs seeding the spleen to differentiate into B cells or other immune cells?

      Response: We express our gratitude to the reviewer for raising this issue. In this section, we would like to provide detailed explanations regarding this matter. It is important to note that the figures positioned on both the left and right sides of Figure 4A should be interpreted in a corresponding manner. The left-side figure represents the cellular composition from the spleen (depicted in light red) and the kidney (depicted in blue) across various cell types. Each data point in the left-side figure signifies an individual cell, with the two distinct colors indicating the origin of the cell. On the other hand, the right-side figure displays the varied colors representing different cell types. We want to emphasize that the spatial distribution and proportions of diverse cells in the tSNE plot on the right align consistently with the information presented in the left-side figure. This indicates the correspondence between the two plots and reinforces the validity of our findings. When interpreting the figures on the left and right sides of Figure 4A in a corresponding manner, it becomes evident that the overlapping HSPCs shared by both spleen and kidney predominantly reside in the HSPCs1 group (indicated as cluster 5 in the right-side figure). Additionally, there is also a small distribution of the overlapping HSPCs in the HSPCs2 group (cluster 8 in the right-side figure). These observations underline the presence of overlapping HSPCs in both the kidney and spleen. However, further clarification is required to fully comprehend the intricate correlation between the HSPCs in the kidney and spleen.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      (1) Figure 3C: why is 10 listed in between 1 and 2?

      Response: We appreciate the reviewer's comment. It is pertinent to mention that the graphs in Figure 3C underwent an automatic sorting process facilitated by the software during the analysis. It should be emphasized that the assigned positions resulting from this sorting process have no bearing on the outcomes of the analysis.

      (2) Figure 4A: difficult to assess the overlay between the kidney and spleen.

      Response: As mentioned above, the overlapping HSPCs shared by both the spleen and kidney are mainly distributed in the HSPCs1 group (cluster 5 in the right-side figure), with a small amount also found in the HSPCs2 group (cluster 8 in the right-side figure).

      (3) Figure 4C: What is this sample, kidney or spleen? Please specify.

      Response: Figure 4C represents an overlay of the spleen and kidney cells depicted in Figure 4B; it includes all cells of the spleen and kidney to show the differentiation trajectory of the cells. As per the reviewer's suggestion, we have made the corresponding modification to the revised figure.

      (4) The manuscript is very long. Consider focusing on the major findings in the main figures and moving the rest to the supplementary figures.

      Response: This article aimed to comprehensively understand the hematopoietic and immunological traits of zebrafish kidneys through a systematic study. As a result, a comprehensive presentation of the findings has been provided. Given that the figures currently integrated into the main text play a significant role in illustrating the principal outcomes of each section, we kindly request that these figures remain in the main body of the article. This will contribute to sustaining the structural coherence and readability of the manuscript. Thank you for taking our request into consideration.

      Reviewer #2 (Public Review):

      In this manuscript, the authors have meticulously constructed a comprehensive atlas delineating hematopoietic stem/progenitor cell (HSPC) and immune-cell types within the zebrafish kidney, employing single-cell transcriptome profiling analysis. Notably, these cell populations exhibited distinctive responses to viral infection. Intriguingly, the investigation revealed that HSPCs manifest positive reactivities to viral infection, indicating the effective induction of trained immunity in select HSPCs. Furthermore, the study unveiled the capacity for the generation of antigen-stimulated adaptive immunity within the kidney, suggesting a role for the zebrafish kidney as a secondary lymphoid organ. This research elucidates the distinctive features of the fish immune system and underscores the multifaceted biology of the kidney in ancient vertebrates.

      Response: We would like to express our gratitude to the reviewers for their overall positive feedback on our article.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors propose that zebrafish kidney is a dual-functional entity with functionalities of both primary and secondary lymphoid organs. Do the authors have any insights into the coordination of these two functions in the kidneys?

      Response: We are grateful for the valuable comments provided. We believe that the question raised by the reviewer poses an intriguing research topic, as it explores the intricate interaction between the hematopoietic and adaptive immune systems in the renal organ. This exploration holds significant value in understanding the underlying mechanisms. To accomplish this, advanced techniques such as spatiotemporal single-cell transcriptomics and dynamic cell tracking will be utilized to validate the interplay between hematopoietic and immune cell lineages.

      (2) Previous studies have found that fish IgZ/IgT specificity exists in mucosal immune organs. Is the expression of the zebrafish IgZ gene observed in the kidney? If so, is there any correlation with IgZ in mucosal immune organs?

      Response: Thank you for drawing attention to this matter. In our study, we observed the expression of the IgZ gene (ighz) in the zebrafish kidney, as shown in Figure 6. This discovery aligns with previous research and confirms its presence in B cells. While IgZ is known to function as an antibody in mucosal immunity, it remains unclear whether the development of its secretory cells (IgZ+ B cells) originates from the central immune system, such as the kidney. Our results suggest that IgZ+ B cells may have their origin in the kidney and then migrate through the peripheral circulation to carry out their functions in the local mucosal system. This finding is consistent with our earlier research, which demonstrated that zebrafish IgZ is not limited to mucosal immune organs but is also abundantly present in systemic immunity, including peripheral blood (Immunology. 2021; 162(1): 105-120).

      Reference:

      Ji, J. F. et al. Differential immune responses of immunoglobulin Z subclass members in antibacterial immunity in a zebrafish model. Immunology, 2021;162(1), 105-120.

      (3) Did the authors use the zebrafish genome or transcriptome for gene annotation? If the former, which version is used? Please supplement in the "Materials and methods".

      Response: We appreciate the comments provided by the reviewer. In this study, we utilized the zebrafish genome, specifically the GRCz11 version, to annotate genes. The detailed genome data can be found at http://asia.ensembl.org/Danio_rerio/Info/Index. We have incorporated this information into the "Materials and Methods" section of the revised manuscript (line 873).

      (4) Since the authors performed single-cell sequencing on leukocytes, why did several kidney cells, such as kidney multicellular cells and kidney mucin cells existed in the samples?

      Response: Thanks for the reviewer’s comments. It is important to acknowledge that inadvertent mixing of kidney cells might have occurred during the preparation of single-cell suspensions in our analyzed sample. However, it is pertinent to emphasize that our primary focus was the analysis of immune cells. Therefore, any minor contamination from kidney cells in the analyzed sample is considered negligible and does not significantly affect the main results of our analysis.

      (5) The application of "trained immunity," although currently popular, appears unsuitable in this context, as the current scenario involves a recall with the cognate antigen.

      Response: To our knowledge, trained immunity is generally recognized as the long-term memory of innate immunity based on transcriptional, epigenetic and metabolic modifications of myeloid cells, which is characterized by elevated pro-inflammatory responses to secondary stimuli, whether these are identical to or different from the primary stimulus (Cell Host Microbe. 2012; 12(2): 223-32; Nat Immunol. 2021; 22(1): 2-6; J Clin Invest. 2022;132(7): e158468). Therefore, the response to restimulation with a cognate antigen can be considered a form of trained immunity, and we hope that the term will be accepted in this context.

      References:

      (1) Quintin, J. et al. Candida albicans infection affords protection against reinfection via functional reprogramming of monocytes. Cell Host & Microbe, 2012;12(2), 223-232.

      (2) Divangahi, M. et al. Trained immunity, tolerance, priming and differentiation: distinct immunological processes. Nature Immunology, 2021;22(1), 2-6.

      (3) Pernet, E. et al. Training can’t always lead to Olympic macrophages. Journal of Clinical Investigation, 2022;132(7), e158468.

      (6) The discovery that HSPC exhibits trained immune characteristics is novel. Do the authors have any insights into the biological significance of trained immunity in HSPCs concerning immune defense?

      Response: We propose that the generation of trained immunity in HSPCs holds significant physiological implications. This process may expedite the differentiation and activation of specific immune cells upon re-infection, thereby bolstering the body's immune defenses and pathogen clearance. Consequently, it may serve as an intelligent strategy for host defense against pathogens. However, additional research is required to confirm this hypothesis.

      (7) In the Figure 13I, the authors used CpG and CpG+TNP-KLH to stimulate zebrafish, but no corresponding experimental method was provided in the "Materials and methods". Please supplement.

      Response: Thanks for the reviewer’s careful reading. We have included corresponding supplementary instructions in the “Materials and methods” section (lines 1011-1018).

      (8) At lines 187-190 in "Results", the authors state that "It's noteworthy that cluster 11 exhibited high expression of genes ......, resembling a unique serpin-secreting cell population". Notably, serpins play a role in diverse immunological processes, including coagulation, inflammation, and myeloid and lymphoid cell development. Could this renal cell cluster (kidney mucin cells) potentially harbor immunological functions?

      Response: Given the crucial role of serpins in various immunological processes, the serpins secreted by this cell cluster likely have significant immunological functions, suggesting that this cell group has notable immunological capabilities. Consequently, our forthcoming research aims to investigate this cell population more comprehensively.

      (9) At line 171 in "Results", the number "6" in the "cluster 6" should not be italicized, please correct.

      Response: We have addressed this issue in the revised manuscript (line 170).

      (10) At line 937 in "Materials and methods", the authors isolated T/B lymphocytes through magnetic bead sorting. Please provide information on the source of the antibodies (rabbit anti-TCRα/β or mouse anti-IgM Ab).

      Response: We have included corresponding instructions in the “Materials and methods” section (lines 938-939).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors' goal was to determine the role of O-GlcNAc modification in associative learning in Drosophila using an odor discrimination task. In particular, they sought to determine the population of O-GlcNAc modified proteins in a region of the brain critical for memory, the mushroom body. They provide compelling evidence that there are brain-region-specific populations of O-GlcNAc modified proteins and that in the mushroom body, proteins involved in translation represent a sizable fraction, larger than elsewhere in the central nervous system. Using expression of a bacterial protein that cleaves O-GlcNAc in the mushroom body, they show both reductions in the levels of this modification and effects on associative learning. Further exploration of new protein synthesis in situ supports the hypothesis that O-GlcNAc modification affects the activity of the translational machinery and could provide the basis for learning deficits when O-GlcNAc levels are compromised. Rescue of deficits resulting from reductions in O-GlcNAc was achieved by over-expression of dMyc, a known regulator of ribosome biogenesis and translation. While the critical role of protein synthesis in learning is long established, and O-GlcNAc modification is known to regulate protein synthesis, this work connects O-GlcNAc modification in a specialized region of the brain to translation regulation and associative learning. The authors also provide a method for identification of O-GlcNAc modified proteins using a tissue-specific and inducible proximity-labelling method. This will provide a useful tool for further functional studies of O-GlcNAc modification.

      Thank you for summarizing our main findings and recognizing the usefulness of the tool reported here.

      Reviewer #2 (Public Review):

      In this report Yu et al. try to demonstrate how O-GlcNAcylation of ribosomal proteins in the mushroom body (MB) is required for protein synthesis and olfactory learning. The authors develop a new method combining the O-GlcNAc binding activity of an O-GlcNAcase (OGN) and TurboID for efficient isolation. This novel method is a useful tool for the identification of O-GlcNAc modified proteins and closely interacting partners. Transgenic expression of this binder allows the authors to perform profiling that can be time- and tissue/region/cell-specific. This novel tool is thoroughly tested to show that it works in cultured cells and in whole Drosophila, and in a tissue-specific manner when expressed pan-neuronally or in specific regions of the brain.

      The authors had previously shown that reduced O-GlcNAcylation through transgenic expression of a highly active OGN affected olfactory learning. In this work the same approach is used to reduce O-GlcNAcylation in different brain regions to show that specific reduction in the adult MB reduced olfactory learning performance. As a control, OGN expression in the ellipsoid body has no effect on olfactory learning. Optic and antennal lobes could not be tested as OGN expression affected olfactory acuity. The most critical part of this finding is the time-specific expression of OGN in the adult in a tissue-specific manner, given the developmental defects induced by earlier expression. The MB has a widely reported role in associative learning; therefore, this finding, while not unexpected, is satisfying.

      Thank you for recognizing the significance of our work.

      Yu et al. use their TurboID-OGA to identify O-GlcNAcylated proteomes in different brain regions. The authors focus on the MB given its role in associative learning and the effect of reduced O-GlcNAcylation in this region. Among other substrates several ribosomal proteins are found to be specifically O-GlcNAcylated to a greater extent in the MB compared to other brain regions.

      To demonstrate the role of MB O-GlcNAcylated ribosomes in protein synthesis, an ex vivo OPP fluorescent assay is used in brains of flies expressing OGN or a mutant form lacking its catalytic and binding activities. The experiment shows reduced protein synthesis in the MB. In addition, the authors can increase protein synthesis by inducing ribosomal biogenesis through the expression of dMyc. Flies expressing dMyc and OGN together do not present the learning deficits of flies carrying only OGN. Protein synthesis in the MB has been previously reported to be required for associative learning (for example Wu et al. 2017 or Lin et al. 2022) and the present results bring further support. A link between ribosomal O-GlcNAcylation and protein synthesis could be a really interesting finding but, unfortunately, the experiments presented in this work are still too preliminary.

      The experiments presented focus only on ribosomal proteins, while these are just some of the O-GlcNAcylation substrates in the MB. While a correlation between ribosomal modification and protein synthesis is shown, a causal demonstration is not provided. Many other mechanisms and O-GlcNAcylation of other substrates could account for the same observations. For example, O-GlcNAcylation has been reported to have a role in protein synthesis by affecting different translation initiation factors (Li et al 2018, Shu et al 2022). In vitro experiments in which specific O-GlcNAcylated ribosomal components are targeted are required. In addition, O-GlcNAcylation is also known to modify ribosomal-associated mRNAs. Experiments using specific mutations that prevent O-GlcNAcylation of ribosomes could demonstrate a direct link between such ribosomal modifications and olfactory learning.

      We appreciate that you raised the crucial point that our data fall short of establishing a causal connection between O-GlcNAcylation of ribosomes and translational activity. We have made significant changes to the text throughout the manuscript to make our description more accurate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewer’s comments

      Reviewer #1 (Public Review):

      In this study, the structural characteristics of plant AlaDC and SerDC were analyzed to understand the mechanism of functional differentiation, deepen the understanding of substrate specificity and catalytic activity evolution, and explore effective ways to improve the initial efficiency of theanine synthesis.

      On the basis of previous solid work, the authors successfully obtained the X-ray crystal structures of CsAlaDC and AtSerDC, key proteins related to the synthesis of ethylamine, a precursor of theanine, and found a unique zinc finger structure in these two crystal structures that is not found in other Group II PLP-dependent amino acid decarboxylases. Through a series of experiments, they show that this characteristic zinc finger motif may be key to the folding of the CsAlaDC and AtSerDC proteins; this discovery is novel and promising for the study of theanine synthesis.

      In addition, the authors identified Phe106 of CsAlaDC and Tyr111 of AtSerDC as key sites of substrate specificity by comparing substrate binding regions, and identified amino acids that inhibit catalytic activity through structure-based mutation screening. The catalytic activity of CsAlaDC L110F/P114A was found to be 2.3 times higher than that of wild-type CsAlaDC. At the same time, the key substrate-recognition motifs of CsAlaDC and AtSerDC were used to carry out an evolutionary analysis of protein sequences in embryophytes that are highly homologous to CsAlaDC, and 13 potential alanine decarboxylases were found, which lays a solid foundation for subsequent studies related to theanine synthesis.

      In general, this study has a solid foundation, the overall research idea is clear, the experimental design is reasonable, and the experimental results provide strong evidence for the authors' point of view. Through a large number of experiments, the key steps in the theanine synthesis pathway are studied in depth, an effective way to improve the initial efficiency of theanine synthesis is found, and the molecular mechanism underlying it is explained. The whole study is novel and promising, and it points to a new direction for the efficient industrial synthesis of theanine.

      Response: Thank you very much for taking time to review this manuscript. We appreciate all your insightful comments and constructive suggestions.

      Reviewer #1 (Recommendations For The Authors):

      (1) If some test methods are not original, references or method basis should be indicated.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have added references for the enzymatic activity experiments performed to measure the synthesis of theanine in the revised manuscript.

      (2) The conclusion is a little lengthy, and the summary of the whole study is not well condensed.

      Response: Thank you very much for your valuable suggestions. We have refined the conclusion in the revised manuscript, and it is as follows:

      In conclusion, our structural and functional analyses have significantly advanced understanding of the substrate-specific activities of alanine and serine decarboxylases, typified by CsAlaDC and AtSerDC. Critical amino acid residues responsible for substrate selection were identified, Tyr111 in AtSerDC and Phe106 in CsAlaDC, highlighting their pivotal roles in enzyme specificity. The engineered CsAlaDC mutant (L110F/P114A) not only displayed enhanced catalytic efficiency but also substantially improved L-theanine yield in a synthetic biosynthesis setup with PsGS or GMAS. Our research expanded the repertoire of potential alanine decarboxylases through the discovery of 13 homologous enzyme candidates across embryophytic species and uncovered a special motif present in serine protease-like proteins within Fabales, suggesting a potential divergence in substrate specificity and catalytic functions. These insights lay the groundwork for the development of industrial biocatalytic processes, promising to elevate the production of L-theanine and supporting innovation within the tea industry.

      Reviewer #2 (Public Review)

      Summary:

      The manuscript focuses on the comparison of two PLP-dependent enzyme classes that perform amino acid decarboxylations. The goal of the work is to understand the substrate specificity and factors that influence the catalytic rate in an enzyme linked to theanine production in tea plants.

      Strengths:

      The work includes x-ray crystal structures of modest resolution of the enzymes of interest. These structures provide the basis for the design of mutagenesis experiments to test hypotheses about substrate specificity and the factors that control catalytic rate. These ideas are tested via mutagenesis and activity assays, in some cases both in vitro and in plants.

      Weaknesses:

      The manuscript could be more clear in explaining the contents of the x-ray structures and how the complexes studied relate to the reactant and product complexes. The structure and mechanism section would also be strengthened by including a diagram of the reaction mechanism and including context about reactivity. As it stands, much of the structural results section consists of lists of amino acids interacting with certain ligands without any explanation of why these interactions are important or the role they play in catalysis. The experiments testing the function of a novel Zn(II)-binding domain also have serious flaws. I don't think anything can be said at this point about the function of the Zn(II) due to a lack of key controls and problems with experimental design.

      Response: Thank you very much for your thoughtful comments and feedback on our manuscript. We are pleased to hear that the work's strengths, such as the X-ray crystal structures and the mutagenesis experiments tied to the catalytic rate and substrate specificity, align with the goals of our research.

      We recognize the areas identified for improvement and appreciate the suggestions provided. We have emphasized how we use the structural information obtained to infer the roles of key amino acid residues in the reaction. Additionally, we have added a diagram of the reaction mechanism as a supplementary figure to provide clearer context on reactivity and improve the overall understanding of the catalytic process. In the structural results section, we have added a discussion that contextualizes the listed amino acids and their interactions with the ligands by explaining their significance and roles in catalysis. We acknowledge the weaknesses you have pointed out in the experiments concerning the novel Zn(II)-binding domain, but we would like to clarify that the focus of our study was not primarily on the zinc structure. While we agree that there may be limitations in the experimental design and controls for the zinc-binding domain, we believe that these flaws do not significantly affect the overall findings of the study. The experiment served as a preliminary exploration of the potential functionality of the domain, and further studies are required to fully understand its role and mechanism.

      Reviewer #2 (Recommendations For The Authors):

      (1) In addition to the points raised in the public review, it would be ideal to provide some context for the enzymatic characterization. Why are the differences in kinetic parameters for AlaDC and SerDC significant?

      Response: Thank you for your comments and suggestions. The Km values for CsAlaDC and the SerDCs are comparable, suggesting similar substrate affinities. However, CsAlaDC exhibits a significantly lower Vmax than AtSerDC and CsSerDC. This discrepancy implies that CsAlaDC and the SerDCs may differ in the rates at which they convert substrate to product when saturated with substrate. The SerDCs may have a faster turnover rate, meaning that they convert substrate to product and release the product more quickly, freeing the enzyme for the next catalytic cycle and resulting in a higher Vmax. Differences in the stability or correct folding of the enzymes under assay conditions can also affect Vmax. If the SerDCs are more stable, they might maintain their catalytic activity better at higher substrate concentrations, contributing to a higher Vmax. We have added these points to the section "Enzymatic properties of CsAlaDC, AtSerDC, and CsSerDC" in our revised manuscript.

      (2) Why is Phe106/Tyr111 pair critical for substrate specificity? Does the amino acid contact the side chain? It might be helpful to a reader to formulate a hypothesis for this interaction.

      Response: Thank you for the question and comments. We compared the active sites of CsAlaDC and AtSerDC and observed a distinct difference at only two amino acids: F106 in CsAlaDC and Y111 in AtSerDC. The remaining amino acids were identical. Building on previous research on Group II PLP-dependent amino acid decarboxylases, we postulated, and subsequently confirmed, that these specific amino acids play a crucial role in substrate recognition. However, since we lack the structure of the enzyme-substrate complex, we are unable to elucidate the precise interactions between the substrate and the amino acids at this site based solely on structural information.

      (3) Line 55 - Define EA again.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have redefined “EA” as the abbreviation for ethylamine in the revised manuscript.

      (4) Line 58 - The meaning of "determined by the quality formation of tea" is not clear.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have modified it in the revised manuscript.

      (5) Line 65 - Missing words between "despite they".

      Response: Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (6) Line 67 - Need a reference for the statement about lower activity?

      Response: Thank you for the question and comments. We have provided the following reference to support this statement in the revised manuscript.

      Reference: Bai, P. et al. (2021) Biochemical characterization of specific Alanine Decarboxylase (ADC) and its ancestral enzyme Serine Decarboxylase (SDC) in tea plants (Camellia sinensis). BMC Biotechnol. 21,17.

      (7) Line 100-101 - The meaning of "its closer relationship was Dicots plants." is not clear.

      Response: We have revised the sentence in the revised manuscript, as follows: “Phylogenetic analysis indicated that CsAlaDC is homologous with SerDCs in Dicots plants.”

      (8) Line 139 - Missing a word between "as well as" and "of".

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (9) Line 142 - The usage of comprised here is not correct. It would be more correct to say "The overall architecture of CsAlaDC and AtSerDC is homodimeric with the two subunits...".

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (10) Line 148-149 - I didn't understand the statement about the "N-terminal structures" Are these structures obtained from protein samples that have a truncated N-terminus?

      Response: Group II PLP-dependent amino acid decarboxylases are composed of three distinct structural domains: the N-terminal domain, the large domain, and the C-terminal domain, each with unique structural features. CsAlaDC and AtSerDC can likewise be divided into these three domains. To obtain more stable proteins for further experiments, we truncated both proteins. The removed segment is part of the N-terminal domain and was deleted from the protein's N-terminus.

      (11) Line 153 - Say "is composed of" instead of "composes of".

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (12) Line 156 - I didn't understand the statement about the cofactor binding process. What is the cofactor observed? And how can we say anything about the binding process from a single static structure of the enzyme? It might be better to say that the cofactor binding site is located at the subunit junction - but the identity of the cofactor still needs to be defined first.

      Response: Thank you for your comments and suggestions. The cofactor mentioned here is PLP. We intended to describe the binding state of PLP at the active site, not the binding process, and we have revised the description accordingly in the revised manuscript.

      (13) Lines 157-158 - I didn't understand the conclusion about the roles of each monomer. In the images in Figure 3 - both monomers appear to bind PLP but the substrate is not present - so it's not clear how conclusions can be drawn about differential substrate binding in the two subunits.

      Response: Thank you very much for your careful reading and valuable suggestions. The main point we intended to convey is that this protein possesses two active sites, and that the two monomers carry out distinct functions at each active site. However, because no substrate is present in the structures, our previous conclusion about substrate binding was inaccurate, and we have amended it in the revised manuscript.

      (14) Line 161 - I would say loop instead of ring.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (15) Line 165 - Please provide some references for this statement. It would also be ideal to state the proximity of the Zn-binding motif to the active site or otherwise provide some information about the role of the motif based on its location.

      Response: Thank you for your comments and suggestions. We have provided the following references to support this statement in the revised manuscript.

      Author response image 1.

      (A) Structure of histidine decarboxylase. (B) Structure of glutamate decarboxylase.

      Reference:

      30 Komori, H. et al. (2012) Structural study reveals that Ser-354 determines substrate specificity on human Histidine Decarboxylase. J Biol Chem. 287, 29175-83.

      31 Huang, J. et al. (2018) Lactobacillus brevis CGMCC 1306 glutamate decarboxylase: Crystal structure and functional analysis. Biochem Biophys Res Commun. 503, 1703-1709.

      In CsAlaDC, the zinc ion is located 29.6 Å from the active center, and in AtSerDC it is 29 Å away. Given this distance, we hypothesize that the zinc-binding motif does not directly affect the enzyme's catalytic activity but may be related to its stability.

      (16) Lines 166-178 - This paragraph appears to be a list of all of the interactions between the protein, PLP, and the EA product. It would be ideal to provide some text to explain why these interactions are important and what we can learn from them.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have conducted additional analysis of the functional roles of the active-site residues that interact with PLP, focusing on how they aid PLP binding, determine its orientation, and contribute to the catalytic mechanism. These details are described in the revised manuscript.

      (17) Line 192 - Bond not bound.

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have made corrections in the revised manuscript.

      (18) Lines 201-207 - It would be ideal to verify that the inclusion of 5 mM DTT affects Zn binding. It's not clear to me that this reagent would necessarily disrupt Zn binding. Under certain circumstances, it could instead promote Zn association. For example, if the Cys ligands are oxidized initially but then become reduced? I don't think the current experiment really provides any insight into the role of the Zn.

      Response: Thank you for your valuable insights regarding the role of DTT and its potential effects on Zn binding in our experiments. DTT mainly protects or restores the reduced state of proteins and other biological molecules by reducing disulfide bonds to free thiol (-SH) groups, thereby helping to maintain protein structure and function. Given this mechanism, the reason why DTT inhibits the enzyme activity is unclear, and we cannot provide a reasonable explanation for this phenomenon. We have therefore removed the section discussing the inhibition of enzyme activity by DTT from the revised manuscript.

      Reviewer #3 (Public Review):

      In the manuscript titled "Structure and Evolution of Alanine/Serine Decarboxylases and the Engineering of Theanine Production," Wang et al. solved and compared the crystal structures of Alanine Decarboxylase (AlaDC) from Camellia sinensis and Serine Decarboxylase (SerDC) from Arabidopsis thaliana. Based on this structural information, the authors conducted both in vitro and in vivo functional studies to compare enzyme activities using site-directed mutagenesis and subsequent evolutionary analyses. This research has the potential to enhance our understanding of amino acid decarboxylase evolution and the biosynthetic pathway of the plant-specialized metabolite theanine, as well as to further its potential applications in the tea industry.

      Response: Thank you very much for taking the time to review this manuscript. We appreciate all your insightful comments.

      Reviewer #3 (Recommendations For The Authors):

      Page 6, Figure 2, Page 23 (Methods)

      "The supernatants were purified with a Ni-Agarose resin column followed by size-exclusion chromatography."

      What kind of SEC column did the authors use? Can the authors provide the SEC elution profile comparison results and size standard curve?

      Response: We used a Superdex 200 (HiLoad 16/600) column for size-exclusion chromatography. The SEC elution profiles of AtSerDC and CsAlaDC and the standard curve of the SEC column are presented below.

      Author response image 2.

      (A) Comparison of the elution profiles of CsAlaDC and AtSerDC. (B) Elution profile of Blue Dextran 2000. (C) Elution profile of the mixed protein standards (aldolase, 158,000 Da, 71.765 mL; conalbumin, 75,000 Da, 79.391 mL; ovalbumin, 44,000 Da, 83.767 mL; carbonic anhydrase, 29,000 Da, 90.019 mL; ribonuclease A, 13,700 Da, 98.145 mL). (D) Size standard curve of the Superdex 200 (HiLoad 16/600) column.
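      A size standard curve of this kind is usually read by fitting log10(MW) against elution volume. The sketch below is a minimal illustration, not the authors' analysis: it fits a straight line to the five standards listed in panel C by ordinary least squares and inverts the fit to estimate an apparent molecular weight. It assumes the elution volume (Ve) is used directly as the abscissa; many calibrations instead use Kav = (Ve - V0)/(Vt - V0), which needs the void volume from the Blue Dextran 2000 run, not given here. The 75.0 mL query volume is hypothetical.

```python
import math

# Calibration standards from panel C: (name, MW in Da, elution volume in mL)
STANDARDS = [
    ("Aldolase", 158_000, 71.765),
    ("Conalbumin", 75_000, 79.391),
    ("Ovalbumin", 44_000, 83.767),
    ("Carbonic anhydrase", 29_000, 90.019),
    ("Ribonuclease A", 13_700, 98.145),
]


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * Ve + intercept; also returns R^2."""
    xs = [ve for _, _, ve in standards]
    ys = [math.log10(mw) for _, mw, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination of the linear fit
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def estimate_mw(elution_ml, slope, intercept):
    """Apparent molecular weight (Da) of a protein eluting at elution_ml."""
    return 10 ** (slope * elution_ml + intercept)


if __name__ == "__main__":
    slope, intercept, r2 = fit_calibration(STANDARDS)
    print(f"log10(MW) = {slope:.4f} * Ve + {intercept:.3f} (R^2 = {r2:.3f})")
    # Hypothetical elution volume, for illustration only
    print(f"Apparent MW at 75.0 mL: {estimate_mw(75.0, slope, intercept):,.0f} Da")
```

      Because larger proteins elute earlier, the fitted slope is negative; the apparent MW of a dimer read from such a curve is only approximate, since elution also depends on molecular shape.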

      Page 6 & Page 24 (Methods)

      "The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, and 0.025 mM purified enzyme, was prepared and incubated at standard conditions (45 ℃ and pH 8.0 for CsAlaDC, 40 ℃ and pH 8.0 for AtSerDC for 30 min)."

      (1) The enzymatic activities of CsAlaDC and AtSerDC were measured at two different temperatures (45 and 40 ℃), but their activities were directly compared. Is there a reason for experimenting at different temperatures?

      Response: Our temperature-dependence assays showed that the optimal reaction temperature is 40 °C for AtSerDC and 45 °C for CsAlaDC. Consequently, all subsequent experiments were performed at these respective temperatures.

      Author response image 3.

      (A) Relative activity of CsAlaDC at different temperatures. (B) Relative activity of AtSerDC at different temperatures.

      (2) Enzyme activities were measured at temperatures above 40℃, which is not a physiologically relevant temperature and may affect the stability or activity of the proteins. At the very least, the authors should provide temperature-dependent protein stability data (e.g., CD spectra analysis) or, if possible, temperature-dependent enzyme activities, to show that their experimental conditions are suitable for studying the activities of these enzymes.

      Response: Thank you very much for your careful reading. Before performing the assays, we verified that the experimental temperatures did not significantly affect protein stability. The results are shown in the figure below:

      Author response image 4.

      The two proteins were individually incubated in water baths at 25 °C, 37 °C, 45 °C, 60 °C, and 80 °C for 15 minutes and then assayed using the standard reaction system, with untreated enzymes as the control. The results suggest that the temperatures used in our assays do not significantly affect enzyme stability.

      (3) The authors used 20 mM of substrate. What are the physiological concentrations of alanine and serine typically found in plants?

      Response: The alanine content of tea plant roots ranges from 0.28 to 4.18 mg/g DW (Yu et al., 2021; Cheng et al., 2017), corresponding to a physiological concentration of 3.14 to 46.92 mM. The serine content of plants ranges from 0.014 to 17.6 mg/g DW (Kumar et al., 2017), corresponding to 0.13 to 167.48 mM. The substrate concentration of 20 mM used in this study therefore falls within the physiological ranges of both alanine and serine.
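      The mM values above can be reproduced from the mg/g DW ranges with a simple unit conversion. The sketch below is a reconstruction under stated assumptions, not necessarily the calculation the authors used: it takes molar masses of 89.09 g/mol (L-alanine) and 105.09 g/mol (L-serine) and assumes that 1 g of dry weight corresponds to 1 mL of tissue water (the hypothetical `ml_per_g_dw` parameter). Under that assumption the endpoints 3.14, 46.92, 0.13, and 167.48 mM are recovered; a different water content would scale all values proportionally.

```python
ALA_MOLAR_MASS = 89.09   # g/mol (= mg/mmol), L-alanine
SER_MOLAR_MASS = 105.09  # g/mol (= mg/mmol), L-serine


def content_to_mm(content_mg_per_g_dw: float, molar_mass: float,
                  ml_per_g_dw: float = 1.0) -> float:
    """Convert an amino acid content (mg per g dry weight) to a concentration in mM.

    Assumes the amino acid is dissolved in `ml_per_g_dw` mL of water per gram
    of dry tissue; the 1 mL/g default is an assumption, not a measured value.
    """
    mmol_per_g_dw = content_mg_per_g_dw / molar_mass  # mg / (mg/mmol) = mmol
    litres = ml_per_g_dw / 1000.0
    return mmol_per_g_dw / litres  # mmol/L = mM


if __name__ == "__main__":
    ranges = {
        "Ala (tea roots)": (0.28, 4.18, ALA_MOLAR_MASS),
        "Ser (plants)": (0.014, 17.6, SER_MOLAR_MASS),
    }
    for label, (low, high, mass) in ranges.items():
        print(f"{label}: {content_to_mm(low, mass):.2f} to {content_to_mm(high, mass):.2f} mM")
```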

      Yu, Y. et al. (2021) Glutamine synthetases play a vital role in high accumulation of theanine in tender shoots of albino tea germplasm "Huabai 1". J. Agric. Food Chem. 69(46), 13904-13915.

      Cheng, S. et al. (2017) Studies on the biochemical formation pathway of the amino acid L-theanine in tea (Camellia sinensis) and other plants. J. Agric. Food Chem. 65(33), 7210-7216.

      Kumar, V. et al. (2017) Differential distribution of amino acids in plants. Amino Acids 49(5), 821-869.

      Pages 6-7 & Table 1

      (1) Use the correct notation for Km and Vmax. Also, the authors show kinetic parameters and use multiple units (e.g., mmol/L or mM for Km).

      Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected this in the revised manuscript.

      (2) When comparing the catalytic efficiency of enzymes, kcat/Km (or Vmax/Km) is generally used. The authors present a comparison of catalytic activity from results to conclusion. A clarification of what results are being compared is needed.

      Response: Thank you for your comments and suggestions. The catalytic activity is assessed by comparing reaction rates.
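      Because reaction rates measured at a single substrate concentration (20 mM here) and kcat/Km can rank enzymes differently, a small sketch may make the distinction explicit. It assumes Michaelis-Menten kinetics, Vmax expressed in mM of product per minute, and the 0.025 mM enzyme concentration of the standard assay; the two parameter sets are hypothetical and are not the values in Table 1.

```python
from dataclasses import dataclass


@dataclass
class MichaelisMenten:
    """Hypothetical kinetic parameters for one enzyme under Michaelis-Menten kinetics."""
    vmax_mm_per_min: float  # maximal rate, mM product per minute
    km_mm: float            # Michaelis constant, mM
    enzyme_mm: float        # total enzyme concentration in the assay, mM

    def rate(self, substrate_mm: float) -> float:
        """Initial rate v = Vmax*[S] / (Km + [S]), in mM/min."""
        return self.vmax_mm_per_min * substrate_mm / (self.km_mm + substrate_mm)

    def kcat_per_s(self) -> float:
        """Turnover number kcat = Vmax / [E]t, converted from per minute to per second."""
        return self.vmax_mm_per_min / self.enzyme_mm / 60.0

    def efficiency_per_m_per_s(self) -> float:
        """Catalytic efficiency kcat/Km in M^-1 s^-1 (Km converted from mM to M)."""
        return self.kcat_per_s() / (self.km_mm / 1000.0)


if __name__ == "__main__":
    enzyme_a = MichaelisMenten(vmax_mm_per_min=1.0, km_mm=20.0, enzyme_mm=0.025)
    enzyme_b = MichaelisMenten(vmax_mm_per_min=0.6, km_mm=5.0, enzyme_mm=0.025)
    for name, e in (("A", enzyme_a), ("B", enzyme_b)):
        print(f"{name}: v(20 mM) = {e.rate(20.0):.2f} mM/min, "
              f"kcat = {e.kcat_per_s():.3f} s^-1, "
              f"kcat/Km = {e.efficiency_per_m_per_s():.1f} M^-1 s^-1")
```

      With these made-up parameters, enzyme A is faster at 20 mM substrate (0.50 versus 0.48 mM/min), whereas enzyme B has the higher kcat/Km (80 versus about 33 M^-1 s^-1), which is why the two comparisons are not interchangeable.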

      Page 7 & Figure 3

      In Figure 3A, the authors describe the overall structure, but a simple explanation or labeling within the figure should be added.

      Response: Thank you very much for your suggestions. We have modified Figure 3A as follows:

      Author response image 5.

      Crystal structures of CsAlaDC and AtSerDC. (A) Dimer structure of CsAlaDC. The N-terminal, large, and C-terminal domains of chain A are shown in light pink, khaki, and sky blue, respectively. Chain B is shown in spring green. The PLP molecule is shown as spheres. The zinc finger structure at the C-terminus of CsAlaDC is indicated by the red box. The gray spheres represent zinc ions, and the red dotted lines depict the coordination bonds formed between the zinc ions and cysteine and histidine residues.

      Figures 3F & 4A

      In these figures, the two structures are overlaid and compared, but the colors are very similar to see the differences. The authors should use a different color scheme.

      Response: Thank you very much for your suggestions. We have modified Figures 3F and 4A as follows:

      Author response image 6.

      (Figure 3F) Superposition of the CsAlaDC and AtSerDC monomers. CsAlaDC is shown in spring green and AtSerDC in plum. The conserved catalytic loop is indicated by the red box.

      (Figure 4A) Superposition of the substrate-binding pocket residues of CsAlaDC and AtSerDC. Residues of CsAlaDC are shown in spring green and those of AtSerDC in plum; the residue related to substrate specificity is highlighted by a red ellipse.

      Pages 7 & 8

      Figures 3 and 4 do not include illustrations of what the authors describe in the text. The reader will not be able to understand the descriptions until they download and view the structures themselves. The authors should create additional figures to make it easier for readers to understand the structures.

      Response: Thank you very much for your suggestions. We have included Supplementary Figure 1 in the revised manuscript, which presents more detailed structural depictions of the two proteins.

      Pages 9 & 10

      "This result suggested this Tyr is required for the catalytic activity of CsAlaDC and AtSerDC."

      The author's results are interesting, but it is recommended to perform the experiments in a specific order. First, experiments should determine whether mutagenesis affects the protein's stability (e.g., CD, as discussed earlier), and second, whether mutagenesis affects ligand binding (e.g., ITC, SPR, etc.), before describing how site-directed mutagenesis alters enzyme activity. In particular, the authors' hypothesis would be much more convincing if they could show that the ligand binding affinity is similar between WT and mutants.

      Response: Thank you for your insightful feedback on our manuscript, which we greatly appreciate. Your suggestion to methodically sequence the experiments provides a clear pathway to bolster the strength and conclusiveness of our results.

      We agree that it is crucial to first assess the stability of the mutant proteins, as changes therein could inadvertently affect catalytic activity. To this end, we have employed circular dichroism (CD) to study the potential structural alterations in the proteins induced by mutations. The experimental results are shown in the following figure:

      Author response image 7.

      (A) Circular dichroism spectrum of CsAlaDC (WT). (B) Circular dichroism spectrum of CsAlaDC (Y336F). (C) Circular dichroism spectrum of AtSerDC (WT). (D) Circular dichroism spectrum of AtSerDC (Y341F).

      The CD spectra indicate that the secondary structure of the mutant proteins is unchanged, suggesting that the mutations do not alter protein folding or stability.

      The ligand PLP forms a Schiff base with the ε-amino group of a lysine residue in the protein, which has an absorbance maximum around 420-430 nm. Because PLP was added during protein purification, similar absorbance of the mutant and wild-type proteins at 420-430 nm at equal concentrations indicates that the mutation does not affect PLP binding. We therefore scanned the UV-visible absorption spectra of both the wild-type and mutant proteins; the results are presented in the following figure:

      Author response image 8.

      (A) UV-Visible Absorption Spectra of CsAlaDC (WT) compared to CsAlaDC (Y336F). (B) UV-Visible Absorption Spectra of AtSerDC (WT) compared to AtSerDC (Y341F).

      The mutant protein and the wild-type protein exhibit similar absorbance at 420-430 nm, indicating that the mutation does not affect the binding of PLP to the protein.

      These experiments show that the mutations do not significantly affect protein stability or PLP binding, so we can more confidently attribute the changes in enzyme activity to the specific role of this tyrosine residue. We believe this approach substantiates our hypothesis and demonstrates that this Tyr residue is necessary for the catalytic activity of CsAlaDC and AtSerDC.

      Figure 3

      In the 3D structure figure provided by the authors, the proposed reaction mechanism of the enzyme and the involved amino acids are not included. Can the authors add a supplementary figure with a schematic drawing that includes more information, such as distances?

      Response: Thank you for your valuable feedback on our manuscript. We agree that a schematic drawing with additional details, including distances, would make the enzymatic mechanism clearer. We have therefore added Supplementary Figure 2 to the revised manuscript, which illustrates the proposed reaction pathway and highlights the key amino acids involved.

      Page 10

      "The results showed that 5 mM L-DTT reduced the relative activity of CsAlaDC and AtSerDC to 22.0% and 35.2%, respectively"

      The authors primarily use relative activity to compare WT and mutants. Can the authors specify the exact experiments, units, and experimental conditions? Is it Vmax or catalytic efficiency? If so, under what specific experimental conditions?

      Response: Thank you for your attention and review of our research paper, we appreciate your suggestions and feedback. The experimental protocol employed to evaluate the influence of DTT on protein catalytic efficiency is outlined as follows:

      The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, 5 mM L-DTT, and 0.025 mM purified enzyme, was prepared and incubated under standard conditions (45 °C and pH 8.0 for 5 min for CsAlaDC; 40 °C and pH 8.0 for 2 min for AtSerDC). A reaction system without DTT served as the control. The reaction was then stopped with 20 μL of 10% trichloroacetic acid. The product was derivatized with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) and analyzed by UPLC. All enzymatic assays were performed in triplicate.
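      In an experiment of this design, relative activity is typically the amount of product formed in the treated reaction divided by that in the untreated control, expressed as a percentage. The sketch below assumes product amounts quantified by UPLC for triplicate reactions, normalizes each treated replicate to the mean of the control replicates, and reports the mean and sample standard deviation; the replicate values are hypothetical, and the authors may have normalized differently.

```python
from statistics import mean, stdev


def relative_activity(treated: list[float], control: list[float]) -> tuple[float, float]:
    """Return (mean, SD) of treated activity as a percentage of the mean control activity.

    `treated` and `control` are replicate product amounts in any consistent unit
    (for example, nmol of product quantified by UPLC).
    """
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control activity must be positive")
    percents = [100.0 * value / control_mean for value in treated]
    return mean(percents), stdev(percents)


if __name__ == "__main__":
    # Hypothetical triplicate product amounts (not measured data)
    control = [10.0, 10.4, 9.6]  # reaction without DTT
    treated = [3.0, 3.2, 2.8]    # reaction with 5 mM DTT
    avg, sd = relative_activity(treated, control)
    print(f"Relative activity: {avg:.1f} ± {sd:.1f} %")
```

      The same normalization applies to the temperature-stability assay, with the untreated enzyme as the control.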

      However, due to the unknown mechanism of DTT inhibition on protein activity, we have removed this part of the content in the revised manuscript.

      Pages 10-12

      The identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC,' along with the subsequent mutagenesis and enzymatic activity assays, is intriguing. However, the current manuscript lacks an explanation and discussion of the underlying reasons for these results. As previously mentioned, it would be helpful to gain insights and analysis from WT-ligand and mutant-ligand binding studies (e.g., ITC, SPR, etc.). Furthermore, the authors' analysis would be more convincing with accompanying structural analysis, such as steric hindrance analysis.

      Response: Thank you for your insightful comments and constructive feedback on our manuscript. We appreciate the interest you have expressed in the identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC' and their functional implications based on mutagenesis and enzymatic assays.

      To investigate PLP binding by the mutant proteins, we scanned the UV-visible absorption spectra of both the wild-type and mutant proteins; the results are presented in the following figure:

      Author response image 9.

      (A) UV-Visible Absorption Spectra of CsAlaDC (WT) compared to CsAlaDC (F106Y). (B) UV-Visible Absorption Spectra of AtSerDC (WT) compared to AtSerDC (Y111F).

      The mutant and wild-type proteins exhibit similar absorbance at 420-430 nm, indicating that the mutations do not affect PLP binding. We therefore conclude that the change in activity of the mutant proteins is caused by the amino acid substitution at this site, i.e., the residue at this position affects substrate specificity. The two structures show that Tyr111 of AtSerDC carries a hydrophilic hydroxyl group, which increases the hydrophilicity of the active site and thus favors the hydrophilic substrate Ser. In contrast, the corresponding residue in CsAlaDC is Phe, which lacks this hydroxyl group; this increases the hydrophobicity of the active site and shifts the preference toward the hydrophobic substrate Ala. We have added a discussion of the potential reasons for this result to the Discussion section of the revised manuscript.

      Page 5 & Figure 1B

      "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions. However, CsAlaDC is relatively distant from CsSerDC."

      In Figure 1B, CsSerDC and AtSerDC are in different clades, and this figure does not show that the two enzymes are closest. To provide another quantitative comparison, please provide a matrix table showing amino acid sequence similarities as a supplemental table.

      Response: Many thanks for your constructive suggestion. We have added a matrix table of amino acid sequence similarities to the supplemental materials. The similarity between CsSerDC and AtSerDC (86.21%) is higher than that between CsAlaDC and CsSerDC (84.92%), which supports the description of Figure 1B. We have added a description of this analysis to the revised manuscript. Because the statement "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions." was not accurate enough, we have revised it to "As expected, CsSerDC was closer to AtSerDC, which implies that they shared similar functions." in the revised manuscript.
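      A pairwise identity matrix of this kind can be computed from a multiple sequence alignment. The sketch below uses one common convention, assumed rather than taken from the manuscript: identity is the percentage of identical residues among alignment columns in which neither sequence has a gap. Other tools divide by the full alignment length or by the length of the shorter sequence, which gives somewhat different values, and similarity (as opposed to identity) additionally counts conservative substitutions. The three short aligned sequences are toy examples, not the CsAlaDC, CsSerDC, or AtSerDC sequences.

```python
from itertools import combinations

# Toy aligned sequences (hypothetical), "-" marks a gap
TOY_ALIGNMENT = {
    "seq1": "MKT-LAYF",
    "seq2": "MKTALAYF",
    "seq3": "MRT-LSYW",
}


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of pairwise percent identities, keyed by (name, name)."""
    matrix = {(name, name): 100.0 for name in alignment}
    for p, q in combinations(alignment, 2):
        value = percent_identity(alignment[p], alignment[q])
        matrix[(p, q)] = matrix[(q, p)] = value
    return matrix


if __name__ == "__main__":
    m = identity_matrix(TOY_ALIGNMENT)
    names = list(TOY_ALIGNMENT)
    print("\t" + "\t".join(names))
    for row in names:
        print(row + "\t" + "\t".join(f"{m[(row, col)]:.2f}" for col in names))
```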

      Page 5 & Figure 1C

      Figure 1C, which shows a multiple sequence alignment with the amino acid sequences of the 6 SerDCs and CsAlaDC, clearly shows the differences between the sequences of AlaDC and other SerDCs. However, the authors' hypothesis would be more convincing if they showed that this difference is also conserved in AlaDCs from other plants. Can the authors show a new multiple-sequence alignment by adding more amino acid sequences of other AlaDCs?

      Response: Thank you for your comments and suggestions. We would like to include additional alanine decarboxylases; however, CsAlaDC is currently the only experimentally confirmed alanine decarboxylase, and no experimentally verified alanine decarboxylases have been reported in other plant species.

      Figure 5A

      Figure 5A is missing the error bar.

      Response: Figure 5A shows a preliminary screen of these mutants that was performed without replicates, so no error bars are shown. Only the L110F and P114A mutants, which showed significantly improved activity, underwent further experimental verification to confirm their enhanced activity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the discovery and subsequent design of the AF03-NL chimeric antibody yielded a tool for studying filoviruses and provides a possible blueprint for future therapeutics. However, the data are incomplete and not presented clearly, which obscures flaws in the analyses and leaves unexplained phenomena. The work will be of interest to virologists studying antibodies.

      Author response: Thank you for your very valuable comments. The manuscript has been substantially revised, and new data have been added to further support the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      Zhang et al. conducted a study in which they isolated and characterized a Marburg virus (MARV) glycoprotein-specific antibody, AF-03. The antibody was obtained from a phage-display library. The study shows that AF-03 competes with the previously characterized MARV-neutralizing antibody MR78, which binds to the virus's receptor binding site. The authors also performed GP mutagenesis experiments to confirm that AF-03 binds near the receptor binding site. In addition, the study confirmed that AF-03, like MR78, can neutralize Ebola viruses with cleaved glycoproteins. Finally, the authors demonstrated that NPC2-fused AF-03 was effective in neutralizing several filovirus species.

      Weaknesses:

      (1) The main premise of this study is unclear. Flyak et al. in 2015 described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor (Flyak et al., Cell, 2015). Based on biochemical and structural characterization, Flyak proposed that the Marburg neutralizing antibodies bind to the NPC1 receptor binding site. In the same study, it has been shown that several MARV-neutralizing antibodies can bind to cleaved Ebola glycoproteins that were enzymatically treated to remove the mucin-like domain and glycan cap. In the following study, it has been shown that the bispecific-antibody strategy can be used to deliver Marburg-specific antibodies into the endosome, where they can neutralize Ebola viruses (Wec et al., Science 2016). Finally, the use of lysosome-resident protein NPC2 to deliver antibody cargos to late endosomes has been previously described (Wirchnianski et al., Front. Immunol, 2021). The above-mentioned studies are not referenced in the introduction. The authors state that "there is no licensed treatment or vaccine for Marburg [virus] infection." While this is true, there are human antibodies that recognize neutralizing epitopes - that information can't be excluded while providing the rationale for the study. Furthermore, the authors use the word "novel" to describe the AF-03 antibody. How novel is AF-03 if multiple Marburg-neutralizing antibodies were previously characterized in multiple studies? Since AF-03 competes with previously characterized MR78, it binds to the same antigenic region as MR78. AF-03 also has comparable neutralization potency as MR78.

      Author response: Thank you for your valuable advice. Regarding the novelty of AF-03, the inhibition assay indicates that Q128, N129, and C226 are key residues for AF-03 neutralization, because the neutralizing capacity of AF-03 against pseudotyped viruses harboring mutations at these residues is impaired (see revised Fig. 2A, left panel). Furthermore, ELISA assays show that the Q128S-N129S or C226Y mutations significantly disrupt the binding of GP to AF-03, whereas the neutralizing and binding capacities of MR78 toward pseudovirus and GP harboring C226Y, but not Q128S-N129S, are almost unaffected (see revised Fig. 2A, right panel, and Fig. 2B). Considering that AF-03 and MR78 compete with each other for binding to MARV GP (Fig. 2D), we conclude that the epitopes of these two mAbs partially overlap. Therefore, AF-03 is not a clone of MR78 but a novel neutralizing mAb against MARV.

      The work of Wirchnianski and colleagues is in fact referenced in the manuscript (Ref. 38). Although our strategy for designing a broad-spectrum neutralizing antibody builds on their work, we further expanded the range of viruses evaluated to include RAVN and mutant EBOV strains. The results show that NPC2-fused AF-03 exhibits neutralizing activity against 10 filovirus species and 17 EBOV mutants (Fig. 6A and B). The study by Flyak et al. (2015), which described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor, is now cited in the Introduction.

      (2) Without the AF-03-MARV GP crystal structure, it's unclear how van der Waals interactions, H-bonds, and polar and electrostatic interactions can be evaluated. While authors use computer-guided homology modeling, this technique can't be used to determine critical interactions. Furthermore, Flyak et al. reported that binding to the NPC1 receptor binding site is the main mechanism of Marburg virus neutralization by human monoclonal antibodies. Since both AF-03 (this study) and MR78 (Flyak study) competed with each other, that information alone was sufficient for GP mutagenesis experiments that identified the NPC1 receptor binding site as the main region for mutagenesis.

      Author response: Computer-guided homology modeling has been used successfully in our laboratory to determine key residues responsible for antigen-mAb interactions (Immunol Res. 2015, 62:377; Scand J Immunol. 2019, 90:e12777; Sci Rep. 2022, 12:8469; Front Immunol. 2022, 13:831536). Using the previously reported crystal structures of MARV GP and of the MR78-GP complex (Cell 2015, 160:904) as references, we modeled the complex of MARV GP and AF-03. Although AF-03 and MR78 compete with each other, we show that the epitopes of these two mAbs only partially overlap (Fig. 2A-D).

      (3) The AF-03-GP affinity measurements were performed using bivalent IgG molecules and trimeric GP molecules. This format does not allow accurate measurements of affinity due to the avidity effect. The reported KD value is abnormally low due to avidity effects. The authors need to repeat the affinity experiments by immobilizing trimeric GPs and then adding monovalent AF-03 Fab.

      Author response: As shown in Fig. 1A, the GP protein used in this work is not a trimer but largely a monomer composed of MLD-deleted GP1 and GP2, which may to some extent weaken the engagement between GP and AF-03. Notably, we repeated the SPR assays for the binding of AF-03 to GP and obtained a KD of 4.71 × 10⁻¹¹ M (see revised Fig. 1C). This GP protein is therefore suitable for evaluating mAb affinity. In addition, it is reasonable to use bivalent IgG to measure the affinity of the mAb for monomeric GP, since the affinity likely decreases significantly when a monovalent Fab is used.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the discovery of a filovirus neutralizing antibody, AF03, by phage display, and its subsequent improvements to include NPC2 that resulted in a greater breadth of neutralization. Overall, the manuscript would benefit from considerable grammatical review, which would improve the communication of each point to the reader. The authors do not convincingly map the AF03 epitope, nor do they provide any strong support for their assumption that AF03 targets the NPC1 binding site. However, the authors do show that AF03 competes for MR78 binding to its epitope, and provides good support for the internalization of AF03-NL as the mechanism for improved breadth over the original AF03 antibody.

      Strengths:

      This study shows convincing binding to Marburgvirus GP and neutralization of Marburg viruses by AF03, as well as convincing neutralization of Ebolaviruses by AF03-NL. While there are no distinct populations of PE-stained cells shown by FACS in Figure 5A, the cell staining data in Figure 5C are compelling to a non-expert in endosomal staining like me. The control experiments in Figure 7 are compelling showing neutralization by AF03-NL but not AF03 or NPC2 alone or in combination. Altogether these data support the internalisation and stabilisation mechanism that is proposed for the gain in neutralization breadth observed for Ebolaviruses by AF03-NL over AF03 alone.

      Weaknesses:

      Overall, this reviewer is of the opinion that this paper is constructed haphazardly. For instance, the neutralization of mutant pseudoviruses is shown in Figure 2 before the concept of pseudovirus neutralization by AF03 is introduced in Figure 3. Similarly, the control experiments for AF03+NPC2 are described in Figure 7 after the data for breadth of neutralization are shown in Figure 6. GP quality controls are shown in Figure 2 after GP ELISAs / BLI experiments are done in Figure 1. This is disorienting for the reader.

      Author response: AF-03 production and its binding capacity to GP are determined in Fig. 1. The epitopes of AF-03 are identified in Fig. 2. The neutralizing activity of AF-03 against pseudotyped MARV in vitro and in vivo is assessed in Fig. 3. The neutralizing activity of AF-03 against pseudotyped ebolaviruses harboring cleaved GP is assessed in Fig. 4. The endosome-delivering ability of AF03-NL is examined in Fig. 5. The neutralization of filovirus species and EBOV mutants by AF03-NL is assessed in Fig. 6. The requirement of CI-MPR for the neutralizing activity of AF03-NL is determined in Fig. 7. We consider this arrangement appropriate.

      Figure 1: The visualisation of AF03 modelling and docking endeavours is extremely difficult to interpret. Firstly, there is no effort to orient the non-specialist reader with respect to the Marburgvirus GP model. Secondly, from the figures presented it is impossible to tell if the Fv docks perfectly onto the GP surface, or if there are violent clashes between the deeply penetrating AF03 CDRs and GP. This information would be better presented on a white background, perhaps showing GP in surface view from multiple angles and slices. The authors attempt to label potential interactions, but these are impossible to read, and labels should be added separately to appropriately oriented zoomed-in views.

      Author response: To make the rationale of the computer-guided modeling easier to understand, the descriptions in the Methods and Results sections have been refined accordingly. In addition, the theoretical structure is now presented on a white background (see revised Fig. 1D-F).

      Figure 2: The neutralization of mutant pseudoviruses cannot be properly assessed using bar graphs. These data should be plotted as neutralization curves as they were done for the wild-type neutralization data in Figure 3. The authors conclude that Q128 & N129 are contact residues, but the neutralization data for this mutant appear odd as the lowest two concentrations of AF03 show higher neutralization than the second highest AF03 concentration. Neutralization of T204/Q205/T206 (green), Y218 (orange), K222 (blue), or C226 (purple) appears to be better than neutralization of the wild-type MARV. The authors do not discuss this oddity. What are the IC50s? The omission of antibody concentrations on the x-axis and missing IC50 values give a sense of obscuring the data, and the manuscript would benefit from greater transparency, and be much easier to interpret if these were included. I am intrigued that the Q128S/N129S mutant is reported as having little effect on the neutralization of MR78. The bar graph appears to show some effect (difficult to interpret without neutralization curves and IC50 data), and indeed PDB:5UQY seems to suggest that these amino acids form a central component of the MR78 epitope (Q128 forms potential hydrogen bonds with CDRH1 Y35 and CDRL3 Y91, while N129 packs against the MR78 CDRH3 and potentially makes additional polar contact with the backbone). Lastly, since neutralization was tested in both HEK293T cells and Huh7 cells in Figure 3, the authors should clarify which cells were used for neutralization in Figure 2.

      Author response: Thank you for your advice. Accordingly, in the revised manuscript, the neutralization curves of AF-03 and MR78 are presented in revised Fig. 2A. Neutralization by AF-03 of pseudotyped MARV harboring Q128S/N129S or C226Y is significantly impaired compared with WT MARV and with pseudoviruses bearing the other indicated mutations, whereas the Q128S/N129S mutation, but not C226Y, affects the neutralizing capacity of MR78 to a certain extent. This is consistent with the binding of AF-03 or MR78 to MARV GP protein assayed by ELISA (see revised Fig. 2B). Overall, these results show that Q128/N129/C226 are key residues responsible for AF-03 neutralization.

      Figure 3: The first two images in Figure 3C showing bioluminescent intensity from pseudovirus-injected mice pretreated with either 10mg/kg or 3mg/kg AF03 are identical images. This is apparent from the location, shape, and intensity of the bioluminescence, as well as the identical foot placement of each mouse in these two panels. Currently, this figure is incomplete and should be corrected to show the different mice treated with either 10mg/kg or 3mg/kg of AF03.

      Author response: Thank you for your careful review. This was indeed our mistake, and it has been corrected in the revised manuscript; the correct images have been added (see revised Fig. 3C).

      Figure 4 would benefit from a control experiment without antibodies comparing infection with GP-cleaved and GP-uncleaved pseudoviruses. The paragraph describing these data was also difficult to read and would benefit from additional grammatical review.

      Author response: Accordingly, we performed a control experiment comparing infection with GP-cleaved and GP-uncleaved pseudoviruses. The results show that infection of host cells by pseudotyped ebolaviruses harboring cleaved GP is comparable to or stronger than that by pseudoviruses containing intact GP (see revised Fig. S1). Therefore, the data in Fig. 4 support the conclusion that AF-03 inhibits cell entry of ebolavirus species harboring cleaved GP, and this inhibition is not attributable to a possible impairment of the cell entry capacity of GPcl-containing ebolaviruses. In addition, the paragraph has been revised for readability.

      Figure 5: The authors should clarify in the methods section that the "mock" experiment included the PE anti-human IgG Fc antibody. Without this clarification, the lack of a distinct negative population in the FACS data could be interpreted as non-specific staining with PE. If the PE antibody was added at an equivalent concentration to all panels, what does the directionality of the arrowheads in Figure 5A (labelled PE) and 5B (labelled pHrodo Red) indicate?

      Author response: Thank you for your advice. In the revised version, we state in the figure legend that the mock is a human IgG isotype control. The arrowheads indicate the PE or pHrodo fluorescence intensity on the horizontal axis of the plots. Here, the percentage of PE- or pHrodo-positive cells is shown.

      Figure 6B: These data would benefit from the inclusion of IC50, transparency of antibody concentrations used, and consistency in the direction of antibody concentrations (increasing to the right or left of the x-axis) when compared to Figure 2.

      Author response: The antibody concentrations used for titration are given in the figure legends, and the direction of antibody concentrations is now consistent throughout the paper. Although IC50 values are not included, these data clearly show that AF03-NL, but not AF-03, prominently inhibits cell entry of the EBOV mutants.
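The IC50 values requested above could, as a rough illustration, be estimated from titration data like that in Figures 2 and 6B. The sketch below assumes percent-neutralization values measured at ascending antibody concentrations (the numbers are hypothetical, not data from this study) and interpolates linearly in log10(concentration) between the two points that bracket 50% neutralization. Fitting a four-parameter logistic curve is the more common choice; this interpolation, and the function name `ic50_log_interp`, are illustrative stand-ins only.

```python
import math


def ic50_log_interp(concentrations, neutralization):
    """Rough IC50 estimate by log-linear interpolation (hypothetical helper).

    concentrations: antibody concentrations in ascending order (one unit).
    neutralization: percent neutralization measured at each concentration.
    Returns the concentration at which the curve first crosses 50%,
    interpolating linearly in log10(concentration), or None if no pair of
    adjacent points brackets 50%.
    """
    points = list(zip(concentrations, neutralization))
    for (c_lo, n_lo), (c_hi, n_hi) in zip(points, points[1:]):
        if n_lo < 50 <= n_hi:
            # Fraction of the way from the lower to the upper point.
            frac = (50 - n_lo) / (n_hi - n_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical titration: 50% falls halfway (in log space) between 0.1 and 1.
print(ic50_log_interp([0.01, 0.1, 1, 10], [5, 30, 70, 95]))  # ~0.316
print(ic50_log_interp([1, 10, 100], [10, 20, 30]))           # None
```

For a non-monotonic curve, such as the Q128S/N129S data the reviewer describes, this sketch simply returns the first crossing of 50%, which is one reason a fitted curve is usually preferred.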

      Reviewer #1 (Recommendations For The Authors):

      Line 143: anti-human should be anti-human.

      Line 223: From the SDS-PAGE results, it's not clear that the AF-03 was expressed in the eukaryotic cell line. Please, rephrase the sentence.

      Line 263: ELISA experiments can't be used to determine affinity.

      Line 394: Flyak et al. generated human antibodies from PBMC samples of Marburg survivors, not plasma samples.

      Author response: Following the reviewer's advice, these sentences have been modified or corrected to describe the results more accurately. In addition, grammatical errors throughout the manuscript have been carefully corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major Concerns:

      (1) An important point that the authors should clarify in this study is whether mice are detecting qualitative or quantitative differences between fresh and old cat saliva. Do the environmental conditions in which the old saliva was maintained cause degradation of Fel d 4, the main protein known for inducing a defensive response in rodents (see Papes et al., 2010 again)? If that is the case, one would expect that a lower concentration of Fel d 4 in the old saliva after protein degradation would result in reduced antipredator responses. Alternatively, if the authors believe that different proteins that are absent in the old saliva are contributing to the increased defensive responses observed with the fresh saliva, further protein quantification experiments should be performed. An important experiment to differentiate qualitative versus quantitative differences between the two types of saliva would be diluting the fresh saliva to verify if the amount of protein, rather than the type of protein, is the main factor regulating the behavioral differences.

      We thank the reviewer for their important suggestions. We agree that both the quality and quantity of molecular components in saliva undergo changes after the saliva is kept at room temperature for 4 hours. Our findings indicate that mice detect these changes through the VNO and adjust their defensive response patterns accordingly. For instance, freezing behavior is reduced in response to 4-hour-old saliva compared to fresh saliva. On the other hand, the duration of interaction with saliva (investigation behavior) remains low, and the stress hormone ACTH level is upregulated in both cases. A future study ought to identify the specific molecules (most likely proteins or peptides) in cat saliva responsible for these distinct defensive responses in mice. While Fel d 4 stands as one of the potential candidates, as it has been shown to induce a form of defensive behavior in mice (Papes et al., 2010), a different molecule or a combination of multiple molecules may also play a role. Once the molecules are identified, it is imperative to investigate how their quantity and quality change over time and how these factors correlate with freezing behavior in mice. Such an exploration will provide answers to this ethologically significant question raised by the reviewer. We have added a paragraph to the Discussion under the section “The VNO as the sensor of predator cues that induce fear-related behavior” to clarify this.

      (2) The authors claim that fresh saliva is recognized as an immediate danger by rodents, whereas old saliva is recognized as a trace of danger. However, the study lacks empirical tests to support this interpretation. With the current experimental tests, the behavioral differences between animals exposed to fresh vs. old saliva could be uniquely due to the reduced amount of the exact same protein (e.g., Fel d 4) in the two samples of saliva.

      As mentioned in response to comment 1, we agree that both the quality and quantity of molecules within saliva are altered after 4 hours. What we would like to emphasize in our current study is that mice detect these time-dependent changes through the VNO and subsequently adjust their defensive response patterns. Identifying the specific molecules responsible for inducing behavioral changes and investigating their time-dependent alterations is the crucial next step. We have added a paragraph to the Discussion under the section 'The VNO as the sensor of predator cues that induce fear-related behavior' to clarify this.

      (3) In Figure 4H, the authors state that there were no significant differences in the number of cFos-positive cells between the two saliva-exposed groups. However, this result disagrees with the next result section showing that fresh and old saliva differentially activate the VMH. It is unclear why cFos quantification and behavioral correlations were not performed in other upstream areas that connect the VNO to the VMH (e.g., BNST, MeA, and PMCo). That would provide a better understanding of how brain activity correlates with the different types of behaviors reported with the fresh vs. old saliva.

      We greatly appreciate this valuable advice. We have added c-Fos immunoreactivity (IR) data for the BNST, MeApv, and PAG, together with the data for the VMH, in new Figure 4G-J. Upon exposure to both fresh and old saliva, we observed an upregulation trend of cFos in the MeApv, VMH, and dPAG, but not in the BNST, compared to the control stimulus.

      Moreover, we conducted correlation analyses between the numbers of cFos-positive neurons and the duration of freezing behavior in those neural substrates, which have been added to new Figure 5. The numbers of cFos-IR signals in neurons in the BNST and dPAG did not correlate with the duration of freezing behavior in any of the exposure groups (Figure 5C, F). However, in addition to a significant positive correlation in the VMH for the fresh saliva-exposed group (R2 = 0.5708, 95% CI [-0.1449, 0.9714], p = 0.0412) (Figure 5E), we observed a similar positive correlation trend in the MeApv (R2 = 0.3854, 95% CI [0.3845, 0.9525], p = 0.0942), although it was not statistically significant possibly due to low sample numbers (Figure 5D).

      Based on these results, our current circuit model is as follows: different numbers of VNO sensory neurons activated by fresh and old saliva result in differential excitation levels in mitral cells in the AOB. This, in turn, leads to the differential activation of target neural substrates, possibly the MeApv, resulting in the differential activation of VMH neurons. This model is depicted in Figure 7 and discussed under the section 'Differential processing of fresh and old saliva signals in the VNO-to-VMH pathway' in the Discussion.

      (4) The interpretation that fresh and old saliva activate different subpopulations of neurons in the VMH, based on the observation that cFos positively correlates with freezing responses only with the fresh saliva, lacks empirical evidence. To address this question, the authors should use two neuronal activity markers to track the response of the same population of VMH cells within the same animals during exposure to fresh vs. old saliva. Alternatively, they could use single-cell electrophysiology or imaging tools to demonstrate that cat saliva of distinct freshness activates different subpopulations of cells in the VMH. Any interpretation without a direct within-subject comparison or the use of cell-type markers would become merely speculative. Furthermore, the authors assume that differential activations of mitral cells between fresh and old saliva result in the differential activation of VMH subpopulations (page 13, line 3). However, there are intermediate structures between the mitral cells and the VMH, which are completely ignored in this study (e.g., BNST, medial amygdala).

      We appreciate this important feedback. We agree that performing a same-animal comparison for fresh and old saliva exposure would offer direct evidence of the differential activation of a subpopulation of VMH neurons. However, this is technically difficult. We have stimulated the same animal with the same or different types of swabs (e.g., fresh-control, fresh-fresh, fresh-old, or old-fresh) and observed that once mice were exposed to a saliva-containing swab and exhibited freezing behavior, they no longer made contact with the second swab within the timeframe in which two different neuronal activity markers can be analyzed. As shown in Figure 2A, direct contact with the saliva swab is necessary for triggering saliva-elicited freezing behavior. Therefore, we concur that further investigation of real-time neural activation responses to both fresh and old saliva within the same subjects, using an appropriate stimulus delivery method into the VNO as demonstrated in previous studies (Bansal et al., 2021; Ben-Shaul et al., 2010; Bergan et al., 2014), would be useful to strengthen our argument.

      For the second part of the comment regarding the intermediate structures between the mitral cells and the VMH, please refer to our comment above in response to comment 3.

      (5) The authors incorrectly cited the Papes et al., 2010 article on several occasions across the manuscript. In the introduction, the authors cited the Papes et al 2010 study to make reference to the response of rodents to chemical cues, but the Papes et al. study did not use any of the chemical cues listed by the authors (e.g., fox feces, snake skin, cat fur, and cat collars). Instead, the Papes et al. 2010 article used the same chemical cue as the present study: cat saliva. The Papes et al. 2010 article was miscited again in the results section where the authors cited the study to make reference to other sources of cat odor that differ from the cat saliva such as cat fur and cat collars. Because the Papes et al. 2010 article has previously shown the involvement of Trpc2 receptors in the VNO for the detection of cat saliva and the subsequent expression of defensive behaviors by using Trpc2-KO mice, the authors should properly cite this study in the introduction and across the manuscript when making reference to their findings.

      The study by Papes et al. (2010) explored mouse defensive responses triggered by native odors derived from three natural mouse predator species: cat, snake, and rat. These odors were derived from neck fur swabs, shed skin, and urine, respectively. Notably, all three types of samples induced defensive risk assessment and avoidance behaviors in mice. These responses were significantly diminished in Trpc2 knock-out (KO) mice, which lack the Trpc2 transduction channel in their vomeronasal sensory neurons and are therefore impaired in transmitting sensory signals to the brain. Moreover, Papes et al. (2010) mentioned that 'we did find cat saliva, a potential source of fur chemosignals, sufficient to induce c-Fos expression in the AOB and initiate defensive behavior.' While Papes et al. reported c-Fos expression in the AOB as well as behavioral responses induced by cat saliva in C57BL/6 mice, they did not provide information regarding c-Fos expression or defensive behavioral responses to cat saliva in Trpc2 KO mice. Overall, we highly value these findings and explicitly state in the Results section of our study that ‘Cat saliva has been considered as a source of predator cues found on cat fur and collars, which induce defensive behaviors in rodents (Engelke et al., 2021; Papes et al., 2010),’ providing the rationale for our use of cat saliva in our experimental design.

      (6) In the introduction, the authors hypothesized that the VNO detects predator cues and sends sensory signals to the VMH to trigger defensive behavioral decisions and stated that direct evidence to support this hypothesis is still missing. However, the evidence that cat saliva activates the VMH and that activity in the VMH is necessary for the expression of antipredator defensive response in rodents has been previously demonstrated in a study by Engelke et al., 2021 (PMID: 33947849), which was entirely omitted by the authors.

      We appreciate this insightful comment. Our original sentence meant that direct evidence was missing for the hypothesis that the mouse VNO detects predator cues and sends sensory signals to the VMH, triggering appropriate defensive behavioral decisions. To clarify this, we altered the sentence (the last sentence of the second-to-last paragraph in the Introduction) to “However, how the sensory signals detected through the VNO-to-VMH circuitry modulate behavioral decisions in specific contexts remains elusive.”

      The study by Engelke et al. (2021) has shown that cat saliva activates the VMH and that activity in the VMH is necessary for the expression of antipredator defensive responses, including freezing behavior, in rats. This important paper is now cited at multiple locations: page 4 line 16, page 9 line 8, and page 14 line 17. Interestingly, the vomeronasal receptor genes expressed in cat saliva-responsive VNO neurons, the V2R-A4 subfamily genes, seem to have expanded independently in mice and rats, lacking direct V2R-A4 orthologues between the two species (Rocha et al., submitted). Therefore, exploring the sensory mechanism behind the induction of defensive behavioral responses in rats by cat saliva would be highly intriguing. Comparing the mechanism operating in rats with that observed in mice could offer valuable insights into how divergent sensory signaling pathways lead to VMH-mediated defensive behavioral responses across species.

      (7) In the discussion, the authors stated that their findings suggest that the induction of robust freezing behavior is mediated by a distinct subpopulation of VMH neurons. The authors should cite the study by Kennedy et al., 2020 (PMID: 32939094) that shows the involvement of VMH in the regulation of persistent internal states of fear, which may provide an alternative explanation for why distinct concentrations of saliva could result in different behavioral outcomes.

      We appreciate this valuable advice to cite this important paper. It is now cited on page 14, line 17 in the Discussion under “Differential activation of VMH neurons potentially underlying distinct intensities of freezing behavior.” We agree that it is intriguing to hypothesize that cat saliva of different freshness induces different degrees of persistent neural activity in a subpopulation of VMH neurons, which regulates the intensity of freezing behavior.

      (8) The anatomical connectivity between the olfactory system and the ventromedial hypothalamus (VMH) in the abstract is unclear. The authors should clarify that the VMH does not receive direct inputs from the vomeronasal organ (VNO) nor the accessory olfactory bulb (AOB) as it seems in the current text.

      We apologize for the confusion caused by our statement in the abstract. The reviewer is correct that the VMH does not receive direct inputs from the VNO and AOB. The abstract now states: 'The vomeronasal organ (VNO) is one of the major sensory input channels through which predator cues are detected with ascending inputs to the medial hypothalamic nuclei, especially to the ventromedial hypothalamus (VMH), through the medial amygdala (MeA) and bed nucleus of the stria terminalis (BNST).’

      Reviewer #2 (Public Review):

      Weakness:

      The findings are relatively preliminary. The identities of the receptor and the ligand in the cat saliva that induces the behavior remain unclear. The identity of VMH cells that are activated by the cat saliva remains unclear. There is a lack of targeted functional manipulation to demonstrate the role of V2R-A4 or VMH cells in the behavioral response to cat saliva.

      We concur with the reviewer’s comments and agree on the necessity of exploring the behavioral response to cat saliva in mice with V2R-A4 receptor(s) knocked out, alongside those with targeted functional manipulations in the VMH. These future studies will allow us to further elucidate the molecular and neural mechanisms underlying this sensory-to-hypothalamic circuit.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) It is unclear if fresh and old saliva indeed alter the perceived imminence of predation, as claimed by the authors. Prior work indicates that lower imminence induces anxiety-related actions, such as re-organization of meal patterns and avoidance of open spaces, while slightly higher imminence produces freezing. Here, the authors show that fresh and old predator saliva only provoke different amounts of freezing, rather than changing the topography of defensive behaviors, as explained above. Another prediction of predatory imminence theory would be that lower imminence induced by old saliva should produce stronger cortical activation, while fresh saliva would activate the amygdala, if these stimuli indeed correspond to significantly different levels of predation imminence.

      We thank the reviewer for this valuable insight. In our current study, we exclusively compared defensive behavioral responses to 15-minute-old and 4-hour-old cat saliva in mice within their home cages. In future studies, it would be intriguing to expand this investigation by examining behavioral changes in response to saliva collected at additional time points across diverse behavioral settings. Additionally, exploring neural activity in various brain regions in future studies would complement our understanding of these responses.

      (2) It is known that predator odors activate and require AOB, VNO, and VMH, thus replications of these findings are not novel, decreasing the impact of this work.

      We acknowledge the previous findings mentioned by the reviewer. Our finding in this paper is that cat saliva samples with different freshness predominantly activate different numbers of VNO sensory neurons expressing the same subfamily of sensory receptors, which results in differential activation of the downstream circuit to modulate behavioral outputs.

      (3) There is a lack of standard circuit dissection methods, such as characterizing the behavioral effects of increasing and decreasing the neural activity of relevant cell bodies and axonal projections, significantly decreasing the mechanistic insights generated by this work.

      We thank the reviewer for the valuable comments. We acknowledge that exploring the behavioral effects through the manipulation of specific cell types within defined neural substrates, along with characterizing circuit connectivity, is crucial to understand this circuit more thoroughly in future studies.

      (4) The correlation shown in Figure 5c may be spurious. It appears that the correlation is primarily driven by a single point (the green square point near the bottom left corner). All correlations should be calculated using Spearman correlation, which is non-parametric and less likely to show a large correlation due to a small number of outliers. Regardless of the correlation method used, there are too few points in Figure 5c to establish a reliable correlation. Please add more points to 5c.

      We thank the reviewer for this important suggestion. We assessed the normality of the data using the Shapiro-Wilk and Kolmogorov-Smirnov tests, which supported the use of a parametric correlation analysis for this dataset. We anticipate employing a larger sample size in future studies to examine these correlation patterns more rigorously.
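The reviewer's concern about a single influential point can be illustrated with a small, hypothetical dataset (not the data in Figure 5c). The pure-Python sketch below computes Pearson's r and Spearman's rank correlation (Pearson's r on average ranks). The function names are illustrative only; in practice, `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `scipy.stats.shapiro` provide these calculations.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: four uncorrelated points plus one outlier at (0, 0).
x = [10, 11, 12, 13, 0]
y = [12, 10, 13, 11, 0]
print(round(pearson_r(x, y), 3))          # 0.955, driven by the outlier
print(round(spearman_rho(x, y), 3))       # 0.5
print(round(pearson_r(x[:4], y[:4]), 3))  # 0.0 without the outlier
```

In this toy example, the single outlier lifts Pearson's r from 0 to about 0.95, while Spearman's rho is 0.5. This is the sensitivity to one point that the reviewer describes.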

      (5) Some of the findings are disconnected from the story. For example, the authors show that V2R-A4-expressing cells are activated by predator odors. Are these cells more likely to be connected to the rest of the predatory defense circuit than other VNO cells?

      Yes, our hypothesis posits that V2R-A4-expressing VNO sensory neurons serve as receptor neurons for predator cues present in cat saliva. Additionally, we assume that these specific sensory neurons have stronger anatomical connections with the defensive circuit compared to VNO sensory neurons expressing other receptor subfamilies. In our modified Discussion section, we discussed this point under “V2R-A4 subfamily as the receptor for predator cues in cat saliva.”

      (6) Were there other behavioral differences induced by fresh compared to old saliva? Do they provoke differences in stretch-attend risk evaluation postures, number of approaches, the average distance to odor stimulus, the velocity of movements towards and away from the odor stimulus, etc?

      We appreciate the reviewer's valuable comments. We have now incorporated an analysis of stretch-sniff risk assessment behavior, presented in new Figure 1F (graph) and Supplemental Figure 1B (raster plot). Mice exhibited stretch-sniff risk assessment behavior, which remained consistent across control, fresh saliva, and old saliva swabs. Additionally, we have also included a raster plot for direct investigation, previously noted as ‘interaction’ in the original manuscript (Supplemental Figure 1C). Mice exposed to a swab containing either fresh or old saliva significantly avoided directly investigating the swab. In contrast, mice exposed to a clean control swab spent a significant amount of time directly investigating the swab, engaging in behaviors such as sniffing and chewing (Figure 1G). A comparison of temporal behavioral patterns revealed a slightly higher frequency of direct investigation behavior toward old saliva compared to fresh saliva at the beginning of the exposure period (Supplemental Figure 1C).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (A) In the discussion (page 13, line 13), the authors proposed approaches to isolate receptors among the V2R-A4 subfamily that could be responsible for the detection of predator cues in cat saliva such as mRNA profiling from cells isolated from VNO GCaMP imaging. However, the authors argue that this method can lead to false positive results. The authors should clarify what they mean by this exactly.

      We meant that pairing kairomones with their cognate vomeronasal receptors is challenging overall, and subsequent confirmation by loss-of-function as well as gain-of-function studies is necessary to avoid false-positive receptor-ligand pairings. We modified the sentence in the Discussion as follows: “…. as well as receptor mRNA profiling from isolated single cells activated by cat saliva in GCaMP imaging using the VNO slices in vitro (Haga-Yamanaka et al., 2014; Wong et al., 2020). Receptor candidates identified using either of the methods can be further confirmed by examining necessity and sufficiency for detecting cat saliva using genetically modified mouse lines.”

      (B) In the discussion, the authors mention that imminent predator cues present in the cat saliva activate a specific population of VMN neurons. However, the authors have not demonstrated that imminent predator cues exist and the differences between fresh and old saliva are not simply a matter of concentration and integrity of the same protein (see a similar concern in item 2 above).

      In alignment with our responses to the reviewer’s public comments 1 and 2, we acknowledge the changes in both the quality and quantity of molecules in cat saliva when kept at room temperature for 4 hours. Our findings demonstrate that mice detect this time-dependent alteration through the VNO, leading to subsequent adjustments in their defensive response patterns. The identification of specific molecules responsible for inducing behavioral changes and an exploration of their time-dependent alterations are crucial steps in our ongoing research. To provide further clarification, we have added a paragraph to the Discussion under 'The VNO as the sensor of predator cues that induce fear-related behavior.'

      (C) In the introduction, the authors cite several studies and reviews that investigated sensory neural circuits that mediate behavioral responses to chemical predator cues in mice. However, the majority of these studies used rats. Therefore, it is recommended to instead indicate that these studies focus on using rodent models.

      We appreciate this insightful comment. We have now replaced the term 'mice/mouse' with 'rodents' in corresponding parts of the manuscript.

      (D) The description of the extended amygdala is unclear and gives the impression that the posteroventral part of the medial amygdala is also part of the extended amygdala (page 3, line 25).

      We appreciate the reviewer’s important feedback. We have removed the phrase 'the extended amygdala consisting of' from the text.

      (E) The authors should justify why they have focused on the role of V2R-A4 in cat saliva detection. As shown in the Figure 3A schematic, many other receptors within the V2R family could have been evaluated. Additionally, the authors should indicate how many mice were used for calculating the ratio for each receptor in Figure 3C, and a group comparison should be performed.

      As shown in Supplemental Figure 2 and Figure 3C, our initial investigation involved assessing the co-localization of pS6 signals with signals derived from in situ hybridization probes for all V2R subfamilies. Each probe was designed to recognize all the receptor genes within the subfamily under the tested conditions. This examination led to the identification of V2R-A4, whose probe signals overlap with pS6 signals induced by exposure to cat saliva. In Figure 3C, the percentage of total overlap between the in situ probe and pS6 signals in VNO sections was examined from n=3-6 animals, which is now mentioned in the modified figure legend.

      (F) The authors should make it clear to readers at the very beginning of the manuscript that the behavioral differences between fresh and old saliva are not caused by the inefficiency of the old cat saliva to induce defensive responses. Thus, other antipredator behavioral responses should be also quantified (e.g., avoidance time, number and time of investigations to the cat saliva source, risk-assessment, etc.)

      We appreciate this valuable comment from the reviewer. In the original version of our manuscript, we used the term 'interaction' to indicate 'direct interaction with the swab for investigation.' We have now replaced the term 'interaction' with 'direct investigation' and added the temporal patterns of these behavioral episodes in Supplemental Figure 1C. Our observations indicate that mice avoid directly investigating both fresh and old saliva compared to the control (Figure 1G). However, there is a slight increase in investigation behavior toward old saliva at the beginning of exposure compared to fresh saliva (Supplemental Figure 1C). Furthermore, we have included the duration (Figure 1F) and temporal patterns (Supplemental Figure 1B) of stretch-sniff risk assessment behavior. Notably, stretch-sniff behavior did not differ towards control, fresh, and old saliva swabs.

      (G) The selected representative images for Gαo- and pS6-labeled neurons in Figure 2 should have similar levels of DAPI labeling. Further, the plot depicting the duration of freezing as a function of pS6-IR signals in the VNO (Figure 2H) is difficult to follow. The authors should indicate on the graph which data points represent fresh or old cat saliva exposure, similar to the style used in Figure 5 plots.

      We have replaced the representative image in Figure 2E to align the DAPI intensity. Additionally, we updated the data points in Figure 2H and introduced a color code to indicate saliva types.

      (H) The schematic in Figure 4 is misleading because the AOB does not directly project to the VMH. The authors should explain which regions are conveying indirect predator information from AOB to VMH (see a similar concern in item 7 above).

      We thank the reviewer for this important feedback. We modified Figure 4A to show the entire defensive behavior circuit initiated from the VNO.

      Reviewer #2 (Recommendations For The Authors):

      (1) This result suggests that V2R-A4 may be the dominant VR for mice to detect cat saliva.

      Future studies should determine the identity of the receptor and the ligand in the cat saliva. Additionally, the functional importance of V2R-A4 remains unclear. It is important to knockout the receptor and test changes in cat saliva-induced freezing.

      We concur with the reviewer’s comments and recognize the necessity of exploring the behavioral response to cat saliva in mice with V2R-A4 receptor(s) knocked out. Moreover, the identification of the ligand in cat saliva is critical for a deeper understanding of the molecular mechanisms in future studies.

      (2) AOB does not project to VMH directly. Other known important nodes for the predator defense circuit include MeApv, BNST, PMd, AHN, and PAG. It will be helpful to provide c-Fos data in those regions (especially MEA and BNST as they are between AOB and VMH) to provide a complete picture of how the brain processes cat saliva to induce the behavior change.

      We appreciate this important feedback from the reviewer. We have now added c-Fos expression analysis data in the BNST, MeApv, and PAG, in addition to the VMH. Upon exposure to fresh and old saliva, we observed upregulation of cFos in the MeApv, VMH, and dPAG, but not in the BNST, compared to the control stimulus. The data are now shown in Figure 4G-J. We also added to Figure 5 correlation analyses between the numbers of cFos-positive neurons and the duration of freezing behavior in those neural substrates. The numbers of cFos-IR neurons in the BNST and dPAG did not correlate with the duration of freezing behavior in any of the exposure groups (Figure 5C, F). However, in addition to a significant positive correlation in the fresh saliva-exposed group in the VMH (R2 = 0.5708, 95% CI [-0.1449, 0.9714], p = 0.0412) (Figure 5E), we observed a similar positive correlation trend in the MeApv (R2 = 0.3854, 95% CI [-0.3845, 0.9525], p = 0.0942), although it was not statistically significant, possibly because of the small sample size (Figure 5D). Based on these results, our current circuit model is as follows: the different numbers of VNO sensory neurons activated by fresh and old saliva produce different levels of excitation in AOB mitral cells. This differential excitation of mitral cells leads to differential activation of their target neural substrates, possibly the MeApv, which in turn results in differential activation of VMH neurons. This model is depicted in Figure 7 and discussed in the Discussion under the section “Differential processing of fresh and old saliva signals in the VNO-to-VMH pathway.”

      (3) It is interesting that the difference in activation levels in the VNO between old and fresh cat saliva does not transfer to the AOB. It could be informative to examine, across animals, the correlation between VNO and AOB pS6/c-Fos cell numbers and between AOB and VMH c-Fos cell numbers to understand whether the activation levels across those regions are related. If they are not correlated, it could be helpful to discuss potential reasons, e.g., neuromodulatory inputs to the AOB.

      We agree that analyzing the number of pS6/cFos-positive cells from all the regions in the same animals would be ideal; however, due to technical difficulties, we were unable to collect the entire set of neural substrates from the same animals.

      (4) Please indicate n in all figure plots and specify what individual dots mean. In Figure 4h, there are 7 dots in the old saliva group, presumably indicating 7 animals. In Figure 6b, there appear to be more than 7 dots for the old cat saliva group. Are there more than 7 animals? If so, why are they not included in Figure 4h? If not, what does each dot mean? Note that each dot should represent an independent sample. One animal should not contribute more than one dot.

      We apologize for the confusion about Figure 6b. Each of these dots indicates the number of cFos signals in a single VMH hemisphere sample. The data used for this analysis were the same as the ones for the VMH used in Figure 4. This is now clarified in the figure legends.

      (5) The identification of a cluster of VMHdm cells uniquely activated by fresh cat saliva is interesting. It will be important to identify a molecular handle for these cells to facilitate further investigation. This could be achieved using either activity-dependent RNA-seq or double in situ hybridization for saliva-induced c-Fos and candidate genes (candidate genes may be identified based on known gene expression patterns).

      We agree that these experiments are very valuable. We would like to perform those experiments in future studies.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please cite recent relevant papers showing VMH activity induced by predators, such as https://pubmed.ncbi.nlm.nih.gov/33115925/ and https://pubmed.ncbi.nlm.nih.gov/36788059/

      We thank the reviewer for the suggestion to cite these important papers. https://pubmed.ncbi.nlm.nih.gov/33115925/ (Esteban Masferrer et al., 2020) and https://pubmed.ncbi.nlm.nih.gov/36788059/ (Tobias et al., 2023) are now cited at page 14, line 17 in the Discussion under “Differential activation of VMH neurons potentially underlying distinct intensities of freezing behavior.”

      (2) Add complete statistical information in the figure legends of all figures, which should include n, name of test used, and exact p values.

      We included statistical analysis results in figure legends; for Figure 6B, we provided statistical analysis results in Supplemental Table 1.

      (3) Please paste all figure legends directly below their corresponding figure to make the manuscript easier to read.

      We have added figure legends directly below their corresponding figures.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Statistical analysis results have been included in the figure legends and Supplemental Table 1.

      References

      Bansal R, Nagel M, Stopkova R, Sofer Y, Kimchi T, Stopka P, Spehr M, Ben-Shaul Y. 2021. Do all mice smell the same? Chemosensory cues from inbred and wild mouse strains elicit stereotypic sensory representations in the accessory olfactory bulb. BMC Biol 19:133.

      Ben-Shaul Y, Katz LC, Mooney R, Dulac C. 2010. In vivo vomeronasal stimulation reveals sensory encoding of conspecific and allospecific cues by the mouse accessory olfactory bulb. Proc Natl Acad Sci U S A 107:5172‒5177.

      Bergan JF, Ben-Shaul Y, Dulac C. 2014. Sex-specific processing of social cues in the medial amygdala. Elife 3:e02743.

      Engelke DS, Zhang XO, O’Malley JJ, Fernandez-Leon JA, Li S, Kirouac GJ, Beierlein M, Do-Monte FH. 2021. A hypothalamic-thalamostriatal circuit that controls approach-avoidance conflict in rats. Nat Commun 12:2517.

      Esteban Masferrer M, Silva BA, Nomoto K, Lima SQ, Gross CT. 2020. Differential Encoding of Predator Fear in the Ventromedial Hypothalamus and Periaqueductal Grey. J Neurosci 40:9283‒9292.

      Papes F, Logan DW, Stowers L. 2010. The vomeronasal organ mediates interspecies defensive behaviors through detection of protein pheromone homologs. Cell 141:692‒703.

      Tobias BC, Schuette PJ, Maesta-Pereira S, Torossian A, Wang W, Sethi E, Adhikari A. 2023. Characterization of ventromedial hypothalamus activity during exposure to innate and conditioned threats. Eur J Neurosci 57:1053‒1067.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      PUBLIC REVIEWS

      Reviewer #1 (Public Review):

      In this study, the authors investigate the role of triglycerides in spermatogenesis. This work builds on their previous study (PMID: 31961851) on triglyceride sex differences, in which they showed that somatic testicular cells play a role in whole-body triglyceride homeostasis. In the current study, they show that lipid droplets (LDs) are significantly more abundant in the stem and progenitor cell (pre-meiotic) zone of the adult testis than in the meiotic spermatocyte stages. The distribution of LDs anti-correlates with the expression of the triglyceride lipase Brummer (Bmm), which is expressed more highly in spermatocytes than in early germline stages. Analysis of a bmm mutant (bmm[1]), a P-element insertion that is likely a hypomorph, and its revertant (bmm[rev]) as a control shows that bmm acts autonomously in the germline to regulate LDs. In particular, the number of LDs is significantly higher in spermatocytes from bmm[1] mutants than from bmm[rev] controls. Testes from males with global loss of bmm (bmm[1]) are shorter than controls and have fewer differentiated spermatids. The zone of bam expression, typically close to the niche/hub in WT, is now many cell diameters away from the hub in bmm[1] mutants. There is an increase in the number of GSCs in bmm[1] homozygotes, but this phenotype is probably due to the enlarged hub. However, clonal analyses of GSCs lacking bmm indicate that a greater percentage of the GSC pool is composed of bmm[1]-mutant clones than of bmm[rev] clones. This suggests that loss of bmm could impart a competitive advantage to GSCs, but this is not explored in greater detail. Despite the increase in the number of GSCs that are bmm[1]-mutant clones, there is a significant reduction in the number of bmm[1]-mutant spermatocyte and post-meiotic clones. This suggests that fewer bmm[1]-mutant germ cells than control germ cells differentiate. To gain insights into triglyceride homeostasis in the absence of bmm, they perform mass spec-based lipidomic profiling. Analyses of these data show that triglycerides are the lipid class most affected by loss of bmm, supporting their model that excess triglycerides cause the spermatogenic defects in bmm[1] mutants. Consistent with this model, a double mutant of bmm[1] and midway (mdy), which encodes diacylglycerol O-acyltransferase 1, reverts the bmm-mutant germline phenotypes.

      There are numerous strengths of this paper. First, the authors report rigorous measurements and statistical analyses throughout the study. Second, the authors utilize robust genetic analyses with loss-of-function mutants and lineage-specific knockdown. Third, they demonstrate the appropriate use of controls and markers. Fourth, they show rigorous lipidomic profiling. Lastly, their conclusions are appropriate for the results; in other words, they don't overstate the results. Overall, the rigorously quantified results support the major aim that appropriate regulation of triglycerides is needed in a germline cell-autonomous manner for spermatogenesis.

      This paper should have a positive impact on the field. First and foremost, there is limited knowledge about the role of lipid metabolism in spermatogenesis. The lipidomic data will be useful to researchers in the field who study various lipid species. Going forward, it will be very interesting to determine what triglycerides regulate in germline biology. In other words, what functions/pathways/processes in germ cells are negatively impacted by elevated triglycerides. And as the authors point out in the discussion, it will be important to determine what regulates bmm expression such that bmm is higher in later stages of germline differentiation.

      We thank the Reviewer for their positive assessment of our revised manuscript!

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors show that neutral lipids play a role in spermatogenesis. Neutral lipids are components of lipid droplets, which are known to maintain lipid homeostasis, and to be involved in non-gonadal differentiation, survival, and energy. Lipid droplets are present in the testis in mice and Drosophila, but not much is known about the role of lipid droplets during spermatogenesis. The authors show that lipid droplets are present in early differentiating germ cells, and absent in spermatocytes. They further show a cell autonomous role for the lipase brummer in regulating lipid droplets and, in turn, spermatogenesis in the Drosophila testis. The data presented show that a relationship between lipid metabolism and spermatogenesis is congruous in mammals and flies, supporting Drosophila spermatogenesis as an effective model to uncover the role lipid droplets play in the testis.

      Strengths and weaknesses:

      The authors do a commendably thorough characterization of where lipid droplets are detected in normal testes: located in young somatic cells, and early differentiating germ cells. They use multiple control backgrounds in their analysis, including w[1118], Canton S, and Oregon R, which adds rigor to their interpretations. The authors employ markers that identify which lipid droplets are in somatic cells, and which are in germ cells. The authors use these markers to present measured distances of somatic and germ cell-derived lipid droplets from the hub. Because they can also measure the distance of somatic and germ cells with age-specific markers from the hub, these results allow the authors to correlate position of lipid droplets with the age of cells in which they are present. This analysis is clearly shown and well quantified.

      The quantification of lipid droplet distance from the hub is applied well in comparing brummer mutant testes to wild type controls. The authors measure the number of lipid droplets of specific diameters, and the spatial distribution of lipid droplets as a function of distance from the hub. These measurements quantitatively support their findings that lipid droplets are present in an expanded population of cells further from the hub in brummer mutants. The authors further quantify lipid droplets in germline clones of specified ages; the quantitative analysis here is displayed clearly and supports a cell autonomous role for brummer in regulating lipid droplets in spermatocytes.

      Data examining testis size and number of spermatids in brummer mutants clearly indicate the importance of regulating lipid droplets for spermatogenesis. The authors show beautiful images supported by rigorous quantification supporting their findings that brummer mutants have smaller testes with fewer spermatids at both 29°C and 25°C. There is also significant data supporting defects in testis size, but not spermatid number, in 14-day-old brummer mutant animals compared to controls. Their analysis clearly shows an expanded region beyond the testis apex that includes younger germ cells, supporting a role for lipid droplets in influencing germ cell differentiation during spermatogenesis.

      The authors present a series of data exploring a cell autonomous role for brummer in the germline, including clonal analysis and tissue specific manipulations. The clonal data indicating increased lipid droplets in spermatocyte clones, and a higher proportion of brummer mutant GSCs at the hub are convincing and supported by quantitation. The authors also show a tissue specific rescue of the brummer testis size phenotype by knocking down mdy specifically in germ cells, which is also supported by statistically significant quantitation. The authors present data examining the number of spermatocyte and post-meiotic clones 14 days after clonal induction. Their finding is significant with a p-value of 0.0496, which they acknowledge is less robust than their other data reported in this study, and could be a result of a low sample size. They indicate that future studies might validate these results with additional samples.

      The authors do a beautiful job of validating where they detect brummer-GFP by presenting their own pseudotime analysis of publicly available single-cell RNA sequencing data. Their data are presented very clearly and support expression of brummer in older somatic and germline cells of the age when lipid droplets are normally not detected. The authors also present a thorough lipidomic analysis of animals lacking brummer to identify triglycerides as an important lipid droplet component regulating spermatogenesis.

      Impact:

      The authors present data supporting the broad significance of their findings across phyla. This data represents a key strength of this manuscript. The authors show that loss of a conserved triglyceride lipase impacts testis development and spermatogenesis, and that these impacts can be rescued by supplementing diet with medium-chain triglycerides. The authors point out that these findings represent a biological similarity between Drosophila and mice, supporting the relevance of the Drosophila testis as a model for understanding the role of lipid droplets in spermatogenesis. The connection buttresses the relevance of these findings and this model to a broad scientific community.

      We thank the Reviewer for their positive assessment of our revised paper!

      RECOMMENDATIONS FOR THE AUTHORS

      Reviewer #2 (Recommendations For The Authors):

      The authors addressed most of my recommendations in a way that is satisfactory to me. I would like a bit more information added to the methods section about how hub area was quantified. For example, did the authors measure area within a defined region in a single Z plane (perhaps the Z plane at the center of the hub, or the Z plane with the largest area)? Alternatively, did the authors measure area in a more three-dimensional way, i.e., volume? Adding this information to the methods would satisfy all of my previous recommendations.

      We thank the Reviewer for pointing out that this information was not clear in the revised manuscript. We changed the methods section to clarify our methods as follows:

      “The hub was identified as the FasIII-positive area of the testis. Hub size was estimated by measuring the FasIII-positive area in a Z-projected image of the hub in each testis. Z-projections were made using the ‘sum slices’ function in Fiji.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors demonstrated that YAP/TAZ promotes P-body formation in a series of cancer cell lines. YAP/TAZ modulates the transcription of multiple P-body-related genes, especially repressing the transcription of the tumor suppressor proline-rich nuclear receptor coactivator 1 (PNRC1) through cooperation with the NuRD complex. PNRC1 functions as a critical repressor in YAP-induced biogenesis of P-bodies and tumorigenesis in colorectal cancer (CRC). Reexpression of PNRC1 or disruption of P-bodies attenuated the protumorigenic effects of YAP. Overall, these findings are interesting and the study was well conducted.

      We thank the reviewer for the positive comments for our work.

      Major concerns:

      (1) RNA-seq data indicated that Yap has the capacity to suppress the expression of numerous genes. In addition to PNRC1, could there be additional Yap target factors involved in Yap-mediated formation of P-bodies?

      Yes, indeed. Additional YAP target genes, such as AJUBA and SAMD4A, are also involved in YAP-mediated P-body formation (Fig. 1B-D). Knockdown of either SAMD4A or AJUBA attenuated the P-body formation induced by overexpression of YAP5SA (Fig. 3A).

      (2) It is still not clear how PNRC1 regulates P-bodies. Knockdown of PNRC1 prevented the reduction of P-bodies caused by Yap knockdown. How do the genes related to P-bodies that are positively regulated by Yap, such as SAMD4A, AJUBA, and WTIP, change in this scenario? Given that the expression of Yap can differ considerably among various cell types, is it possible for P-bodies to be present in tumor cells lacking Yap expression?

      The detailed mechanism of PNRC1’s suppressive effect on P-body formation was explored in Gaviraghi et al.’s paper, in which PNRC1 was first identified as a tumor suppressor gene (EMBO, 2018, PMID: 30373810). Gaviraghi et al. revealed that overexpression of PNRC1 leads to translocation of cytoplasmic DCP1A/DCP2 into the nucleolus, which subsequently attenuates rRNA transcription and ribosome biogenesis. Since DCP1A and DCP2 are essential for the formation of P-bodies, loss of cytoplasmic DCP1A/DCP2 also disrupts P-body formation. This background information has been included in the Results and Discussion sections of the manuscript.

      Previously, we performed RNA-seq analysis of HCT116 cells overexpressing PNRC1. Compared with YAP5SA overexpression (520 differentially expressed genes), PNRC1 overexpression had a smaller effect on the gene expression profile (147 differentially expressed genes), and the expression of SAMD4A, AJUBA, and WTIP was not affected by PNRC1 overexpression.

      In this study, we found that YAP could promote P-body formation in a series of cancer cell lines. During the exploration, we observed that P-bodies hardly existed in the RKO colorectal cancer cell line (Figure 1 for the reviewer). However, the regulatory effect of YAP/TAZ on SAMD4A, AJUBA, and WTIP was still observed (Figure 2 for the reviewer). These data suggest that YAP’s activity could be sufficient but not required for the P-body formation. So, we agree that P-bodies could be present in tumor cells lacking Yap expression.

      Author response image 1.

      Author response image 2.

      (3) The authors demonstrated that CHD4 can bind to Yap target genes, such as CTGF, AJUBA, and SAMD4A (Figure 4 - Figure Supplement 1D). Does the NuRD complex repress the expression of these genes, and could the NuRD complex prevent the formation of P-bodies?

      Good point! Following the reviewer’s suggestions, we measured the mRNA levels of AJUBA, WTIP, and SAMD4A, as well as P-body formation, in CHD4-knockdown cells. Interestingly, knockdown of CHD4 induced mild downregulation of AJUBA, WTIP, and SAMD4A in HCT116 cells (Figure 3 for the reviewer). Of note, the NuRD complex is involved in both transcriptional repression and activation (PNAS 2011, PMID: 21490301; Stem Cell Reports. 2021, PMID: 33961790). As expected, CHD4 knockdown decreased the number of P-bodies in HCT116 cells (new Figure 4-Supplement 1E), which is consistent with the enhanced expression of PNRC1 (Figure 4F).

      Author response image 3.

      Author response image 4.

      (4) YAP/TAZ promotes the formation of P-bodies which contradicts the previous study's conclusion (PMID: 34516278). Please address these inconsistent findings.

      The contradictory observations between our study and the previous one could be due to the different cell lines (HUVEC vs. cancer cell lines) and different stimuli (KSHV infection vs. normal culture conditions or serum stimulation, cell density, and stiffness). We have discussed this contradictory observation in the Discussion section as follows:

      “In contrast, a recent study, which provided the first link between YAP and P-bodies, implicated YAP as a negative regulator of P-bodies in KSHV-infected HUVECs (Castle et al, 2021). Elizabeth L. Castle et al. reported that virus-encoded Kaposin B (KapB) induces actin stress fiber formation and disassembly of P-bodies, which requires RhoA activity and the YAP transcriptional program (Castle et al, 2021). YAP-enhanced autophagic flux was proposed to participate in KapB-induced P-body disassembly, consistent with the concept that stress granules and P-bodies are cleared by autophagy (Buchan et al, 2013; Castle et al, 2021). However, an increasing number of studies have reported contradictory roles of YAP in autophagy regulation, which suggests that YAP-mediated autophagy regulation is cell type- and context-dependent (Jin et al, 2021; Pei et al, 2022; Totaro et al, 2019; Wang et al, 2020). Furthermore, though YAP is required for cell proliferation in HUVECs, transformed cell lines often display elevated baseline YAP/TAZ activity compared to normal cells and possess many alterations in growth signaling pathways, including autophagy signaling (Nguyen & Yi, 2019; Shen & Stanger, 2015; Zanconato et al, 2016). Thus, the contradictory observations regarding the role of YAP in modulating P-body formation between Elizabeth L. Castle et al.’s study and our study could be due to the different cell contexts and different cell conditions (baseline vs. KSHV infection).”

      Reviewer #2 (Public Review):

      In a study by Shen et al., the authors investigated YAP/TAZ target genes that play a role in the formation of processing bodies (P-bodies). P-bodies are membraneless cytoplasmic granules that contain translationally repressed mRNAs and components of mRNA turnover. GO enrichment analysis of the RNA-Seq data of colorectal cancer cells (HCT116) after YAP/TAZ knockdown showed that the downregulated genes were enriched in P-body resident proteins. Overexpression, knockdown, and ChIP-qPCR analyses showed that SAMD4A, PNRC1, AJUBA, and WTIP are YAP-TEAD target genes that also play a role in P-body biogenesis. Using P-body markers such as DDX6 and DCP1A, the authors showed that the knockdown of YAP in the HCT116 cell line causes a reduction in the number of P-bodies. Similarly, overexpression of constitutively active YAP (YAP 5SA) increased the P-body number. The YAP-TEAD target genes SAMD4A and AJUBA positively regulate P-body formation, because lowering their expression levels using siRNA reduces the number of P-bodies. The other YAP target gene, PNRC1, is a negative regulator of P-body biogenesis and consistently YAP suppresses its expression through the recruitment of the NuRD complex. YAP target genes that modulate P-body formation play prominent roles in oncogenesis. PNRC1 suppression is key to YAP-mediated proliferation, colony formation, and tumorigenesis in HCT116 xenografts. Similarly, SAMD4 and AJUBA knockdown abrogated cell viability. In summary, this study demonstrated that SAMD4, AJUBA, WTIP, and PNRC1 are bona fide YAP-TEAD target genes that play a role in P-body formation, which is also linked to the oncogenesis of colon cancer cells.

      We thank the reviewer for the positive comments for our work.

      Major Strengths:

      The majority of the experiments were appropriately planned so that the generated data could support the conclusions drawn by the authors. The phenotype observed with YAP/TAZ knockdown correlated inversely with YAP5SA overexpression, which is complementary. Where possible, the authors also used point mutations that selectively disrupt protein-protein interactions, such as YAP S94A and PNRC1 W300A. The CRC cell line HCT116 was used throughout the study; additionally, data from other cancer cell lines were used to support the generality of the findings.

      We thank the reviewer for the positive comments regarding the strength and significance of our work.

      Weaknesses:

      The authors did not elucidate the mechanistic link between P-body formation and oncogenesis; therefore, it is unclear why an increase in the number of P-bodies is pro-tumorigenic. AJUBA and SAMD4 may have housekeeping functions, so their knockdown may also reduce the proliferation of YAP-independent cell lines. Figure 6 - Figure Supplement 4 shows a reduction in cell viability and migration in control HCT116 cell lines upon AJUBA/SAMD4 knockdown. Therefore, it is unclear whether the tumor-suppressive effect of their knockdown is YAP-dependent. The authors extrapolated and suggested that their findings could be exploited therapeutically, without providing much detail. How do they plan to stimulate the expression of PNRC1? It is not necessary for every scientific finding to lead to a therapeutic benefit; therefore, they can tone down such statements if therapeutic exploitation is not realistic. The authors elucidated a mechanism for PNRC1 repression, and one wonders why no attempts were made to understand the mechanism of activation of SAMD4, AJUBA, and WTIP expression.

      We thank the reviewer for pointing out these issues to further improve the quality of our study. As mentioned in the Abstract, the role of P-bodies in tumorigenesis and tumor progression is not well studied. In this study, we revealed that disruption of P-body formation by knockdown of essential P-body-related genes attenuates YAP-driven oncogenic function in CRC, which provides evidence for a pro-tumorigenic role of P-bodies. We agree with the reviewer that the mechanism by which P-body formation promotes tumorigenesis is an important scientific question warranting exploration, and we plan to investigate it in a future study.

      AJUBA has been reported to act as a signal transducer in oncogenesis and to promote CRC cell survival (Pharmacol Res. 2020, PMID: 31740385; Oncogene. 2017, PMID: 27893714). Furthermore, following the reviewer’s suggestion, we found that knockdown of both AJUBA and SAMD4A suppressed cell proliferation in the YAP-deficient cell line SHP-77, which further implicates AJUBA and SAMD4A in an oncogenic role (Figure 4 for the reviewer). Numerous studies have shown that YAP/TAZ knockdown suppresses the proliferation of HCT116 cells. Thus, it is not surprising that knockdown of AJUBA and SAMD4A also repressed the proliferation of the “parental” control HCT116 cells. Since our mechanistic studies identified AJUBA and SAMD4A as bona fide YAP-TEAD target genes, the co-dependency of HCT116 cells on YAP and AJUBA/SAMD4A implies that the pro-tumorigenic function of YAP could depend in part on activation of AJUBA/SAMD4A (only in part, given the large number of YAP target genes).

      Author response image 5.

      Tumor suppressor genes are frequently epigenetically silenced in cancer cells, and PNRC1 appears to be one of them. In our preliminary study, we found that the DNA methyltransferase inhibitor 5-Azacytidine markedly increased the mRNA level of PNRC1 in HCT116 cells (Figure 5 for the reviewer), which suggests that PNRC1 is epigenetically suppressed by DNA methylation in CRC cells and could be re-activated by DNA methyltransferase inhibitors for cancer treatment.

      Author response image 6.

      YAP/TAZ are well known as transcriptional co-activators, and the mechanism by which they activate target gene transcription has been well studied (Cell Stress. 2021, PMID: 34782888). More recently, however, the function of YAP/TAZ as transcriptional co-repressors has come to the forefront. Both NuRD and Polycomb repressive complex 2 (PRC2) are involved in the transcriptional repressor function of YAP (Cell Rep. 2015, PMID: 25843714; Cancer Res. 2020, PMID: 32409309). We therefore focused on the mechanism of PNRC1 repression in this study, rather than on the mechanism by which SAMD4A, AJUBA, and WTIP expression is activated.

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments: The suggested experiments were aimed at minimizing the weaknesses of the manuscript. The roles of AJUBA and SAMD4 can be elucidated in a YAP-independent cell line. After knockdown of AJUBA or SAMD4 in a YAP-independent cell line, the effects on proliferation and migration should be determined.

      Following the reviewer’s suggestions, we explored the role of AJUBA and SAMD4A in the YAP-independent cell line SHP-77 (Cancer Cell. 2021, PMID: 34270926). Unfortunately, SHP-77 cultures consist of suspension cells mixed with some loosely adherent cells, which makes them unsuitable for cell migration assays. By CCK8 assay, we found that knockdown of both AJUBA and SAMD4A suppressed the proliferation of SHP-77 cells, which further implicates AJUBA and SAMD4A in an oncogenic role.

      Author response image 7.

      Experiments directed at elucidating whether the mRNAs of tumor suppressor genes undergo sequestration and decay in P-bodies that ultimately promote tumorigenesis will provide a mechanistic link between P-body formation and tumorigenesis. The enrichment of P-bodies through biochemical methods has been employed in other studies. RNA-seq after P-body enrichment may provide opportunities to unravel the link between P-body formation and tumorigenesis.

      We thank the reviewer for the constructive suggestions to further improve the significance of our study. We do plan to purify P-bodies to further elucidate the mechanisms underlying the pro-tumorigenic role of P-bodies in tumor cells. However, as newcomers to the P-body field, we have encountered many technical difficulties in establishing biochemical assays for P-bodies. We hope to resolve these issues soon and present the new data in a future paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines psychophysics, fMRI, and TMS to reveal a causal role of FEF in generating an attention-induced ocular dominance shift, with potential relevance for clinical applications. The evidence supporting the claims of the authors is solid, but the theoretical and mechanistic interpretation of results and experimental approaches need to be strengthened. The work will be of broad interest to perceptual and cognitive neuroscience.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Based on a "dichoptic-backward-movie" paradigm that modulates ocular dominance, the present study combines fMRI and TMS to examine the role of the frontoparietal attentional network in ocular dominance shifts. The authors claimed a causal role of FEF in generating the attention-induced ocular dominance shift.

      Strengths:

      A combination of fMRI, TMS, and the "dichoptic-backward-movie" paradigm is used to reveal the causal role of the frontoparietal attentional network in ocular dominance shifts. The conclusions of this paper are mostly well supported by data.

      Weaknesses:

      (1) The relationship between eye dominance, eye-based attention shift, and cortical functions remains unclear and merits further delineation. The rationale of the experimental design related to the hemispheric asymmetry in the FEF and other regions should be clarified.

      Thanks for the reviewer’s comments! We have further clarified the relationship between eye dominance shift, eye-based attention, and cortical functions in the Introduction and Discussion. In the Introduction, we introduce the modulating effects of eye-based attention on eye dominance. On one hand, eye-based attention can enhance eye dominance of the attended eye in real time (see page 3 first paragraph or below):

      “For instance, presenting top-down attentional cues to one eye can intensify the competition strength of input signals in the attended eye during binocular rivalry (Choe & Kim, 2022; Zhang et al., 2012) and shift the eye balance towards the attended eye (Wong et al., 2021).”

      On the other hand, prolonged eye-based attention can induce a shift of eye dominance to the unattended eye (see page 3 second paragraph or below):

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., the attended eye) while the other eye (i.e., the unattended eye) receives the backward movie images of the same episode. Participants are also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, goal-directed eye-based attention is predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than from unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).”

      Moreover, we have discussed how FEF regulates the attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or below; this also responds to this reviewer’s Weakness #2):

      “Then how does FEF regulate the attention-induced ocular dominance shift? Our previous work has found that the aftereffect (for simplicity, hereafter we use aftereffect to denote the attention-induced ocular dominance shift) can be produced only when the adapting stimuli involve adequate interocular competition, and is measurable only when the testing stimuli are not binocularly fused (Song et al., 2023). Given the indispensability of interocular competition, we explained those findings in the framework of the ocular-opponency-neuron model of binocular rivalry (Said & Heeger, 2013). The model suggests that there are some opponency neurons which receive excitatory inputs from monocular neurons for one eye and inhibitory inputs from monocular neurons for the other eye (e.g. AE-UAE opponency neurons receive excitatory inputs from the attended eye (AE) and inhibitory inputs from the unattended eye (UAE)). Then a difference signal is computed so that the opponency neurons fire if the excitatory inputs surpass the inhibitory inputs. Upon activation, the opponency neurons will in turn suppress the monocular neurons which send inhibitory signals to them.

      Based on this model, we proposed an ocular-opponency-neuron adaptation account to explain the aftereffect, and pointed out that the attentional system likely modulated the AE-UAE ocular opponency neurons (Song et al., 2023). So why would FEF modulate the AE-UAE opponency neurons? The reason may be twofold. Firstly, following the logic of the movie during dichoptic-backward-movie viewing may require filtering out the distracting information (from the unattended eye) and sustaining attention (to the attended eye), both of which are established functions of FEF (Esterman et al., 2015; Lega et al., 2019).

      Secondly, owing to the special characteristics of the binocular visual system, filtering out the distracting input from the unattended eye may have to rely on the interocular suppression mechanism. According to the ocular-opponency-neuron model, this is achieved by the firing of the AE-UAE opponency neurons, which send inhibitory signals to the UAE monocular neurons.

      As mentioned previously, the firing of the AE-UAE opponency neurons requires stronger activity in the AE monocular neurons than in the UAE monocular neurons. This is supported by the results shown in Figure 8 of Song et al. (2023): the monocular response for the attended eye during the entire adaptation phase was slightly stronger than that for the unattended eye. Accordingly, during adaptation the AE-UAE opponency neurons were able to activate for a longer period and thus adapted to a larger extent than the UAE-AE opponency neurons. This would cause the monocular neurons for the unattended eye to receive less inhibition from the AE-UAE opponency neurons in the post-test than in the pre-test, leading to a shift of ocular dominance towards the unattended eye. In this vein, the magnitude of this aftereffect should be proportional to the extent of adaptation of the AE-UAE relative to the UAE-AE opponency neurons. Attentional enhancement of the AE-UAE opponency neurons is believed to strengthen this aftereffect, as attention has been found to enhance adaptation (Dong et al., 2016; Rezec et al., 2004). Inhibition of FEF likely rendered such attentional modulation much less effective. Consequently, the AE-UAE opponency neurons might not have had the chance to adapt to a sufficiently larger extent than the UAE-AE opponency neurons, leading to a statistically non-detectable aftereffect in Experiment 2. Therefore, the results of Experiments 2-4 in the present study suggest that, within the context of the ocular-opponency-neuron adaptation account, FEF might be the core area implementing the attentional modulation of the AE-UAE opponency neurons.”
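      To make the adaptation account in the quoted passage concrete, the toy simulation below caricatures it in Python. It is a minimal sketch under stated assumptions, not the authors' model and not the full Said & Heeger (2013) model: the function names (adapt, aftereffect), the linear gain-decay rule, the sinusoidal fluctuation of the monocular responses, the multiplicative attention gain, and every numerical value are hypothetical choices for illustration only. Opponency responses are rectified interocular differences, each opponency channel's gain drops in proportion to its own response, and the aftereffect is read out as how much less inhibition the UAE monocular neurons receive after adaptation. Setting attn_gain to 1 stands in for the loss of attentional boosting assumed to follow FEF inhibition.

```python
import math


def rectify(x: float) -> float:
    """Half-wave rectification: opponency neurons fire only for positive drive."""
    return max(0.0, x)


def adapt(r_ae: float, r_uae: float, attn_gain: float = 1.0,
          steps: int = 3000, rate: float = 1e-3, wobble_amp: float = 0.2):
    """Toy adaptation phase; returns (g_ae_uae, g_uae_ae) gains after adaptation.

    All parameter values are illustrative assumptions, not fitted to any data.
    """
    g_ae_uae = 1.0  # AE-UAE opponency gain (excited by AE, inhibited by UAE)
    g_uae_ae = 1.0  # UAE-AE opponency gain (excited by UAE, inhibited by AE)
    for t in range(steps):
        # Deterministic anti-phase fluctuation standing in for moment-to-moment
        # variation of the monocular responses (an assumption of this sketch).
        wobble = wobble_amp * math.sin(0.05 * t)
        ae, uae = r_ae + wobble, r_uae - wobble
        # Opponency response = gain x rectified difference; the hypothetical
        # attention gain multiplies only the AE-UAE channel.
        o_ae_uae = attn_gain * g_ae_uae * rectify(ae - uae)
        o_uae_ae = g_uae_ae * rectify(uae - ae)
        # Activity-dependent adaptation: each gain drops with its own response.
        g_ae_uae = max(0.0, g_ae_uae - rate * o_ae_uae)
        g_uae_ae = max(0.0, g_uae_ae - rate * o_uae_ae)
    return g_ae_uae, g_uae_ae


def aftereffect(g_ae_uae: float, g_uae_ae: float) -> float:
    """Positive = UAE monocular neurons receive relatively less inhibition,
    i.e. ocular dominance shifts towards the unattended eye."""
    return g_uae_ae - g_ae_uae


if __name__ == "__main__":
    # Slightly stronger attended-eye drive (hypothetical values).
    intact = aftereffect(*adapt(1.05, 1.0, attn_gain=2.0))
    fef_inhibited = aftereffect(*adapt(1.05, 1.0, attn_gain=1.0))
    print(f"intact FEF: {intact:.3f}, inhibited FEF: {fef_inhibited:.3f}")
```

      With these assumed parameters, the slightly stronger attended-eye drive alone yields a small positive aftereffect, and the attentional boost on the AE-UAE channel enlarges it. This mirrors only the qualitative logic of the account, not any fitted result.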

      We targeted only the right hemisphere for FEF and IPS stimulation for two reasons. First, many studies have shown that the dorsal attentional network has a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), which was also indicated by the results of Experiment 1 (Figure 3). Second, a recent study applying TMS to FEF and IPS stimulated only the right hemisphere (Gallotto et al., 2022). Therefore, we selected the right FEF and right IPS as the target regions for cTBS. In the Methods section of Experiment 2, we have explained the reasons for the selection of cTBS target regions (see page 35, first paragraph or below):

      “Given that the dorsal attentional network primarily consists of the FEF and the IPS (Corbetta & Shulman, 2002; Mayrhofer et al., 2019), with a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), we selected the right FEF and right IPS from the four clusters identified in Experiment 1 as the target regions for cTBS (Gallotto et al., 2022).”

      (2) Theoretically, how the eye-related functions in this area could be achieved, and how they interact with the ocular representation in V1, warrant further clarification.

      Thanks for the reviewer’s comment! In the revised manuscript, we have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or the quoted paragraphs under this reviewer’s first Public comment).

      Reviewer #2 (Public Review):

      Summary

      Song et al. investigate the role of the frontal eye field (FEF) and the intraparietal sulcus (IPS) in mediating the shift in ocular dominance (OD) observed after a period of dichoptic stimulation during which attention is selectively directed to one eye. This manipulation has been previously found to transiently shift OD in favor of the unattended eye, similar to the effect of short-term monocular deprivation. To this end, the authors combine psychophysics, fMRI, and transcranial magnetic stimulation (TMS). In the first experiment, the authors determine the regions of interest (ROIs) based on the responses recorded by fMRI during either dichoptic or binocular stimulation, showing selective recruitment of the right FEF and IPS during the dichoptic condition, in line with the involvement of eye-based attention. In a second experiment, the authors investigate the causal role of these two ROIs in mediating the OD shift observed after a period of dichoptic stimulation by selectively inhibiting them with TMS (using continuous theta burst stimulation, cTBS) before the adaptation period (50 min exposure to dichoptic stimulation). They show that, when cTBS is delivered to the FEF, but not the IPS or the vertex, the shift in OD induced by dichoptic stimulation is reduced, indicating a causal involvement of the FEF in mediating this form of short-term plasticity. A third control experiment rules out the possibility that TMS interferes with the OD task (binocular rivalry), rather than with the plasticity mechanisms. From this evidence, the authors conclude that the FEF is one of the areas mediating the OD shift induced by eye-selective attention.

      Strengths

      (1) The experimental paradigm is sound and the authors have thoroughly investigated the neural correlates of an interesting form of short-term visual plasticity combining different techniques in an intelligent way.

      (2) The results are solid and the appropriate controls have been performed to exclude potential confounds.

      (3) The results are very interesting, providing new evidence both about the neural correlates of eye-based attention and the involvement of extra-striate areas in mediating short-term OD plasticity in humans, with potential relevance for clinical applications (especially in the field of amblyopia).

      Weaknesses

      (1) Ethics: more details about the ethics need to be included in the manuscript. It is only mentioned for experiment 1 that participants "provided informed consent in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences". (Which version of the Declaration of Helsinki? The latest version requires the pre-registration of the study. The code of the approved protocol together with the code and date of the approval should be provided.) There is no mention of informed consent procedures or ethics approval for the TMS experiments. This is a huge concern, especially for brain stimulation experiments!

      Response: Thanks for the reviewer’s comment! In the revised manuscript, we have provided the code of the approved protocol and date of the approval (see page 25 second paragraph or below):

      “This study was approved (H21058, 11/01/2021) by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.”

      Indeed, ethics approval and informed consent were obtained for each experiment. To avoid duplication in the text, we only presented the ethics instructions in the Methods section of Experiment 1. We have now clarified in that section that all the experiments in this study were approved by the IRB in our Institute.

      (2) Statistics: the methods section should include a sub-section describing in detail all the statistical analyses performed for the study. Moreover, in the results section, statistical details should be added to support the fMRI results. In the current version of the manuscript, the claims are not supported by statistical evidence.

      Response: Thanks for the reviewer’s suggestion! In the Methods section of revised manuscript, we have added a section to describe the detailed statistical analyses for each experiment (see page 37 last paragraph for Experiment 2 and page 38 last paragraph for Experiment 3 or below):

      “Statistical analyses were performed using MATLAB. A 3 (stimulation site: Vertex, FEF, IPS) × 2 (test phase: pre-test and post-test) repeated measures ANOVA was used to investigate the effect of cTBS delivery on ocular dominance shift. Moreover, for the blob detection test, the target detection rate of each experimental condition was calculated by dividing the summed number of detected blob targets by the total number of blob targets. Then, a 2 (eye: attended eye, unattended eye) × 3 (stimulation site: Vertex, FEF, IPS) repeated measures ANOVA on the detection performance was performed. Post-hoc tests were conducted using paired t-tests (2-tailed significance level at α = 0.05), and the resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini & Hochberg, 1995).”
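      The FDR correction named in the quoted analysis follows the Benjamini & Hochberg (1995) step-up procedure. The Python sketch below (the analyses themselves were run in MATLAB) shows how BH-adjusted p-values are obtained for a family of post-hoc paired comparisons; the function name bh_adjust and the example p-values are hypothetical. In practice, the statsmodels function multipletests(pvals, method="fdr_bh") returns the same adjusted values.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for three post-hoc paired t-tests.
    raw = [0.01, 0.04, 0.03]
    adj = bh_adjust(raw)
    print([round(p, 3) for p in adj])
    print([p < 0.05 for p in adj])  # reject H0 at FDR level 0.05?
```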

      “In addition to the data analysis in Experiment 2, we complemented the standard inferential approach with the Bayes factor (van den Bergh et al., 2023; van Doorn et al., 2021; Wagenmakers et al., 2018), which quantifies the relative evidence that the data provide for the alternative (H1) or null hypothesis (H0). We conducted the Bayesian repeated measures ANOVA using JASP with default priors and computed inclusion Bayes factors (BFincl), which quantify the evidence for including a particular effect, calculated across matched models. A BF greater than 1 provides support for the alternative hypothesis. Specifically, a BF between 1 and 3 indicates weak evidence, a BF between 3 and 10 indicates moderate evidence, and a BF greater than 10 indicates strong evidence (van Doorn et al., 2021). In contrast, a BF below 1 provides evidence in favor of the null hypothesis.”
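      The verbal labels in the quoted paragraph amount to a simple lookup on the Bayes factor, sketched below in Python; BFincl itself would still come from JASP. Two details are assumptions not stated in the text: a value of exactly 3 or exactly 10 is counted as moderate, and for BF < 1 the same cut-offs are applied to 1/BF (i.e. 1/3 and 1/10), the usual symmetric convention. The name describe_bf is hypothetical.

```python
def describe_bf(bf: float) -> str:
    """Map a Bayes factor (e.g. BF10 or BFincl) to a verbal evidence category."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf == 1:
        return "no evidence either way"
    # BF > 1 favours H1; BF < 1 favours H0 and is graded on its reciprocal
    # (symmetric cut-offs are an assumption of this sketch).
    favoured = "H1" if bf > 1 else "H0"
    strength = bf if bf > 1 else 1 / bf
    if strength < 3:
        label = "weak"
    elif strength <= 10:
        label = "moderate"
    else:
        label = "strong"
    return f"{label} evidence for {favoured}"
```

      For example, describe_bf(0.2) returns "moderate evidence for H0", and describe_bf(25.0) returns "strong evidence for H1".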

      Furthermore, in the Results section of revised manuscript, we have added the statistical details to support the fMRI results (see page 9 last paragraph or below):

      “To identify these brain regions, we used the AFNI program “3dttest++” to assess the difference in the ‘dichoptic-binocular’ contrast between the experimental and control runs. The AFNI program “ClustSim” was then applied for multiple comparison correction, yielding a minimum significant cluster size of 21 voxels (voxel-wise p = .001; cluster threshold α = 0.05). We found 4 clusters showing stronger responses to the dichoptic movies than to the binocular movies, especially in the experimental runs.”

      (3) Interpretation of the results: the TMS results are very interesting and convincing regarding the involvement of the FEF in the build-up of the OD shift induced by dichoptic stimulation, however, I am not sure that the authors can claim that this effect is related to eye-based attention, as cTBS has no effect on the blob detection task during dichoptic stimulation. If the FEF were causally involved in eye-based attention, one would expect a change in performance in this task during dichoptic stimulation, perhaps a similar performance for the unattended and attended eye. The authors speculate that the sound could have an additional role in driving eye-based attention, which might explain the lack of effect for the blob discrimination task, however, this hypothesis has not been tested.

      Response: Thanks for the reviewer’s comment! Following this reviewer’s insightful suggestion, we have conducted a new experiment to examine the effect of sound on the blob detection task (see Experiment 4 in the revised manuscript). The procedure was similar to that of Experiment 2 except that the sound was no longer presented during the dichoptic-backward-movie adaptation. The results showed that, even without sound, the interocular difference in blob detection rate remained unaffected by cTBS, which disagrees with the explanation offered in the previous version of the manuscript. Based on the new data, we now question the validity of using the blob detection rate to precisely quantify eye-based attention, and in the Discussion of the revised manuscript we explain why the blob detection results do not contradict our account of the functional role of FEF in modulating the aftereffect (see page 23 second paragraph to page 24 first paragraph or below):

      “An unresolved issue is why inhibiting the cortical function of FEF did not impair performance on the blob detection task. One potential explanation is that the synchronized audio in Experiment 2 might help increase the length of time that the regular movie dominated awareness. However, the results of Experiment 4 did not support this explanation: blob detection performance was unaffected by inhibition of FEF even when silent movies were presented. Although this issue remains to be explored in future work, it does not contradict our notion of FEF modulating AE-UAE opponency neurons. It should be noted that our notion merely states that FEF is the core area for attentional modulation of the activities of AE-UAE opponency neurons. No other role of FEF during the adaptation is assumed here (e.g. boosting monocular responses or increasing the conscious level of stimuli in the attended eye). In contrast, by its original definition, blob detection performance serves as an estimate of the visibility (or level of consciousness) of the stimuli input from each eye, even though the task was initially adopted to precisely quantify eye-based attention (which might be impractical). Thus, according to our notion, inhibition of FEF does not necessarily lead to deteriorated blob detection performance. Furthermore, our findings consistently indicated that the visibility of stimuli in the attended eye was markedly superior to that of stimuli in the unattended eye, yet the discrepancy in the SSVEP monocular responses between the two eyes was minimal, though statistically significant (Song et al., 2023). Therefore, blob detection performance in our work may faithfully reflect only the conscious level in each monocular pathway, and it is probably not an appropriate index tightly associated with the attentional modulation of monocular responses in early visual areas.

      Indeed, previous work has argued that attention but not awareness modulates neural activities in V1 during interocular competition (Watanabe et al., 2011; but see Yuval-Greenberg & Heeger, 2013). We have noticed and discussed the counterintuitive results of blob detection performance in our previous work (Song et al., 2023). Here, with the new counterintuitive finding that inhibition of FEF did not impair blob detection performance, we suspect that blob detection performance in the “dichoptic-backward-movie” adaptation paradigm may not be an ideal index for accurately quantifying eye-based attention.”

      (4) Writing: in general, the manuscript is well written, but clarity should be improved in certain sections.

      (a) fMRI results: the first sentence is difficult to understand at first read, but it is crucial to understand the results, please reformulate and clarify.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have reformulated this sentence (see page 9 last paragraph or below):

      “It was only in the dichoptic condition of experimental runs that participants had to selectively pay more attention to one eye (i.e., eye-based attention). Therefore, we speculate that if certain brain regions exhibit greater activities in the dichoptic condition as compared to the binocular condition in the experimental runs but not in the control runs, the activation of these brain regions could be attributable to eye-based attention.”

      (b) Experiment 3: the rationale for this experiment should be stated directly, without a long premise explaining why it would not be necessary.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have streamlined the lengthy premise to make the rationale of Experiment 3 more straightforward (see page 15 last two paragraphs or below):

      “The results of Experiment 2 support the notion that eye-based attention was the cause for attention-induced ocular dominance plasticity. However, an alternative account is that the significant two-way interaction between test phase and stimulation site did not stem from any persistent malfunction of FEF in modulating ocular dominance, but rather it was due to some abnormality of binocular rivalry measures in the post-test that occurred after stimulation at the FEF only (and not at the other two brain sites). For instance, stimulation at the FEF might simply reduce the ODI measured in the binocular rivalry post-test.

      Therefore, we conducted Experiment 3 to examine how suppression of the three target sites would affect binocular rivalry performance, to rule out the possibility that unknown confounding factors, unrelated to adaptation but related to binocular rivalry measures, contributed to the results.”

      (c) Discussion: the language is somewhat informal in places; a more straightforward style should be preferred (one example: p.19 second paragraph).

      Response: Thanks for the reviewer’s suggestion! We have carefully revised the language in the discussion. The discussion following the example paragraph has been largely rewritten.

      (5) Minor: the authors might consider using the term "participant" or "observer" instead of "subject" when referring to the volunteers who participated in the study.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have replaced the term “subject” with “participant”.

      Reviewer #3 (Public Review):

      Summary:

      This study investigated the neural mechanisms underlying the shift of ocular dominance induced by "dichoptic-backward-movie" adaptation. The study is self-consistent.

      Strengths:

      The experimental design is solid and progressive (relationship among three studies), and all of the raised research questions were well answered.

      The logic behind the neural mechanisms is solid.

      The findings regarding cTBS (especially the stimulation position/site) could be useful for future medical applications.

      Weaknesses:

      Why does the "dichoptic-backward-movie" adaptation matter? This part is severely lacking. This kind of adaptation is neither intuitive, like classical (Gibson) visual adaptation, nor practical, either as a research paradigm or as a probe of fundamental neural mechanisms. If this part is not clearly stated and discussed, this study is merely self-consistent with respect to its own research question. There are many "cool" phenomena whose neural mechanisms seem apparent (e.g. "FEF controls visual attention") but have never been tested using TMS and fMRI, yet we all know that this kind of research has only incremental implications.

      Response: Thanks for the reviewer’s comment! We designed the "dichoptic-backward-movie" adaptation to study the perceptual consequences and mechanisms of sustained attention to a monocular pathway. Since the overall visual input to the two eyes during adaptation was identical, any effect after adaptation (i.e. the change of ocular dominance in our study) can be ascribed to unbalanced eye-based attention between the two eyes rather than to unbalanced input energy across the eyes. In typical short-term monocular deprivation, the input signal from one eye is blocked. Accordingly, attention is inevitably directed to the non-deprived eye. Because in a short-term monocular deprivation paradigm the deprived eye is also the unattended eye, researchers cannot ascertain whether unbalanced eye-based attention contributes to the shift of ocular dominance in the same way as unbalanced visual input across the two eyes. That is why the “dichoptic-backward-movie” adaptation was adopted in the present study. This new paradigm balances the input energy across the eyes but leaves attention unbalanced across the eyes. In the revised manuscript, we have added a description of the “dichoptic-backward-movie” adaptation (see page 3 last paragraph and page 4 first paragraph or below). We hope this additional information improves clarity.

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., the attended eye) while the other eye (i.e., the unattended eye) receives the backward movie images of the same episode. Participants are also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, goal-directed eye-based attention is predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than from unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).”

      “In short-term monocular deprivation, the input signal from one eye is blocked. Accordingly, attention is biased towards the non-deprived eye. However, it is difficult to tease apart the potential contribution of unbalanced eye-based attention from the consequences of unbalanced input energy, as the deprived eye is also the unattended eye. Therefore, the advantage of the “dichoptic-backward-movie” adaptation paradigm is that it balances the input energy across the eyes but leaves attention unbalanced across the eyes.”

      Our previous work (Song et al., 2023) has shown that eye-based attention plays a role in the formation of the ocular dominance shift following adaptation to the dichoptic backward movie. However, because the “dichoptic-backward-movie” adaptation paradigm is new, to our knowledge no study has yet identified the brain areas responsible for eye-based attention. Our fMRI experiment resolves this issue for the first time, which we believe is one of the novelties of the present study. Attention is a very general term for our ability to select limited information for preferential or privileged processing, and it encompasses numerous aspects (e.g. spatial attention for spatial locations, feature-based attention for visual features, object-based attention for objects, social attention for social cues, and eye-based attention for monocular pathways). Can we be certain that the same brain network underlies every aspect of attention, including eye-based attention? Without a test, there is no answer. The answer may well be yes, but we are not aware of any evidence for that in the literature. Attention may be like an elephant, with researchers like blind people touching it from different angles. Even if all previous researchers have touched the side of the elephant and concluded that an elephant is no different from a wall, a single researcher grabbing the elephant’s tail would falsify the “wall” knowledge. From this perspective of falsifiability as the essence of science, we are confident that our fMRI experiment on eye-based attention is novel, because to our knowledge it is the first to explore this issue. The fMRI experiment was also the basis for the subsequent TMS experiments: without it, we would not have known which precise brain sites to target with cTBS.

      Of course, if the reviewer could kindly point out any previous neuroimaging work we have missed that has already disclosed the neural mechanisms underlying eye-based attention in humans, we would be very grateful. Even so, we would like to emphasize that the purpose of the current study was not to use TMS and fMRI to confirm that “FEF controls visual attention”. As stated in the Abstract and elaborated in the last two paragraphs of the Introduction, the goal of the TMS experiments was to examine the causal role of eye-based attention in producing the aftereffect of “dichoptic-backward-movie” adaptation. This research question is also new, so we do not think the TMS experiments are incremental either. Our findings provide direct causal evidence that FEF modulates ocular dominance through eye-based attention. Please see the last two sentences of the first paragraph on page 20 of the revised manuscript, quoted below,

      “Interestingly, in our Experiment 2 this aftereffect was significantly attenuated after we temporarily inhibited the cortical function of FEF via cTBS. This finding indicates the crucial role of FEF in the formation of attention-induced ocular dominance shift.”

      as well as the last sentence of the Abstract,

      “…and in this network, FEF plays a crucial causal role in generating the attention-induced ocular dominance shift.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The hemispheric asymmetry in the eye-based attention-related cortex should be further examined and discussed. For example, IPS in both hemispheres was identified in the fMRI experiment. It is not clear why only the right IPS was stimulated in the TMS experiment.

      Response: Thanks for the comment. We have elucidated the reasons for the experimental design with hemispheric asymmetry in FEF and IPS. Please see our response to the Weakness #1 raised by Reviewer #1 in the Public Review section.

      (2) It is known that the frontoparietal cortex plays a role in the contralateral shift of attentional allocation. Meanwhile, the latest stage of ocular-specific representation is V1. The authors should discuss how the eye-related function can be achieved in FEF.

      Response: Thanks for the comment. We have discussed how FEF regulates the attention-induced ocular dominance shift (see page 21, second paragraph, to page 23, first paragraph, of the revised manuscript, and our response to Weakness #2 raised by Reviewer #1 in the Public Review section).

      (3) To further validate the role of FEF in eye-related attention shifts, the authors may consider using the traditional monocular deprivation paradigm with fMRI and TMS. It would be valuable to compare the neural mechanisms related to the classical monocular deprivation paradigm with the current findings.

      Response: Thanks for the reviewer’s suggestion! That is indeed an interesting research topic that we are currently exploring. The current study investigated the attention-induced ocular dominance shift with the “dichoptic-backward-movie-adaptation” paradigm. This paradigm is substantially different from traditional short-term monocular deprivation. In our Neuroscience Bulletin paper (Song et al. 2023), we discuss the reason as follows.

      “An alternative account of our results is the homeostatic plasticity mechanism. The function of this mechanism is to stabilize neuronal activity and prevent the neuronal system from becoming hyperactive or hypoactive. For this goal, the mechanism moves the neuronal system back toward its baseline after a perturbation [51, 52]. In our case, the aftereffect can be explained such that the visual system boosts the signals from the unattended eye to maintain the balance of the network’s excitability. However, this account cannot easily explain why the change of neural ocular dominance led by prolonged eye-based attention was observed here using the binocular rivalry testing stimuli, but absent in the previous research using the binocularly fused stimuli [11]. In contrast, a recent SSVEP study also using the binocularly fused stimuli has successfully revealed a shift of neural ocular dominance after two hours of monocular deprivation [31], which is in line with the homeostatic plasticity account. Therefore, the mechanisms underlying the “dichoptic-backward-movie” adaptation and monocular deprivation are probably not fully overlapped with each other; and the binocular rivalry mechanism described in the ocular-opponency-neuron model seems to be more preferable than the homeostatic plasticity mechanism in accounting for the present findings.”

      Therefore, before asking whether FEF plays a role in the attention-induced ocular dominance shift in a traditional monocular deprivation paradigm, one should probably first examine (1) whether attention also plays a role in traditional monocular deprivation, and (2) whether the ocular-opponency-neuron adaptation account can also explain the traditional monocular deprivation effect. Our newly accepted paper “Negligible contribution of adaptation of ocular opponency neurons to the effect of short-term monocular deprivation” (https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1282113/full) gives a generally negative answer to the second question. As to the first question, we have one manuscript under review and another ongoing study. In other words, a satisfactory answer to this comment requires clear answers to the two questions above, which we think is far beyond the scope of a single manuscript.

      (4) The authors only presented regular movies to the dominant eye to maximize the ocular dominance shift. This critical information of design should be clarified, not only in the method section.

      Response: Thanks for the reviewer’s suggestion! In the Results section of Experiment 2, we have added a description of this critical information of design (see page 11 last paragraph to page 12 first paragraph or below):

      “Then, participants adapted to the “dichoptic-backward-movie” in which regular movie images were presented to the dominant eye to maximize the effect of eye dominance shift (Song et al., 2023). Meanwhile they were asked to detect some infrequent blob targets presented on the movie images in one eye at the same time.”

      (5) The frame rate of the movie is 30 fps, which is much lower than a typical 60 fps visual presentation, does this have an effect on the adaptation outcome?

      Response: To the best of our knowledge, there is no evidence that the frame rate of the movie influences the aftereffect of attention-induced ocular dominance shift. In our previous research, the frame rate of the movie during adaptation was 25 fps, which still produced a stable adaptation aftereffect (Song et al., 2023). The frame rate of the movie was 30 fps in our monocular deprivation work (Lyu et al., 2020), which showed a monocular deprivation effect similar to the one we previously observed in an altered-reality study (Bai et al., 2017), in which the altered-reality video was presented at 60 fps. Together, these observations suggest that the frame rate does not substantially affect the adaptation outcome.

      (6) Figure 5: The ODSE derived from ODI in Experiment 3 should also be illustrated, for a better comparison with results from Experiment 2.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have added the results of ODSE in Experiment 3 to Figure 5 (see page 15 or below):

      Author response image 1.

      Figure 5. The results of (A) the ocular dominance index (ODI), (B) the ocular dominance shift effects (ODSE) in Experiment 2, (C) the ODI and (D) the ODSE in Experiment 3. The bars show the grand average data for each condition. The individual data are plotted with gray lines or dots. The dashed gray line represents the absolute balance point for the two eyes (ODI = 0.5). Error bars indicate standard errors of means. * p < .05; ** p < .01; n.s. p > .05.
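      For readers less familiar with these indices, the relation between ODI and ODSE can be sketched as below. This is a minimal sketch under stated assumptions, not the formula used in the manuscript: we assume, hypothetically, that ODI is the fraction of exclusive-dominance time in binocular rivalry won by the attended eye (so that 0.5 marks balance, as in the caption), and that ODSE is the pre-adaptation ODI minus the post-adaptation ODI, so that a positive value indicates a shift towards the unattended eye. The function names and the durations are illustrative only.

```python
def ocular_dominance_index(attended_durations, unattended_durations):
    """Hypothetical ODI: share of exclusive-dominance time won by the attended eye.

    0.5 means the two eyes are balanced; values below 0.5 favour the
    unattended eye. Durations are in seconds (mixed percepts excluded).
    """
    attended = sum(attended_durations)
    unattended = sum(unattended_durations)
    total = attended + unattended
    if total == 0:
        raise ValueError("no exclusive dominance periods were recorded")
    return attended / total


def ocular_dominance_shift(odi_pre, odi_post):
    """Hypothetical ODSE: positive when dominance moves toward the unattended eye."""
    return odi_pre - odi_post


# Illustrative (made-up) rivalry durations before and after adaptation.
odi_pre = ocular_dominance_index([3.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # 8 / 14
odi_post = ocular_dominance_index([2.0, 2.0, 2.0], [3.0, 2.0, 3.0])  # 6 / 14
odse = ocular_dominance_shift(odi_pre, odi_post)                     # 2 / 14
```

      Under these assumptions, a drop of the post-adaptation ODI below the pre-adaptation ODI yields a positive ODSE, i.e., a shift towards the unattended eye.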

      (7) Spelling issues: "i.e." → "i.e.,"

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have changed “i.e.” to “i.e.,”.

      Reviewer #2 (Recommendations For The Authors):

      Linked to weakness 3: Ideally, a control experiment with cTBS and dichoptic stimulation without sound but with the blob discrimination task should be performed to be able to make important claims about the neural mechanisms involved in eye-based attention.

      Response: Thanks for the comment. We have performed a new experiment as the reviewer suggested. Please see our response to the Weakness #3 raised by Reviewer #2 in the Public Review section.

      Reviewer #3 (Recommendations For The Authors):

      (1) The neural mechanisms are so apparent. We all know the FEF\IPS\SC matter in vision and attention and gaze. This is not groundbreaking.

      Response: As we addressed in our response to Reviewer #3’s public comment, the current study aimed to investigate the causal mechanism of eye-based attentional modulation of ocular dominance plasticity, rather than simply the role of FEF\IPS\SC in visual attention. Moreover, eye-based attention is a less investigated aspect of visual attention. The neural mechanism underlying eye-based attention is still largely unknown, and identifying the brain areas that control eye-based attention was a necessary step before applying cTBS. We have explained in detail in our response to Reviewer #3’s public comment why we think both the fMRI and TMS experiments are novel to the field, and we will not reiterate it here to avoid redundancy.

      (2) Why does the "dichoptic-backward-movie" adaptation matter? Is playing a backward movie to one eye realistic? Does that follow the efficient coding? Is that a mere consequence of information theory?

      Response: Thanks for the comments. We have added the description of the “dichoptic-backward-movie” adaptation paradigm in the revised manuscript (see page 3 last paragraph and page 4 first paragraph or our response to this reviewer’s Public comment).

      Is it realistic to play a backward movie to one eye? We find this question somewhat ambiguous. If the reviewer means the technical feasibility of such a stimulus presentation, we can confirm it, since we have used this paradigm in both the current and previously published studies. Specifically, we prepared the video stimuli in advance. The left half of each video was the regular movie and the right half was the backward version of the same movie (or vice versa). When viewing such video stimuli through stereoscopes, participants could only see the left half of the video with the left eye and the right half with the right eye. In other words, the regular movie and the backward movie were viewed dichoptically. Alternatively, if the reviewer means that such dichoptic presentation rarely happens in the real world and is thus not realistic, we agree with the reviewer on the one hand. On the other hand, we have explained on page 3 (last paragraph) and page 4 (first paragraph) why it is a particularly useful paradigm for the main purpose of the present study. Consider an analogy. Binocular rivalry rarely happens in everyday life, so one might say that binocular rivalry is not realistic. However, our visual system does have the ability to deal with such conflicting inputs across the eyes, even though binocular rivalry is unrealistic in this sense. Investigating seemingly unrealistic functions of the brain can be rewarding, since they may also reveal how the neural system works. Although binocular rivalry is uncommon in daily life, it is frequently used to investigate awareness, and in our work we use binocular rivalry to measure perceptual ocular dominance.
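      The side-by-side layout described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the stimulus code used in the study: frames are represented as equally sized 2-D lists of pixel values, the backward movie is taken to be the same frames in reverse order, and the hypothetical function `make_dichoptic_frames` places the two streams side by side so that a stereoscope routes each half to one eye.

```python
def make_dichoptic_frames(frames, regular_on_left=True):
    """Build side-by-side frames: regular movie in one half, backward movie in the other.

    `frames` is a list of equally sized frames, each a list of pixel rows.
    Output frame t shows frames[t] in one half and frames[n - 1 - t] in the other.
    """
    n = len(frames)
    combined = []
    for t in range(n):
        forward = frames[t]            # regular (attended) stream
        backward = frames[n - 1 - t]   # same episode played in reverse
        left, right = (forward, backward) if regular_on_left else (backward, forward)
        # Concatenate row by row: left half | right half.
        combined.append([l_row + r_row for l_row, r_row in zip(left, right)])
    return combined


# Three 1x1 "frames" whose pixel value is the frame index.
movie = [[[0]], [[1]], [[2]]]
print(make_dichoptic_frames(movie))         # [[[0, 2]], [[1, 1]], [[2, 0]]]
print(make_dichoptic_frames(movie, False))  # [[[2, 0]], [[1, 1]], [[0, 2]]]
```

      The `regular_on_left` flag covers the "or vice versa" case, i.e., which eye receives the regular movie.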

      Finally, the reviewer asked whether the "dichoptic-backward-movie" adaptation paradigm follows efficient coding and information theory. Information theory and efficient coding assume that messages with low expectedness, or of rare occurrence, attract more attention and induce larger neural responses than those with high expectedness. In the "dichoptic-backward-movie" adaptation paradigm, the backward movie should be less expected, since the actions of the characters in the backward movie appear illogical. Thus, according to information theory and efficient coding, more attention would be expected to be paid to the backward movie, which might therefore dominate awareness for longer during adaptation (Zhang et al., 2012). However, we instructed participants to follow the regular movie during adaptation, and the results of the blob detection task showed better performance when the targets appeared in the eye presented with the regular movie, which contradicts the prediction of information theory and efficient coding. Thus, it seems unlikely that the "dichoptic-backward-movie" adaptation followed efficient coding and information theory.
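      The information-theoretic premise invoked above, that less expected events carry more information, is usually quantified as surprisal, I(x) = -log2 p(x) bits. The sketch below only illustrates that quantity; the probabilities are hypothetical, and the code makes no claim about how expectedness was or could be measured in the experiment.

```python
import math


def surprisal_bits(p):
    """Self-information of an event with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


# A less expected event (e.g. an illogical action in the backward movie, with a
# hypothetical probability of 1/8) carries more information than a likely one.
print(surprisal_bits(0.5))    # 1.0
print(surprisal_bits(0.125))  # 3.0
```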

      References

      Bai, J., Dong, X., He, S., & Bao, M. (2017). Monocular deprivation of Fourier phase information boosts the deprived eye’s dominance during interocular competition but not interocular phase combination. Neuroscience, 352, 122-130. https://doi.org/10.1016/j.neuroscience.2017.03.053

      Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

      Choe, E., & Kim, M.-S. (2022). Eye-specific attentional bias driven by selection history. Psychonomic Bulletin & Review, 29(6), 2155-2166. https://doi.org/10.3758/s13423-022-02121-0

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201-215. https://doi.org/10.1038/nrn755

      Dong, X., Gao, Y., Lv, L., & Bao, M. (2016). Habituation of visual adaptation. Scientific Reports, 6, 19152. https://doi.org/10.1038/srep19152

      Duecker, F., Formisano, E., & Sack, A. T. (2013). Hemispheric differences in the voluntary control of spatial attention: direct evidence for a right-hemispheric dominance within frontal cortex. Journal of Cognitive Neuroscience, 25(8), 1332-1342. https://doi.org/10.1162/jocn_a_00402

      Esterman, M., Liu, G., Okabe, H., Reagan, A., Thai, M., & DeGutis, J. (2015). Frontal eye field involvement in sustaining visual attention: evidence from transcranial magnetic stimulation. NeuroImage, 111, 542-548. https://doi.org/10.1016/j.neuroimage.2015.01.044

      Gallotto, S., Schuhmann, T., Duecker, F., Middag-van Spanje, M., de Graaf, T. A., & Sack, A. T. (2022). Concurrent frontal and parietal network TMS for modulating attention. iScience, 25(3), 103962. https://doi.org/10.1016/j.isci.2022.103962

      Lega, C., Ferrante, O., Marini, F., Santandrea, E., Cattaneo, L., & Chelazzi, L. (2019). Probing the neural mechanisms for distractor filtering and their history-contingent modulation by means of TMS. Journal of Neuroscience, 39(38), 7591-7603. https://doi.org/10.1523/JNEUROSCI.2740-18.2019

      Lunghi, C., Burr, D. C., & Morrone, C. (2011). Brief periods of monocular deprivation disrupt ocular balance in human adult visual cortex. Current Biology, 21(14), R538-R539. https://doi.org/10.1016/j.cub.2011.06.004

      Lyu, L., He, S., Jiang, Y., Engel, S. A., & Bao, M. (2020). Natural-scene-based Steady-state Visual Evoked Potentials Reveal Effects of Short-term Monocular Deprivation. Neuroscience, 435, 10-21. https://doi.org/10.1016/j.neuroscience.2020.03.039

      Mayrhofer, H. C., Duecker, F., van de Ven, V., Jacobs, H. I., & Sack, A. T. (2019). Hemifield-specific correlations between cue-related blood oxygen level dependent activity in bilateral nodes of the dorsal attention network and attentional benefits in a spatial orienting paradigm. Journal of Cognitive Neuroscience, 31(5), 625-638. https://doi.org/10.1162/jocn_a_01338

      Rezec, A., Krekelberg, B., & Dobkins, K. R. (2004). Attention enhances adaptability: evidence from motion adaptation experiments. Vision Research, 44(26), 3035-3044. https://doi.org/10.1016/j.visres.2004.07.020

      Sack, A. T. (2010). Using non-invasive brain interference as a tool for mimicking spatial neglect in healthy volunteers. Restorative Neurology and Neuroscience, 28(4), 485-497. https://doi.org/10.3233/RNN-2010-0568

      Said, C. P., & Heeger, D. J. (2013). A model of binocular rivalry and cross-orientation suppression. PLoS Computational Biology, 9(3), e1002991. https://doi.org/10.1371/journal.pcbi.1002991

      Song, F., Lyu, L., Zhao, J., & Bao, M. (2023). The role of eye-specific attention in ocular dominance plasticity. Cerebral Cortex, 33(4), 983-996. https://doi.org/10.1093/cercor/bhac116

      van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2023). Bayesian Repeated-Measures Analysis of Variance: An Updated Methodology Implemented in JASP. Advances in Methods and Practices in Psychological Science, 6(2), 25152459231168024. https://doi.org/10.1177/25152459231168024

      van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Gupta, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E. J. (2021). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826. https://doi.org/10.3758/s13423-020-01798-5

      Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Selker, R., Gronau, Q. F., Dropmann, D., Boutin, B., Meerhoff, F., Knight, P., Raj, A., van Kesteren, E. J., van Doorn, J., Šmíra, M., Epskamp, S., Etz, A., Matzke, D., de Jong, T., van den Bergh, D., Sarafoglou, A., Steingroever, H., Derks, K., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76. https://doi.org/10.3758/s13423-017-1323-7

      Watanabe, M., Cheng, K., Murayama, Y., Ueno, K., Asamizuya, T., Tanaka, K., & Logothetis, N. (2011). Attention but not awareness modulates the BOLD signal in the human V1 during binocular suppression. Science, 334(6057), 829-831. https://doi.org/10.1126/science.1203161

      Wong, S. P., Baldwin, A. S., Hess, R. F., & Mullen, K. T. (2021). Shifting eye balance using monocularly directed attention in normal vision. Journal of Vision, 21(5), 4. https://doi.org/10.1167/jov.21.5.4

      Yuval-Greenberg, S., & Heeger, D. J. (2013). Continuous flash suppression modulates cortical activity in early visual cortex. Journal of Neuroscience, 33(23), 9635-9643. https://doi.org/10.1523/jneurosci.4612-12.2013

      Zhang, P., Jiang, Y., & He, S. (2012). Voluntary attention modulates processing of eye-specific visual information. Psychological Science, 23(3), 254-260. https://doi.org/10.1177/0956797611424289

      Zhou, J., Reynaud, A., & Hess, R. F. (2014). Real-time modulation of perceptual eye dominance in humans. Proceedings of the Royal Society B: Biological Sciences, 281(1795). https://doi.org/10.1098/rspb.2014.1717

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In chicken embryos, the counter-rotating migration of epiblast cells on both sides of the forming primitive streak (PS), a process referred to as polonaise movements, has attracted longstanding interest as a paradigm of morphogenetic cell movements. However, the association between these cell movements and PS development is still controversial. This study investigated PS development and polonaise movements separately at their initial stage, showing that both could be uncoupled (at least at the initial phase), being activated via Vg1 signaling.

      Strengths of this study

      Polonaise movements, i.e., the circular cell migration of epiblast cells on both sides of the forming PS in avian embryos, have been the subject of research through live imaging and promoted the development of new tools to analyze quantitatively such movements. However, conclusions from previous studies remain controversial, at least partly due to the nature of perturbations to PS development and polonaise movements.

      This study used the challenging technique of electroporation to successfully mark and manipulate Wnt/PCP pathways in unincubated chicken embryo cells at the initiation phase of these two processes. In addition, the authors separately altered PS development and polonaise movements: PS development was perturbed by inhibiting either the Wnt/PCP pathway or DNA synthesis using aphidicolin, while polonaise movements were modified by the development of a second PS after engrafting Vg1-expressing COS cells located at the opposite end of the blastoderm. The study concluded that Vg1 elicits both PS development and polonaise movements, which occur in parallel and are not interdependent.

      To support these conclusions, particle image velocimetry (PIV) of cell trajectories captured by live imaging was performed. These tools delineated visually appealing cell movements and gave rise to vorticity profiles, adding more value to this study.
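      For readers unfamiliar with vorticity profiles, the quantity usually derived from a PIV velocity field is the out-of-plane vorticity, ω = ∂v/∂x − ∂u/∂y, where u and v are the x and y velocity components. The sketch below computes it with central differences on a regular grid. It is a generic illustration, not the PIV pipeline used in this study; the grid layout (rows indexed by y, columns by x) and the function name are assumptions. With y pointing up, positive values mean counter-clockwise rotation, so two counter-rotating polonaise vortices would appear with opposite signs; in image coordinates, where y usually points down, the sign convention flips.

```python
def vorticity(u, v, dx=1.0, dy=1.0):
    """Out-of-plane vorticity dv/dx - du/dy from gridded PIV velocities.

    u[j][i] and v[j][i] are the x and y velocity components at row j (y) and
    column i (x). Central differences are used at interior points; boundary
    points are left as None.
    """
    ny, nx = len(u), len(u[0])
    omega = [[None] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2 * dy)
            omega[j][i] = dv_dx - du_dy
    return omega


# Rigid counter-clockwise rotation at angular speed 0.5: u = -0.5*y, v = 0.5*x.
size = 5
u = [[-0.5 * j for i in range(size)] for j in range(size)]
v = [[0.5 * i for i in range(size)] for j in range(size)]
print(vorticity(u, v)[2][2])  # 1.0 (twice the angular speed)
```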

      Weaknesses of this study

      Engrafted Vg1-expressing COS cells located at the anterior end of the blastoderm elicited both the development of a second PS and marked bilateral polonaise movements while perturbing these movements along the original PS. How do polonaise movements along the second PS dominate over those along the normal PS? The authors suggested a model in which Vg1 acts in a graded or dose-dependent manner since engrafted COS cells over-expressed Vg1. This model can be tested by reducing the mass of engrafted COS cells. Although the authors propose performing this analysis in further investigations, it would be preferable to incorporate it into this study for better consistency.

      We would like to express our gratitude to the editors and the reviewers for recognizing the significance of our study and for their thoughtful suggestions. We agree that identifying the driving mechanism(s) of the polonaise movements would be a logical next step, although this is beyond the scope of the current study. It is instead the focus of ongoing studies, in which we are investigating how Vg1 acts in a concentration-dependent manner and the resulting dose-dependent effects on downstream gene expression, in order to provide a comprehensive understanding of this interesting dual role of Vg1. As the reviewer pointed out, the relationship between the intensity of Vg1 signaling and the polonaise movements can be tested by modifying the size of the Vg1/COS graft.

      The authors claim that chicken embryo development is representative of "amniotes," but this does not hold for all groups. Avian and mammal species are exceptional among amniotes in the sense that they develop a PS (e.g., Coolen et al. 2008). Moreover, in certain mammalian embryos like mouse embryos, cells lateral to the PS do not move much (Williams et al. 2012). The authors should avoid the generalization that chicken embryos unequivocally represent amniotes as opposed to what is observed in non-amniote embryos. The observations in chicken embryos as they stand are significant enough.

      References:

      Coolen M, et al. (2008). Molecular characterization of the gastrula in the turtle Emys orbicularis: an evolutionary perspective on gastrulation. PLoS One. 3(7):e2676. doi: 10.1371/journal.pone.0002676

      Williams M, et al. (2012). Mouse primitive streak forms in situ by initiation of epithelial to mesenchymal transition without migration of a cell population. Dev Dyn. 241(2):270-283. doi: 10.1002/dvdy.23711

      We modified the following sentences in the Summary and Introduction of the revised version as shown below:

      In Summary:

      (p.1, Lines 9-11.) “Large-scale cell flow characterizes gastrulation in animal development. In amniote gastrulation, particularly in avian gastrula, a bilateral vortex-like counter-rotating cell flow, called ‘polonaise movements’, appears along the midline.”

      In Introduction:

      (p.2, Lines 43-46.) “In amniotes, particularly in avian gastrula (i.e. embryonic disc), a bilateral vortex-like counter-rotating cell flow, termed ‘polonaise movements’, occurs within the epiblast along the midline axis, prior to and during primitive streak (PS) formation.”

      Reviewer #2 (Public Review):

      Summary:

      The authors are interested in large-scale cell flow during gastrulation and in particular in the polonaise movement. This movement corresponds to a bilateral vortex-like counter-rotating cell flow that transports the mesendodermal cells, allowing ingression of cells through the primitive streak and ultimately the formation of the mesoderm and endoderm. The authors specifically wanted to investigate the coupling of the polonaise movement and primitive streak to understand whether the polonaise movement is a consequence of the formation of the primitive streak or the other way around. They propose a model where primitive streak elongation is not required for the initiation of the cell flow but rather for its maintenance, and where robust cell flow is not required for primitive streak extension.

      Strengths:

      Overall, the manuscript is well written with clear experimental designs. The authors have used live imaging and cell flow analysis in different conditions, where either the formation of the primitive streak or the cell flow was perturbed.

      Their live imaging and PIV-based analyses convincingly support their conclusions that primitive streak deformation or mitotic arrest do not impact the initiation of the polonaise movement but rather the location or maintenance of these rotations. They additionally showed that disruption of the polonaise movement in the authentic primitive streak by elegant addition of an ectopic primitive streak does not impact the original primitive streak elongation.

      Weaknesses:

      • When using the delta-DEP-GFP construct, the authors showed that they can manipulate the shape of the primitive streak without affecting the identity and number of primitive streak cells. It is not clear however how this can affect the shape, volume or adhesion of the cells. Some mechanistic insights would strengthen the paper.

      We appreciate the reviewer’s invaluable feedback. We agree that it would be informative to know how the ΔDEP-GFP construct led to PS deformation. This approach was previously introduced by Voiculescu et al. (2007) to demonstrate the involvement of Dsh(DEP) in PS shape regulation, as described in the text (please see pp. 4-5, lines 91-94 in Results, and p. 13, lines 279-281 in Discussion). That study suggested that the Wnt/PCP pathway, acting through Dsh(DEP), is a major regulator of cell intercalation, which plays an important role in PS morphogenesis (Voiculescu et al., 2007).

      • Overall, frequencies of observation are missing for a better view of the phenomenon. For example, do Vg1/Cos cells always disrupt the flow at the authentic primitive streak? Can replicate vector fields be integrated to reflect quantification?

      We agree and have added the numbers of embryos examined. In our experimental system, the original polonaise movements along the authentic PS were disrupted by the induced polonaise movements in all Vg1/COS-implanted embryos examined (n = 4/4 embryos). The replicate vector fields were integrated into the Streamline and Vorticity plots (please see Fig. 1-4, Fig. S1, S4-7).

      • Since myosin cables have been shown to be instrumental for the polonaise movement, it would be interesting to better investigate how the manipulations by the delta-DEP-GFP construct, or Vg1/Cos affect the myosin cables (as shown in preliminary form for the aphidicolin-treated embryos).

      We agree that investigations of cytoskeletons and motor proteins would provide a deeper understanding of how the ΔDEP-GFP construct, and perhaps Wnt/PCP components, work in PS formation and morphogenesis. In a future study, we plan to examine the patterns of the myosin cables in ΔDEP-GFP-misexpressing or Vg1/COS-implanted embryos to better understand the mechanism(s) of the polonaise movements, as the reviewer pointed out.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • The authors named the dominant-negative Dsh lacking DEP [dnDsh(deltaDEP)]-fused GFP as deltaDEP-GFP, presumably to distinguish it from the construct dnDsh-deltaPDZ previously reported. However, the prefix "dnDsh" conveys the critical function in the present study. The reviewer recommends spelling out dnDsh(deltaDEP)-GFP to clarify to readers which signal was manipulated.

      We agree that it is necessary to distinguish the construct used in this study from the dnDsh-deltaPDZ construct. We have therefore clarified the abbreviation in the main text as follows (please see pp. 4-5, lines 91-97): ‘The DEP domain of Dishevelled (Dsh; a transducer protein of Wnt signaling) is responsible for the non-canonical Wnt/PCP pathway (43, 44), and misexpression of dominant-negative Dsh lacking DEP [dnDsh(ΔDEP)] leads to deformation of the midline structures, including the PS (21). Further, the Wnt/PCP pathway is involved in cellular polarity and migration, while the canonical Wnt pathway regulates cell proliferation (45). We refer to the dnDsh(ΔDEP)-GFP construct that we generated as ΔDEP-GFP, and tested its ability to alter cellular polarity, resulting in PS deformation’.

      • The authors described the "Vg1 plasmid DNA" as a gift from Claudio D. Stern and Jane Dodd. However, they should indicate the vector backbone, especially whether the vector carries the SV40 ori sequence. Ori-containing plasmids multiply after transfection as COS cells express the SV40T antigen, leading to protein overexpression.

      We added the name of the plasmid ‘pMT23-Vg1-myc-GDF1’ to the ‘Material and methods’ section (please see p25, line 574). pMT23 expression vector is a derivative of pMT21 (Hume and Dodd, 1993) and contains SV40 ori (Wong et al., 1985).

      Reviewer #2 (Recommendations For The Authors):

      • Most of the comments are indicated in the public review.

      • There are additionally minor modifications that would help readers interpret the figures. In Figure S1B and D, it is not clear to the reader what the asterisks indicate.

      We added the sentence ‘The white asterisks indicate GFP-expressing cells.’ to the figure legend of the Fig. S1 B and D (please see p34, line 874).

    1. Author Response

      The following is the authors’ response to the current reviews.

      Joint Public Review

      This study is concerned with the general question as to how pools of synaptic vesicles are organized in presynaptic terminals to support different types of transmitter release, such as fast synchronous and asynchronous release. To address this issue, the authors employed the classical method of loading synaptic vesicle membranes with FM-styryl dyes and assessing dye destaining during repetitive synapse stimulation by live imaging as a readout of the mobilization of vesicles for fusion. Among other findings, the authors provide evidence indicating that there are multiple reserve vesicle pools, that quickly and slowly mobilized reserves do not mix, and that vesicle fusion does not follow a mono-exponential time course, leading to the notion that two separate reserve pools of vesicles - slowly vs. rapidly mobilizing - feed two distinct releasable pools - reluctantly vs. rapidly releasing. These findings are valuable to the field of synapse biology, where the organization of synaptic vesicle pools that support synaptic transmission in different temporal and stimulation regimes has been a focus of intense experimentation and discussion for more than two decades.

      On the other hand, the present study has limitations, so that the authors’ key conclusions remain incompletely supported by the data, and alternative interpretations of the data remain possible. The approach of using bulk FM-styryl dye destaining as a readout of precise vesicle arrangements and pools in a population of functionally very diverse synapses is problematic. In essence, the approach is ‘blind’ to many additional processes and confounding factors that operate in the background, from other forms of release to inter-synaptic vesicle exchange. Further, averaging signals over many - functionally very diverse - synapses makes it difficult to distinguish the dynamics of separate vesicle pools within single synapses from a scenario where different kinetics of release originate from different types of synapses with different release probabilities.

      We thank the editors and reviewers for their time and patience, and are happy that they found our results valuable.

      We do not have a clear understanding of what the alternative interpretations might be - beyond those already addressed - but would like to. At present, we believe that the evidence for parallel processing of slowly and quickly mobilized reserve vesicles is solid and hope that people who are open to the possibility will evaluate the reasoning described within our report. The hypothesis that reserves are kept separate because they feed distinct subdivisions of the readily releasable pool remains to be tested.

      Beyond that, we have used FM-dye de-staining as a bulk measurement of sub-synaptic events in the sense that we have made no attempt to measure mobilization of isolated individual vesicles. We do not see how this necessarily leaves viable alternative interpretations, but this is difficult to evaluate without knowing what the alternatives might be. On the other hand, the FM-dye technique has had good resolution at the level of distinguishing between individual synapses since at least Murthy et al. (2001). For our part, we are confident that our analysis in Figure 3 combined with the results in Figures 4-11 shows that the multiple reserve pools co-occur in many individual presynaptic terminals. We did not use electron microscopy to confirm that all of the punctae analyzed in Figure 3 were indeed single synapses, but the reviewers did not recommend this, and we believe there is already enough published about the spatial distribution of synapses in cell culture to be confident that many of the punctae that are smaller than 1.5 µm were individual synapses.

      Overall, we have attempted to address all of the individual concerns raised by reviewers, and our understanding is that these concerns and our responses will be available on the eLife website. The reviewers were not convinced on every point, but these are cases where the nature of the concern was not clear to us. We hope that people who share these concerns will check out our responses and contact us with any further questions or alternative interpretations.

      (1) The authors sincerely addressed many of the previous concerns, mainly by clarification. The data are consistent with the authors’ hypothesis. The pool concept is somewhat similar to that of Richards et al (2000) and Rey et al (2015). The authors further propose that two reserve pools feed vesicles to two readily-releasable pools independently.

      To clarify further: The possibility that distinct reserve pools feed distinct readily releasable pools is predicted by our working model, and is something that we would like to test in the future, but is not a conclusion of the present study. Instead, in the present study, we tested the prediction that quickly and slowly mobilized reserve vesicles are processed in parallel without making assumptions about the underlying mechanism.

      Unfortunately, the heterogeneity among individual synapses remains a concern as shown in (some of) the raw data (Fig. 3 and supplements).

      We emphasize that we have not attempted to minimize the extensive heterogeneity among synapses, but instead highlight it. In fact, we chose the image in Figure 3 as an example in part because of the lower left region replicated in Figure 3 supplement 2, which demonstrates extensive heterogeneity along what appears to be a single axon. We are not the first to notice the heterogeneity (see Waters and Smith, 2002), but we do provide a new possible explanation which, if correct, might be important for understanding biological computation (see our Discussion). At the same time, we believe that our evidence for multiple reserve pools within individual synapses with heterogeneous properties is compelling. We see no contradiction, and indeed, our conclusion that the ratio of slowly to quickly mobilized vesicles varies extensively between synapses can only be correct if individual synapses contain multiple types. We hope that people who are interested in our conclusions will evaluate the evidence and reasoning presented in our report.

      Bulk imaging of FM de-staining does not really measure the fraction of non-stained vesicles, which changes dynamically during stimulation, so that the situation calls for an independent readout of stained and non-stained vesicles. Moreover, direct correspondence between two specific stimulation frequencies (with long stimulation) and vesicle pools is not straightforward. These issues make the experimentally measured pools not well-defined.

      We think that the reviewer is suggesting an alternative scenario where decreases in the fractional rate of FM-dye de-staining seen during 1 Hz stimulation might be caused by a large (4-fold) increase in the total size of the reserve pool that dilutes the stained vesicles by mixing. This scenario is consistent with the results in Figures 2 and 4-7, and initially seems plausible because previous studies have shown that many vesicles are not mobilized, and therefore are not stained, during our standard loading protocol of 100 s at 20 Hz (Harata et al., 2001). However, liberation of this "deep reserve" as an explanation for the decrease in fractional destaining is not compatible with the results in Figures 10-11 that rule out mixing. For example, liberation of the deep reserve would cause fractional destaining to appear equally depressed during subsequent 20 Hz stimulation, and Figure 10 shows that this is not the case. The scenario cannot be rescued by postulating that the subsequent 20 Hz stimulation caused the deep reserve to quickly recapture the liberated vesicles because Figure 11D-E shows that fractional de-staining continues to be depressed at the very beginning of a second 1 Hz train that follows the 20 Hz stimulation.

      (2) The authors’ latest round of responses did not alleviate most of my major previous concerns. The additional data now shown in Fig 3 rely on conceptually the same type of bulk measurements and thus suffer from the same limitations as outlined in the earlier review.

      We believe that the new evidence in Figure 3 for multiple reserve pools at individual synapses is strong when evaluated in combination with the results in Figures 4-11. We do not, at present, see how the fact that FM-dye destaining is used as a bulk measurement at the sub-synaptic level could undercut our logic.

      Moreover, the image of neuronal cultures shown in Fig. 3 might be problematic. It shows very bright staining with large round lumps, which may be indicative of unhealthy cultures.

      Unhealthy cultures are not a concern because we used strict quantitative criteria to assess health that are better than we have seen elsewhere (details below). We think the reviewer might be reacting to the way we rendered the image; i.e., as “overexposed”. We did this to highlight the dimmest punctae, which is a key element of the analysis. The same image rendered with less contrast is now displayed in Author response image 1 (3rd panel from left).

      Author response image 1.

      Image to left is a reproduction of the example image in Figure 3, which was the average of 120 time lapse raw data images; scale bar is 20 µm. The second image is a replicate except all 69 punctae that were included in the study are occluded by 1.5 µm × 1.5 µm yellow squares. The third image is another replicate except with a different brightness setting. The rightmost image is one of the raw data images with brightness matched to the third image.

      More details (relevance to in vivo is in point 4):

      (1) Identifying unhealthy cultures is straightforward with our technique because synapses in unhealthy cultures destain spontaneously. Our criterion for accepting experiments for further analysis was less than 1.5 % spontaneous rundown per minute. This is a better way to judge health than we have seen elsewhere because it eliminates subjective decisions, and would be equally applicable for microscopes and imaging software of any quality. For our part, we used a 25X objective with a low numerical aperture and low-intensity illumination that allowed us to completely avoid photobleaching. The images may look worse to some than images acquired with a higher-quality microscope, but the absence of photobleaching is an important benefit because it allowed us to avoid complicated corrections.
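      As an illustration only, the acceptance criterion above can be sketched as a short check on the pre-stimulation baseline. The function names, the use of a least-squares line over the baseline, and the normalization to the fitted starting intensity are assumptions made for this sketch; they are not a description of the analysis software used in the study.

```python
def rundown_percent_per_min(times_min, fluorescence):
    """Spontaneous loss of stain, in % of the starting intensity per minute.

    Hypothetical helper: fits a least-squares line to baseline fluorescence
    recorded before stimulation; a positive return value means loss of stain.
    """
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_f = sum(fluorescence) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_min, fluorescence))
    var = sum((t - mean_t) ** 2 for t in times_min)
    slope = cov / var  # fluorescence units per minute
    # Fitted intensity at the first baseline time point, used for normalization.
    f0 = mean_f + slope * (times_min[0] - mean_t)
    return -100.0 * slope / f0


def accept_experiment(times_min, fluorescence, max_rundown=1.5):
    """Hypothetical acceptance rule: keep experiments with < max_rundown % per minute."""
    return rundown_percent_per_min(times_min, fluorescence) < max_rundown


baseline_t = [0, 1, 2, 3, 4]
print(accept_experiment(baseline_t, [1000, 990, 980, 970, 960]))
print(accept_experiment(baseline_t, [1000, 980, 960, 940, 920]))
```

      With these made-up traces, a baseline losing 1 % of its starting intensity per minute passes the check, whereas one losing 2 % per minute does not.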

      (2) Stained areas larger than 1.5 µm across - such as the ones noted by the reviewer - were expressly excluded from our study because they could have been clusters of multiple synapses. The size criteria are detailed in the Legend of Figure 3. Punctae and larger areas that were excluded are the ones that are not occluded by yellow squares in the 2nd image from the left, above; at least two of the largest were likely clusters of synapses that were out of focus. Nevertheless, despite being excluded, it is unlikely that the stained areas larger than 1.5 µm in the image in Figure 3 were characteristic of unhealthy cultures because these areas did not de-stain spontaneously, but instead de-stained in response to 1 and 20 Hz electrical stimulation much like the small punctae that were included in the analysis.

      (3) Electron microscopy results have shown that individual synapses vary >10-fold in size, so a large range of brightness is expected (Murthy et al., 2001). The large range would either make the brighter punctae and clusters appear to be overexposed in a printed image, or render the dimmer punctae invisible. We have opted to present an image with overall brightness adjusted so that the dimmest punctae are visible. This is appropriate because one of the concerns was that analyzing the dimmest punctae would reveal underlying populations where the rate of fractional destaining was constant. In the end, no evidence for underlying populations emerged, which supports the conclusion that the decreases in fractional destaining occur at individual synapses. Note that adjusting brightness for example images was unavoidable; we used the camera in a range that was far below saturation and, because of this, images presented without adjusting brightness would appear to be completely black.

      (4) Primary cell cultures are non-physiological by definition, so the concept of health is intrinsically arbitrary, and relevance to synapses in brains is questioned routinely. However, the new findings in the present report are that: (1) individual hippocampal synapses contain multiple reserve pools; (2) the reserves remain separate but are not distinguishable by the timing of mobilization when the frequency of stimulation is high; and (3) the reserves are nevertheless processed in parallel even when the frequency of stimulation is high. Of these, finding (1) has been reported previously for other synapse types, but findings (2) and (3) were both unexpected, and finding (3) was not compatible with current concepts. Nevertheless, all three findings were predicted by a model that was developed to explain orthogonal results from studies of intact synapses in ex vivo slices that did not fit with current concepts either, as referenced in the Introduction. Because of this, we think that the parallel processing of quickly and slowly mobilized reserve vesicles likely occurs in individual Schaffer collateral synapses in vivo, and is not a cell culture artifact; the alternative would be too much of an unlikely coincidence.

      References

      Harata N, Pyle JL, Aravanis AM, Mozhayeva M, Kavalali ET & Tsien RW (2001). Limited numbers of recycling vesicles in small CNS nerve terminals: implications for neural signaling and vesicular cycling. Trends in Neurosciences 24, 637–43.

      Murthy VN, Schikorski T, Stevens CF & Zhu Y (2001). Inactivity produces increases in neurotransmitter release and synapse size. Neuron 32, 673–82.

      Waters J & Smith SJ (2002). Vesicle pool partitioning influences presynaptic diversity and weighting in rat hippocampal synapses. Journal of Physiology 541, 811–23.


      The following is the authors’ response to the original reviews.

      Reviewer 1

      Mahfooz et al. investigated the time course of synaptic vesicle fusion of cultured mouse hippocampal synapses using FM-styryl dyes. The major finding is that the FM destaining time course deviates from a mono-exponential function during 1 Hz, but not 20 Hz stimulation. The deviation from a mono-exponential function was also seen during a second stimulus train applied after recovery periods of several minutes, or after depletion of the readily-releasable vesicle pool. Furthermore, this "decreased fractional destaining" was unlikely due to long-term synaptic depression, or incomplete dye clearance. Fractional destaining was enhanced when the dye was loaded with 1 Hz compared with 20 Hz stimulation, suggesting that vesicles recycled during 1 Hz stimulation are predominantly sorted into a rapidly mobilized pool. Finally, they show that 20 Hz stimulation does not affect the decrease in fractional destaining induced and recorded during 1 Hz stimulation. Based on these observations, they put forward a model in which slowly and quickly resupplied synaptic vesicles are mobilized in parallel.

      The demonstration that FM destaining time courses deviate from single exponentials during 1 Hz stimulation (Figs 2-3) is a starting point used to rule out simple models where vesicles intermix freely and to introduce a mathematical technique for quantifying the extent of the deviations that is essential for the analysis of later experiments, where curve fitting could not be used. We then:

      1) show that the deviation from simple models is not caused by depletion of the readily releasable pool, as noted by the reviewer;

      2) rule out a number of explanations for the deviation that do not involve reserve pools at all, again as noted;

      3) provide affirmative evidence for the presence of multiple reserve pools by labeling them with distinct colors;

      4) show that the vesicles within the distinct reserve pools do not intermix even when activity is intense enough to drive destaining with single exponential kinetics.

      We believe that the 4th point - documented in Figs 10-11 - is a key element.

      Beyond that, we note that our working model arose from previous studies, as referenced in the Introduction, not from the present results. The model did predict the parallel processing of quickly and slowly mobilized reserves, and the present study was designed to test this prediction. In that sense, the evidence in the current study supports our working model, not the other way around.

      In any case, most readers in the near term will be more interested in the serial versus parallel question, and less in precisely what the present results mean for evaluating our working model. Because of this, we emphasize that evidence for parallel processing of separate reserve pools depends solely on experimental results within the study, and not on modeling. As a consequence, the evidence will continue to be equally strong even if problems with our working model arise later on (lines 382-386).

      We do have additional unpublished evidence for the working model that does not bear directly on the parallel versus serial question. Some of this was removed from an earlier version of the manuscript and some has been newly gathered since the original submission. We will publish the additional evidence at a later point. We decided not to include it in the present manuscript expressly to avoid confusion about the relationship between modeling and the evidence for parallel processing in general.

      The paper addresses an interesting question - the relationship between the resupply and release of synaptic vesicles. The study is based on a lot of data of high quality. Most data are solid. However, some of the major conclusions are not well supported by the data. Moreover, it remains unclear how specific the findings are to the experimental design.

      The following points should be addressed:

      1) Most traces display a decrease in fluorescence intensity before stimulation. Data with a decrease in baseline fluorescence intensity of up to 1.5 % were considered for the analysis (Fig 2-supplement 2). I may have missed it, but were the data corrected for the observed decrease in baseline fluorescence intensity? (In the model shown in Appendix 1 Figure 1, they correct for "rundown"). For instance, are the residuals shown in Fig 2D, E based on corrected data? In case the data would not be corrected for a decrease in baseline fluorescence, would the decay kinetics also deviate from a single exponential after correction?

      We did not correct for rundown - as now noted on lines 96-97 - except in the figure in the Appendix, noted by the reviewer, where the uncorrected and corrected time courses are plotted side by side for easy comparison. However, our study includes an analysis showing that correcting for rundown during 1 Hz stimulation would increase - not decrease - the deviation from a single exponential (2 bars in rightmost panel in Fig 2C, and lines 113-116 of Results), so the absence of a correction does not weaken our conclusions.
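      A minimal arithmetic sketch of why such a correction can only strengthen the conclusion, assuming that accounting for rundown amounts to subtracting a constant baseline fractional rate r from both the early value a and the late value b of fractional destaining, with a > b > r ≥ 0: the corrected ratio (a - r)/(b - r) exceeds the uncorrected ratio a/b by r(a - b)/(b(b - r)), which is positive. The helper name and the example numbers are hypothetical and are not data from the study.

```python
def fold_decrease(early_rate, late_rate, baseline_rate=0.0):
    """Ratio of early to late fractional destaining after subtracting a baseline rate.

    Hypothetical helper: all rates must be in the same units (e.g. % per minute),
    and the baseline (rundown) rate is assumed constant over the recording.
    """
    if not (0.0 <= baseline_rate < late_rate <= early_rate):
        raise ValueError("expected 0 <= baseline_rate < late_rate <= early_rate")
    return (early_rate - baseline_rate) / (late_rate - baseline_rate)


# Illustrative values only.
uncorrected = fold_decrease(3.0, 1.0)
corrected = fold_decrease(3.0, 1.0, baseline_rate=0.4)
print(uncorrected, corrected)  # the corrected ratio is the larger of the two
```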

      2) The analysis of "fractional destaining" is not clear to me. How many intervals of which length were chosen and why? For instance, the intervals often differ in length, number and do not cover the complete decay (e.g., Fig 2B).

      We calculated fractional destaining from longer intervals at later times because less stain remained, meaning that signal/noise was lower and scatter was greater. Estimating the slope of destaining from longer intervals counteracts this increased scatter. An additional benefit is that elongating the later intervals allowed us to plot only 6 bars for 25 min of 1 Hz destaining, which works better visually than 17.

      Increasing the interval length for later times is mathematically sound because the key factor causing distortions related to deviations from linearity is not the length of the interval per se but, instead, the fractional destaining over the interval. The fractional destaining is greater at the start of 1 Hz stimulation, thus requiring shorter intervals.
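      The point that distortion depends on the fractional destaining within an interval, rather than on interval length per se, can be checked with a minimal sketch. It assumes (hypothetically) that fractional destaining is estimated as the chord slope across an interval divided by the fluorescence at the start of that interval, and uses a mono-exponential time course F(t) = exp(-k t) as a test case. For that case the estimate divided by the true rate k equals (1 - exp(-x))/x with x = k Δt, so intervals of different length but equal fractional loss are distorted identically. Function names are illustrative only.

```python
import math


def fractional_destaining(f_start, f_end, interval_min):
    """Fraction of the stain present at the interval start that is lost per minute (chord estimate)."""
    return (f_start - f_end) / (f_start * interval_min)


def chord_bias(rate_per_min, interval_min):
    """Chord estimate divided by the true rate k for a mono-exponential F(t) = exp(-k t)."""
    f_start = 1.0
    f_end = math.exp(-rate_per_min * interval_min)
    return fractional_destaining(f_start, f_end, interval_min) / rate_per_min


# A short interval at a high rate and a long interval at a low rate lose the same
# fraction of stain (k * dt = 0.15), so their estimates are biased by the same factor.
print(chord_bias(0.10, 1.5), chord_bias(0.05, 3.0))
# Reducing the fractional loss per interval brings the estimate close to the true rate.
print(chord_bias(0.01, 1.5))
```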

      It would be possible to choose inappropriately long intervals that would distort estimates of the change in fractional destaining. However, we now include Fig 2-supplement 6 - which includes all 17 intervals of 1.5 min - to confirm that any distortions after the first interval were minimal. The Appendix predicts a biologically important distortion for the first interval which we are following up, but this would underestimate the true deviation from quickly mixing pools, so would not be problematic for the present conclusions.

      Sometimes, only the interval right after stimulation onset was considered (e.g., Fig 7, 8).

      Figs 7, 8 in the previous version are now Figs 8, 9.

      This is appropriate because the goal was to estimate the fractional destaining at the very start, before the quickly mobilized fraction has destained.

      How quickly fractional destaining is expected to revert to the lowest value seen after 15 min of 1 Hz stimulation in Fig 2 (and elsewhere) depends very much on assumptions - such as the number of reserve pools, etc. We sought to avoid this kind of additional analysis because we are keen to avoid the impression that our main conclusions depend on the specifics of modeling.

      How sensitive are the changes in fractional destaining to the choice of the intervals?

      Minimally. This can be seen by eye because the magenta lines in Fig 2B fit the data well, but see Fig 2-supplement 6 for a quantitative comparison.

      For instance, would fractional destaining be increased if later intervals would have been chosen for the second 20 Hz stimulus in the experiment shown in Fig 9B?

      Previous Fig 9B is now Fig 10B.

      We cannot be certain, but think it probably would not be different. Neither an increase nor a decrease would be problematic for our conclusions.

      More detail: There is not enough data to evaluate this specifically for Fig 10B because the total amount of stain remaining at later intervals is small, meaning signal/noise is low, which causes extensive experimental scatter. However, synapses were even more extensively destained prior to time course c of Figure 2-supplement 2C, which nevertheless matches time courses a, b, and d.

      I propose fitting all baseline-corrected data with a single and a double-exponential function (as well as single exponential plus line?) and reporting the corresponding time constants (slopes) and amplitudes.

      As noted above, we purposefully do not baseline correct data in a way that would make this possible. However, we do include exponential fits when appropriate, in Fig 2D-E, Fig 2-supplement 1, Fig 2-supplement 7, Fig 2-supplement 8, and Fig 12B.

      Indeed, the absence of any change in the weighting parameter despite substantial changes for both time constants seen after raising the temperature to 35 °C (Fig 2-supplement 8 vs Fig 12B) is notable because it suggests that the contents of the reserve pools are not altered by changing temperature, even though vesicle trafficking is accelerated. Fig 2-supplement 8 is a supplementary figure because the result is outside the scope of the main point, not because the quality is lower than for other figures.

      Beyond that, exponential fits would not be adequate for most of the study because many experiments - including the core experiments in Figs 10-11 - require discontinuous stimulation, such as when we stop stimulating at 1 Hz, rest for minutes, and then start up again at 1 or 20 Hz. And, although widely used, exponentials are non-linear equations after all. Even when they can be used to quantify time courses, the fractional destaining measurement is almost always more informative, in the technical sense, because it avoids complications when estimating the importance of deviations occurring at the two extremes versus deviations in the middle of the time course.

      3) Along the same lines, is the average slow time constant indeed around 40 min? (Are the data shown in Fig 2 S7 based on an average?) If this would be the case, I suggest conducting a control experiment with a recording time > 40 min. Would fitting an exponential or a line to baseline data (without stimulation) also give a similar slow component?

      Fig 2-supplement 7 in the previous version is now Fig 2-supplement 8.

      First, yes, the time course shown in Fig 2-supplement 8 is the mean across preparations. The time courses of the individual preparations were quantified as the median value of the individual ROIs before averaging.

      Second, no, fitting baseline data would give an approximately 3-fold greater time constant (i.e., 120 min) because fractional destaining decreases by about 3-fold when we stop stimulating after 25 min of 1 Hz stimulation (i.e., Fig 2C, 3B, and many others).

      The key point is that fractional destaining decreases greatly over long trains of 1 Hz stimulation.

      For Fig 2, we saw a 2.7+/-0.1-fold decrease before accounting for baseline destaining (lines 106-110), which increased to a 4.4-fold decrease when we did account for baseline destaining (lines 113-116). Overall, the 2.7-fold value is simultaneously a safe minimum boundary, and much greater than the value of 1.0 expected from models where vesicles mix freely.
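      The contrast between a fold-decrease of 1.0 for freely mixing vesicles and values above 1.0 for separate pools can be illustrated with a minimal sketch. It assumes continuous stimulation and a sum of exponentials F(t) = Σ A_i exp(-k_i t), where each term stands for a reserve that does not exchange vesicles with the others, and uses instantaneous fractional destaining -F'(t)/F(t). With one term this quantity is constant; with two terms it drifts from the amplitude-weighted mean rate toward the slower rate as the quickly mobilized stain is depleted. The amplitudes and rate constants below are hypothetical and were not fitted to the data in the study.

```python
import math


def fractional_rate(t_min, amplitudes, rates_per_min):
    """Instantaneous fractional destaining -F'(t)/F(t) for F(t) = sum_i A_i * exp(-k_i * t)."""
    f = sum(a * math.exp(-k * t_min) for a, k in zip(amplitudes, rates_per_min))
    minus_df = sum(a * k * math.exp(-k * t_min) for a, k in zip(amplitudes, rates_per_min))
    return minus_df / f


def fold_change(amplitudes, rates_per_min, t_late_min):
    """Ratio of fractional destaining at t = 0 to its value at t_late_min."""
    return (fractional_rate(0.0, amplitudes, rates_per_min)
            / fractional_rate(t_late_min, amplitudes, rates_per_min))


# One freely mixing pool: the ratio stays at 1.0 (up to rounding).
print(fold_change([1.0], [0.1], 25.0))
# Two non-mixing pools with hypothetical parameters: the ratio rises above 1.0.
print(fold_change([0.5, 0.5], [0.2, 0.02], 25.0))
```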

      Note that future studies will show that even the 4.4-fold value is probably an underestimate because 1 Hz stimulation misses a fast component at the very beginning of the time courses, as predicted in the Appendix.

      4) How specific are the findings to 1 Hz (and 20 Hz) stimulation? From which frequency onward can a decrease in fractional destaining no longer be observed?

      Our logic depends only on the premise that we are able to find some frequency where fractional destaining no longer decreases. We knew that 20 Hz was a good place to start because of previous electrophysiological experiments - frequency jumps (Fig 1 of Wesseling and Lo, 2002 and Fig 2C of Garcia-Perez and Wesseling, 2008), and trains of action potentials followed by osmotic shocks (Fig 2A of Garcia-Perez et al., 2008) - showing that 20 Hz stimulation is enough to nearly completely exhaust the readily releasable pool. This is noted in lines 202-203, and Box 2.

      Would previous stimulation with frequencies <20 Hz interfere with fractional destaining? These control experiments would help in assessing how general/specific the findings are.

      Yes (Figs 4 and 11A at 1 Hz). Also, we have done experiments at 0.1 Hz, which will be published later; some of these were actually removed from an earlier version of the manuscript because the results are primarily relevant to deciding between particular parallel models, and are not relevant to the conclusion of the present study that quickly and slowly mobilized reserves are processed in parallel.

      Similarly, a major conclusion of the paper - the parallel mobilization of two vesicle pools - is largely based on these two stimulation frequencies. Can they exclude that mixing between the two pools occurs at other frequencies?

      We cannot exclude the possibility of breakdown at a higher frequency, but this would not undercut our conclusions. We do not have plans to try this experiment because: (1) a positive result would be open to concerns about non-physiologically heavy stimulation; and (2) a negative result would be difficult to interpret because of the possibility that the axons cannot follow at higher frequencies.

      6) Some information in the methods section is lacking. For instance, which species is the cell culture based on?

      Mice of both sexes were used. This is now specified in the Methods.

      Reviewer 2

      By using optical monitoring of synaptic vesicles with FM1-43 at hippocampal synapses, the authors try to show the evidence for two parallel reserve pools of synaptic vesicles, which feed the vesicles to the readily releasable pool. The major strength of the study is the use of a quantitative model, which can be readily testable by experiments: in the course of the study, the authors propose the best vesicle pool model, which fits the experimental data "averaged over synapses" nicely. On the other hand, the weak point of the study comes from the optical method and the data: bulk imaging of vesicle dynamics monitored at each synapse is noisy and the signals vary considerably among synapses. Therefore, the average signals over many synapses may not reflect the vesicle dynamics of two reserve pools within a synapse, but something else, such as the different kinetics of release from multiple synapses with different release probability. Nevertheless, a new framework of two reserve pools offers a testable hypothesis of vesicle dynamics, and the use of single vesicle tracking and EM may allow one to give a definitive answer in future studies. Therefore, the study may be of interest to the community of synaptic neurobiology.

      1) The current version includes a new figure (Fig 3) showing that the deviations from single pool models seen in populations are caused by deviations occurring at the level of single synapses. The heterogeneity between synapses actually causes population statistics to underestimate - not overestimate - the mean and median size of the deviations at individuals.

      We think the new evidence in Fig 3 and supplements is conclusive without follow-on EM of the same punctae given the substantial body of already published EM on similar cultures. Essentially, the only way to explain the results without invoking multiple reserve pools in individual synapses would be to say that individual synapses ALWAYS come in clumps containing multiple types and are NEVER separated from neighbors by more than 1.5 microns - even when the clumps are separated from each other by 5 microns. There is already clear evidence against this.

      2) No new model is proposed here, see the first response to the first reviewer.

      3) We are not aware of alternative hypotheses that could account for our results, so cannot evaluate if single vesicle tracking and EM could add meaningful additional support.

      1) The existence of non-stained vesicles complicates the interpretation of the data. Because release during 20 Hz and 1 Hz stimulation does not entirely reflect release from the fast and slow vesicle pools, estimating the non-stained vesicles using synaptopHluorin (+bafilomycin) and EPSCs would be helpful to examine the fraction of non-stained/stained vesicles over time (with stimulation, the ratio may change dynamically, which may bring complications).

      Non-stained vesicles are not a complication, but instead a key element of our logic which is included in the diagrams in Boxes 1 and 2 and Figure 9. That is, quickly and slowly mobilized reserves can be distinguished at 1 Hz precisely because 1 Hz is not intense enough to exhaust the readily releasable pool (Box 2). The corollary is that stained vesicles must be replaced by non-stained vesicles, because otherwise 1 Hz stimulation would exhaust the readily releasable pool. And this is why FM-dyes (plus a beta-cyclodextrin during washing) are ideal for the current questions whereas other techniques, such as electrophysiology or synaptopHluorin imaging are obviously indispensable for other questions, but could not replace the FM-dyes in the current study. This is now noted on lines 86-89.

      We are aware that synaptopHluorin + bafilomycin could, in principle, accomplish some of the same goals. However, bafilomycin ended up being toxic when applied for tens of minutes, as it would have to be in our experiments. And, we do not see what critical question is not already answered with strong evidence using FM dyes.

      2) Individual synapses show marked differences in the time course of de-staining, suggesting differences in release probability. The averaging of the whole data may reflect "average" behavior of synapses, but for example, bi-exponential time course may reflect high Pr and low Pr synapses, rather than vesicle recruitment.

      The authors may comment on this issue.

      See newly added Fig 3, and responses above.

      3) Some differences are very small (Fig 10, the same amplitude as bleaching time course), and I am not certain if the observed differences are meaningful, given low signal to noise ratio in each synapse.

      Fig 10 in the previous version is Fig 11 in the current version.

      Even if correct, this would not be problematic because 20 Hz stimulation clearly did not cause fractional destaining to return to the initial value when stimulation was resumed at 1 Hz (compare d and f in Fig 11E). In any case, Figs 2C, 3B, 5B, 7B, and Fig 10-supplement 2A all show that the minimum fractional destaining value during 1 Hz stimulation is about 3-fold greater than during subsequent rest intervals, which is not a small difference. Also, note that Fig 2-supplement 3 shows that photobleaching likely did not play a role.

      Reviewer 3

      Reviewer #3 (Recommendations For The Authors):

      This study attempts to conceptualize the long-standing question of vesicle pool organization in presynaptic terminals. Authors used classical FM dye release experiments to support a hypothesis that rapidly and slowly releasing vesicles are mobilized in parallel without intermixing. This modular model is also supported indirectly by the authors’ recent findings of molecular links that connect a subset of vesicles in linear chains (published elsewhere).

      Our study should be seen as a test of the hypothesis that quickly and slowly mobilized reserves are processed in parallel. The evidence is independent of any modeling, and would continue to be equally strong if our working model turns out to be incorrect (lines 382-386).

      The scope of the original model was limited by a number of caveats. The main concerns included a limited data set measured in bulk from a highly heterogeneous synapse population, and a complex interrelationship between vesicle mobilization and the bulk FM dye de-staining kinetics. The second major limitation was measurements being performed at room temperature, which inhibits or alters a number of critical synaptic processes that are being modeled. This includes the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory period, which are stimulus- and temperature-dependent, but were not accounted for in the original model.

      The present study contains experiments at body temperature (Fig 12 and Fig 12-supplement 1 in the current version) and analyses of individual synapses (especially Fig 3 in the current version). To our knowledge all results are consistent with everything that is known about the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory periods.

      The authors made strong efforts to address previous concerns. However, the main conceptual point, i.e. linking the bulk FM dye de-staining kinetics with precise arrangement of vesicle pools, is not well supported and is generally highly problematic because it ignores many additional processes and confounding factors.

      For example, vesicle exchange between neighboring synapses constitutes from 15% to over 50% of total recycling vesicle population, and therefore is a major contributing factor to FM dye loss/redistribution, but is not considered in this study. Additionally, this vesicle exchange process undergoes calcium/activity-dependent changes, contributing to difficulty in interpreting the current experiments comparing FM de-staining at different stimulation frequencies.

      We do not see how exchange of vesicles between synapses could be a problem for our logic, so cannot evaluate this without a more detailed description of the concern. Instead, our results rule out random inter-synaptic exchange between quickly and slowly mobilized reserve pools because this would show up in our assays as mixing, which does not occur. We think there are three remaining possibilities:

      1) vesicles are exchanged primarily between quickly mobilized reserve pools

      2) vesicles are exchanged primarily between slowly mobilized reserve pools

      3) vesicles in quickly mobilized reserve pools are targeted to quickly mobilized reserve pools in other synapses and vesicles in slowly mobilized reserve pools are targeted to slowly mobilized reserve pools in other synapses.

      It would be interesting to know which of these is correct, but this is outside the scope of the current study.

      Moreover, other forms of release, such as asynchronous release, contribute a large fraction of released vesicles but are not factored in. Asynchronous release varies widely across the synapse population, from 0.1 to >0.4 of synchronous release, but is entirely ignored. Spontaneous release may also contribute to FM dye loss over the extended 25-min recordings used.

      Spontaneous release and asynchronous release are not caveats.

      First, spontaneous: We suspect that spontaneous release contributes to the background destaining rate, but this is 3-fold slower than the minimum during 1 Hz stimulation on average (Figs 2C, 3C, 5B etc), so we know that the slowly mobilized reserve is mobilized by low frequency trains of action potentials (lines 410-412). Note that a different outcome - where the rate of destaining decreased to a very low level during long trains of 1 Hz stimulation - would not have been consistent with the idea that slowly mobilized vesicles are only released spontaneously because the remaining fluorescence can always be destained rapidly by increasing the stimulation intensity to 20 Hz (e.g., see examples in Fig 3).

      Second, asynchronous: We know that slowly mobilized reserves must be released synchronously at 35C because the asynchronous component is eliminated at this temperature (Huson et al., 2019), without altering the quantity of slowly mobilized reserves that are mobilized by 1 Hz stimulation (lines 350-360 of Results, and 445-452 of Discussion; we can confirm from our own unpublished experiments that the disappearance of asynchronous release at 35C is a robust phenomenon in these cell cultures). Asynchronous release of slowly mobilized vesicles might occur at room temperature, but this would not argue against the conclusion that slowly mobilized vesicles are processed in parallel with quickly mobilized vesicles.

      Specific comments:

      Points 1-4 are already addressed above.

      5) The notion of the chained vesicles is somewhat confusing: how does the "first" vesicle located at the plasma membrane/release site get released if it is attached to the chain? Wouldn’t this "first" vesicle be non-immediately releasable since it must first be liberated? Since all vesicles shown in the Figure 1 have chains attached to them, what vesicle population then give rise to sub-millisecond release?

      This is not a concern relevant to the present study because none of the conclusions rely on the model in any way (see Introduction, and lines 382-386 of the Discussion). Beyond that: We previously published clear evidence that docked vesicles are tethered to non-docked vesicles (Figure 8 of Wesseling et al., 2019). We see no reason to suspect that a tether to an internal vesicle would prevent the docked vesicle from priming for release.

      7) Model: For fitting de-staining during 20 Hz stimulation, the authors state that it was necessary to allow >5-fold facilitation. This does not seem physiologically relevant, since previous studies found only very mild facilitation at room temperature (typically below a factor of 1.5-2.0), and the authors themselves state that, at most, a 1.3-fold facilitation was found.

      If the 1.3-fold facilitation estimate comes from us, it must have been in a different context.

      Most estimates of facilitation that are published are heavily convolved with simultaneous depression, and there is additionally a saturation mechanism for readily releasable vesicles with high release probability that is not widely known (Garcia-Perez and Wesseling, 2008). The standard method for eliminating the depression is to lower the probability of release by lowering extracellular [Ca2+], which additionally relieves occlusion by the saturation mechanism. And lowering [Ca2+] uncovers an enormous amount of facilitation at synapses in hippocampal cell culture. For example, see Figure 2B of Stevens and Wesseling (1999), which shows a 7-fold enhancement during 9 Hz stimulation, and Figure 3 of the same study, which shows a linear relationship with frequency. Taken together, these two results suggest a 15-fold enhancement during 20 Hz stimulation, which far exceeds the 5-fold value needed at inefficient release sites to make our working model fit the FM-dye destaining results.
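      The extrapolation above can be checked with simple arithmetic. Because the exact form of the linear relationship is not restated here, the sketch below assumes two simple readings: either the enhancement above baseline scales with frequency, or the enhancement ratio itself does. Only the 7-fold at 9 Hz value and the 5-fold requirement come from the text; the function names are illustrative.

```python
def enhancement_linear_above_baseline(enh_ref, f_ref, f_target):
    """Assume (enhancement - 1) is proportional to stimulation frequency."""
    return 1.0 + (enh_ref - 1.0) * f_target / f_ref

def enhancement_proportional(enh_ref, f_ref, f_target):
    """Assume the enhancement ratio itself is proportional to frequency."""
    return enh_ref * f_target / f_ref

# 7-fold enhancement at 9 Hz (Stevens and Wesseling, 1999, as quoted above)
low = enhancement_linear_above_baseline(7.0, 9.0, 20.0)   # about 14.3-fold
high = enhancement_proportional(7.0, 9.0, 20.0)           # about 15.6-fold
needed = 5.0  # facilitation required at inefficient release sites in the working model
print(f"extrapolated 20 Hz enhancement: {low:.1f} to {high:.1f}-fold (model needs {needed:.0f}-fold)")
```

      Either reading lands near the ~15-fold figure and well above the 5-fold value the model requires.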

      References

      Garcia-Perez E, Lo DC & Wesseling JF (2008). Kinetic isolation of a slowly recovering component of short-term depression during exhaustive use at excitatory hippocampal synapses. Journal of Neurophysiology 100, 781–95.

      Garcia-Perez E & Wesseling JF (2008). Augmentation controls the fast rebound from depression at excitatory hippocampal synapses. Journal of Neurophysiology 99, 1770–86.

      Huson V, van Boven MA, Stuefer A, Verhage M & Cornelisse LN (2019). Synaptotagmin-1 enables frequency coding by suppressing asynchronous release in a temperature dependent manner. Scientific Reports 9, 11341.

      Stevens CF & Wesseling JF (1999). Augmentation is a potentiation of the exocytotic process. Neuron 22, 139–46.

      Wesseling JF & Lo DC (2002). Limit on the role of activity in controlling the release-ready supply of synaptic vesicles. Journal of Neuroscience 22, 9708–20.

      Wesseling JF, Phan S, Bushong EA, Siksou L, Marty S, Pérez-Otaño I & Ellisman M (2019). Sparse force-bearing bridges between neighboring synaptic vesicles. Brain Structure and Function 224, 3263–3276.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers that differentiate exosomes of different origins.

      Strengths:

      The performance of the algorithm is generally good.

      Weaknesses:

      The source datasets are heterogeneous, as described in Figure 1 and Figure 2 and in Lines 72-75, and are therefore questionable.

      Response: We thank the reviewer for this assessment. The commonly used biomarkers of exosomes exhibit heterogeneous presence and abundance within the exosomes derived from different cell lines, tissue, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved through an integration of datasets from different sources, which allowed for the subsequent identification of common proteins associated with exosomes. Among the 18 protein markers identified, it is noteworthy that they are universally abundant in all cell lines and their exosomes. We believe that despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      (1) Nomenclature: Extracellular vesicles (EVs) are small vesicles released by cells into the extracellular space, exhibiting high heterogeneity in origin across species. Exosomes are typically defined as being of multivesicular body origin. However, the absence of several crucial common exosomal markers, including CD63, suggests that the proteomics analysis may include various other vesicular and non-vesicular materials.

      Response: As we reported previously (Kugeratski et al., Nature Cell Biology, 2021), the commonly used exosomal markers, such as CD9, CD63 and CD81, exhibit heterogeneity with respect to presence and abundance in exosomes derived from different cell types. For example, CD63 demonstrated remarkably lower abundance in exosomes derived from Raji cell lines. In our study, the detection rate of CD63 (< 50%) is quite low in tissue-derived exosomes, which is consistent with the observations made in another proteomics-based study (Hoshino et al., Cell, 2020). Therefore, relying solely on these markers is inadequate for the comprehensive characterization of EVs as exosomes. We therefore conducted this study to identify universal protein markers of exosomes by integrating data from multiple sources, thereby circumventing potential confounding effects due to their diverse origins and other technical differences.

      (2) Line 90: IPA is not defined before its first use in the manuscript.

      Response: We provided the full definition of IPA (Ingenuity Pathway Analysis) in the revised manuscript.

      (3) Figure 2B: Considering the large number of variables, it is unsurprising that the 2D PCA (Principal Component Analysis) falls short in the classification task. Including a few additional dimensions (principal components) might have the potential to better distinguish the cancer groups from the control group.

      Response: Thank you for this insightful query. The purpose of utilizing PCA here is to appreciate the heterogeneity associated with exosomes from different studies. While we acknowledge that additional dimensions may be more useful in distinguishing between cancer and control exosomes, we believe that the resulting performance would remain inferior to that of the machine learning approach we developed here.

      (4) Figure 2D: Exosomes primarily derive from multivesicular bodies, rather than the plasma membrane. It remains unclear why the authors focus specifically on proteins in the plasma membrane. Is it intended to encompass all membrane proteins? Clarification is needed on this point.

      Response: A good point. This study attempted to identify protein biomarkers of exosomes originating from different sources. Our approach also considered proteins present on the plasma membrane as potential biomarkers, because many of them have been detected on the surface of exosomes.

      (5) Figure 2F: The 18 identified proteins are also abundantly present in control cells, not solely in cancer-derived "exosomes." The statement in line 104 is misleading in this regard.

      Response: We apologize for the misleading sentence. We have revised the statement to state that “In total, we identified a set of 18 exosome protein markers that are present at a higher abundance in all exosomes examined”.

      (6) Figure 3B: Considering the definition of exosomes, CD63 and TSG101 should be present in all samples, and their absence raises concerns.

      Response: We understand the concern of the reviewer. In this figure, we analyzed CD63 and TSG101 in tissue-derived exosomes. Our results are consistent with a previous study that also showed the paucity of these markers in tissue-derived exosomes (Hoshino et al., Cell, 2020). Our study highlights that CD63 and TSG101 cannot always identify exosomes from diverse cell lines and tissues. Such initial observations motivated us to conduct this study to identify universal biomarkers of exosomes across different sources.

      (7) Figure 6G&H: Achieving an accuracy of 80% cannot be deemed "excellent."

      Response: We employed the word “excellent” in line 225 to describe the sensitivity and specificity associated with AUROC.
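      The difference between the accuracy figure the reviewer cites and the AUROC the response refers to can be illustrated with a small example. AUROC equals the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen control (the Mann-Whitney formulation), so it does not depend on a decision threshold, whereas accuracy does. The scores below are made up for illustration and are not from the study.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).
    A tie between a positive and a negative score counts as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def accuracy(pos_scores, neg_scores, threshold):
    """Fraction correctly classified when score > threshold is called positive."""
    correct = sum(s > threshold for s in pos_scores) + sum(s <= threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))

# Hypothetical classifier scores (e.g., predicted probability of cancer)
cancer = [0.9, 0.8, 0.7, 0.4]
control = [0.6, 0.3, 0.2, 0.1]
print(auroc(cancer, control))          # 0.9375
print(accuracy(cancer, control, 0.5))  # 0.75
```

      The same scores yield a high AUROC and a more modest accuracy, so the two numbers should be read as answering different questions.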

      (8) Other comments on methods: The manuscript lacks an explanation of the neural network structure and why it outperforms other methods. Additionally, details about the calculation of MI (mutual information), IPA, and other methods should be provided.

      Response: This is a good suggestion, but in this work we did not employ neural networks for the analysis. In the revised manuscript, we have provided additional details and explanations regarding the methodology for calculating mutual information scores, as well as insights into our use of IPA and other relevant methods.
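      For readers unfamiliar with mutual information (MI) feature scoring, the sketch below shows the plug-in estimate for discrete variables: how much knowing a protein's discretized abundance reduces uncertainty about the sample class. It assumes abundances have been binned (e.g., detected vs. not detected); the estimator actually used in the manuscript is not specified here and may differ (for example, nearest-neighbour estimators for continuous data). Labels and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Each observed (x, y) cell contributes p(x,y) * log2(p(x,y) / (p(x) p(y)))
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

labels = ["cancer", "cancer", "control", "control"]
informative = ["high", "high", "low", "low"]    # perfectly tracks the label
uninformative = ["high", "low", "high", "low"]  # independent of the label
print(mutual_information(informative, labels))    # 1.0 bit
print(mutual_information(uninformative, labels))  # 0.0 bits
```

      Ranking proteins by such a score favours features whose levels separate the classes, independent of any particular classifier.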

      Reviewer #2:

      Summary:

      This is fine work on the development of computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forest to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting and not only add new knowledge to cancer research but also provide a new and advanced method to detect cancer. The data were presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      Results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      Response: We appreciate the enthusiastic assessment of our study by the reviewer.

      Reviewer #3:

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      Response: We appreciate this positive assessment of our work.

      (1) The authors should clarify why they focused solely on protein markers. Why weren't RNA transcripts also considered? Do the authors see value in incorporating RNA/microRNA transcripts to enhance diagnostic capabilities?

      Response: This is a very important point for further consideration. The current datasets for exosomal proteins are extensive, and proteins may generally offer distinct advantages over nucleic acids in cancer diagnostics owing to their stability in exosomes and extended half-life (Schey et al., Methods, 2015). We agree that the analysis could only become more powerful if DNA, RNA, and other constituents were also included, and we hope to pursue such analyses in the future.

      (2) Can the identified exosomal markers also be evaluated as prognostic indicators?

      Response: We appreciate this intriguing question. Indeed, proteins such as apolipoprotein E (APOE) may serve as a potential prognostic marker in various cancers (Ren et al., Cancer Medicine, 2019). APOE is being extensively studied as a prognostic and diagnostic marker for multiple cancer types, including colorectal cancer (Martin et al., BMC Cancer, 2014), gastric cancer (Sakashita et al., Oncology Reports, 2008), pancreatic cancer (Chen et al., Medical Oncology, 2013; Xu et al., Tumor Biology, 2016), and human hepatocellular carcinoma (Yokoyama et al., International Journal of Oncology, 2006). In these studies, APOE levels were found to be elevated in the serum of cancer patients and correlated with survival outcomes.

      (3) The discussion should emphasize if the identified protein markers are tumor-specific or if they indicate, for instance, the patient's immune reaction to the tumor.

      Response: A good point. We believe that the identified biomarkers are tumor-specific and a significant number of these proteins have been previously associated with tumor initiation and progression. Further studies will likely identify immune response-related biomarkers when more in-depth tumor-level analyses are performed.

      References:

      Chen, J., Chen, L. J., Yang, R. B., Xia, Y. L., Zhou, H. C., Wu, W., Lu, Y., Hu, L. W., & Zhao, Y. (2013). Expression and clinical significance of apolipoprotein E in pancreatic ductal adenocarcinoma. Med Oncol, 30(2), 583. https://doi.org/10.1007/s12032-013-0583-y

      Hoshino, A., Kim, H. S., Bojmar, L., Gyan, K. E., Cioffi, M., Hernandez, J., Zambirinis, C. P., Rodrigues, G., Molina, H., Heissel, S., Mark, M. T., Steiner, L., Benito-Martin, A., Lucotti, S., Di Giannatale, A., Offer, K., Nakajima, M., Williams, C., Nogues, L., . . . Lyden, D. (2020). Extracellular Vesicle and Particle Biomarkers Define Multiple Human Cancers. Cell, 182(4), 1044-1061 e1018. https://doi.org/10.1016/j.cell.2020.07.009

      Kugeratski, F. G., Hodge, K., Lilla, S., McAndrews, K. M., Zhou, X., Hwang, R. F., Zanivan, S., & Kalluri, R. (2021). Quantitative proteomics identifies the core proteome of exosomes with syntenin-1 as the highest abundant protein and a putative universal biomarker. Nat Cell Biol, 23(6), 631-641. https://doi.org/10.1038/s41556-021-00693-y

      Martin, P., Noonan, S., Mullen, M. P., Scaife, C., Tosetto, M., Nolan, B., Wynne, K., Hyland, J., Sheahan, K., Elia, G., O'Donoghue, D., Fennelly, D., & O'Sullivan, J. (2014). Predicting response to vascular endothelial growth factor inhibitor and chemotherapy in metastatic colorectal cancer. BMC Cancer, 14, 887. https://doi.org/10.1186/1471-2407-14-887

      Ren, L., Yi, J., Li, W., Zheng, X., Liu, J., Wang, J., & Du, G. (2019). Apolipoproteins and cancer. Cancer Med, 8(16), 7032-7043. https://doi.org/10.1002/cam4.2587

      Sakashita, K., Tanaka, F., Zhang, X., Mimori, K., Kamohara, Y., Inoue, H., Sawada, T., Hirakawa, K., & Mori, M. (2008). Clinical significance of ApoE expression in human gastric cancer. Oncol Rep, 20(6), 1313-1319. https://www.ncbi.nlm.nih.gov/pubmed/19020708

      Schey, K. L., Luther, J. M., & Rose, K. L. (2015). Proteomics characterization of exosome cargo. Methods, 87, 75-82. https://doi.org/10.1016/j.ymeth.2015.03.018

      Xu, X., Wan, J., Yuan, L., Ba, J., Feng, P., Long, W., Huang, H., Liu, P., Cai, Y., Liu, M., Luo, J., & Li, L. (2016). Serum levels of apolipoprotein E correlates with disease progression and poor prognosis in breast cancer. Tumour Biol. https://doi.org/10.1007/s13277-016-5453-8

      Yokoyama, Y., Kuramitsu, Y., Takashima, M., Iizuka, N., Terai, S., Oka, M., Nakamura, K., Okita, K., & Sakaida, I. (2006). Protein level of apolipoprotein E increased in human hepatocellular carcinoma. Int J Oncol, 28(3), 625-631. https://www.ncbi.nlm.nih.gov/pubmed/16465366

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports comprehensive multi-omic data on the changes induced in young and aged male mouse tail fibroblasts after treatment with chemical reprogramming factors. The authors claim that chemical reprogramming factors induce changes consistent with a reduction of cellular 'biological' age (e.g., correlations with established aging markers in whole tissues). However, the study relies on previously identified aging markers (instead of aging in the tail fibroblast system itself), and thus, at this stage, the evidence in support of the observed molecular changes truly reflecting changes in biological age in the study system is still incomplete.

      Essential revisions

      After discussion with reviewers, we believe that the conclusions of the manuscript would be significantly strengthened with the following revisions:

      (1) Rather than basing the analysis of age-related markers on public tissue data, it is recommended that authors use their own data on pre-reprogramming fibroblasts to define molecular aging-related markers/signatures specifically for male tail fibroblasts at 4 vs 20 months. This should also always be included in figures as reference points.

      We appreciate these helpful comments. Please refer to our responses to Reviewers #1 and #2 concerning these suggestions and the corresponding changes we have made in the revised manuscript.

      (2) In general, the methods as written lack the details necessary to fully understand the study/reproduce it independently, notably in terms of data analysis choices (e.g. use of FWER/FDR type correction for multiple testing, use of raw vs normalized RNA counts for PCA, etc).

      Thank you for this feedback. We have modified our text to address this issue. Please refer to our responses to Reviewer #1 for the specific changes we have made.

      (3) More generally, the authors should better outline the limitations/caveats of their experimental design in the discussion and/or abstract, including the specific cell type and the choice of using only male data (since aging itself is very sex-dimorphic, and the impact of partial reprogramming on aging phenotypes may also be sex-dimorphic).

      Thank you for this important feedback. We have now added a section to our Discussion in which we directly address potential limitations of our study concerning sex-specific differences and the cell type used.

      Public Reviews:

      Reviewer #1:

      Summary:

      The investigators employed a multi-omics approach to show the functional impact of partial chemical reprogramming in fibroblasts from young and aged mice.

      Strengths:

      Multi-omics data was collected, including epigenome, transcriptome, proteome, phosphoproteome, and metabolome. Different analyses were conducted accordingly, including differential expression analysis, gene set enrichment analysis, transcriptomic and epigenetic clock-based analyses. The impact of partial chemical reprogramming on aging was supported by these multi-source results.

      We appreciate the reviewer noting the strength and comprehensiveness of our approach.

      Weaknesses:

      More experimental data may be needed to further validate current findings.

      We thank the reviewer for this suggestion. To further validate our findings, we have proceeded as follows: (1) First, we have investigated the role of Prkaca activation during partial chemical reprogramming with 7c (see updated Fig. 5C, Fig. 5 – figure supplement 1B). By confocal microscopy, we show that partial chemical reprogramming with 7c does not cause Prkaca to localize to mitochondria; rather, its cellular distribution is altered to favor nuclear localization. We also use RNAi to knock down Prkaca and find that Prkaca is not necessary for mediating the increase in mitochondrial membrane potential upon partial chemical reprogramming with 7c.

      (2) We have determined the effect of partial chemical reprogramming with 7c on apoptosis using Annexin V assay (see updated Fig. 5 – figure supplement 1C). We show that during the course of partial chemical reprogramming, the proportion of apoptotic cells steadily increases to about 20 percent.

      (3) We have re-analyzed our multi-omics data to determine the molecular differences (e.g. at the epigenome, transcriptome, proteome, and metabolome levels) between fibroblasts isolated from young and old mice (see updated Fig. 2 – figure supplement 1, Fig. 6 – figure supplement 1, and Fig. 7 – figure supplement 2). Additionally, we have updated Fig. 7A to include statistical comparisons of transcriptomic age of 4-month-old and 20-month-old fibroblasts. Finally, we have updated Fig. 3D to include functional enrichment of gene and protein expression levels of aged fibroblasts.

      (4) We have more thoroughly characterized the effects of partial chemical reprogramming on the epigenome (see Fig. 7 – figure supplement 3).

      (5) Julie Y. Chen was added on as an additional co-author for producing the analyses shown in Fig. 7 – figure supplement 2, and Fig. 7 – figure supplement 3.

      Reviewer #2:

      The short-term administration of reprogramming factors to partially reprogram cells has gained traction in recent years as a potential strategy to reverse aging in cells and organisms. Early studies used Yamanaka factors in transgenic mice to reverse aging phenotypes, but chemical cocktails could present a more feasible approach for in vivo delivery. In this study, Mitchell et al sought to determine the effects that short-term administration of chemical reprogramming cocktails have on biological age and function. To address this question, they treated young and old mouse fibroblasts with chemical reprogramming cocktails and performed transcriptome, proteome, metabolome, and DNA methylation profiling pre- and post-treatment. For each of these datasets, they identified changes associated with treatment, showing downregulation of some previously identified molecular signatures of aging in both young and old cells. From these data, the authors conclude that partial chemical reprogramming can rejuvenate both young and old fibroblasts.

      The main strength of this study is the comprehensive profiling of cells pre- and post-treatment with the reprogramming cocktails, which will be a valuable resource for better understanding the molecular changes induced by chemical reprogramming. The authors highlighted consistent changes across the different datasets that are thought to be associated with aging phenotypes, showing reduction of age-associated signatures previously identified in various tissues. However, from the findings, it remains unclear which changes are functionally relevant in the specific fibroblast system being used. Specifically:

      (1) The 4 month and 20 month mouse fibroblasts are designated "young" vs "old" in this study. An important analysis that was not shown for each of the profiled modalities was a comparison of untreated young vs old fibroblasts to determine age-associated molecular changes in this specific model of aging. Then, rather than using aging signatures defined in other tissues, it would be more appropriate to determine whether the chemical cocktails reverted old fibroblasts to a younger state based on the age-associated changes identified in this comparison.

      In our study, we have used 4 biological samples per group for young and old untreated fibroblasts, and these samples have been used to calculate the effect of 7c and 2c cocktails on gene expression in each age group. Therefore, the correlation between logFC induced by 7c/2c treatment and logFC between young and old fibroblasts would be biased, since the same untreated samples would be used in both calculations: estimates B-A and C-B will be, on average, negatively correlated even if A, B and C are independent random variables. For this reason, to investigate the effect of cocktails on biological age, we utilized gene expression signatures of aging, estimated based on more than 2,600 samples of different ages from 25 data sources (PMID: 37269831). Notably, our multi-tissue signatures of aging were identified based on data from 17 tissues, including skin. Therefore, these biomarkers seem to represent more reliable and universal molecular mechanisms of aging. Since they have been identified using independent data, the signatures also don’t introduce the statistical bias described above. For these reasons, we think that they are more applicable for the current analysis. To demonstrate that the utilized aging signatures are overall consistent with the changes observed in studied fibroblasts, we performed GSEA-based analysis, testing association between logFC in aged fibroblasts and various signatures of aging and reprogramming (similar to our analysis in Fig. 2E). We found that the changes in aged fibroblasts from the current study demonstrated positive association with the majority of aging signatures (kidney, liver and multi-tissue signatures in mouse and rat) (Fig. 2 – figure supplement 1A) and were negatively associated with signatures of reprogramming.

      In addition, we characterized functional changes perturbed in untreated aged fibroblasts at the level of gene expression and protein concentrations and observed multiple changes consistent with the aging signatures, such as upregulation of genes and proteins involved in inflammatory response and interferon signaling (Fig. 3D, Fig. 2 – figure supplement 1C). Therefore, changes observed in untreated aged fibroblasts seem to agree with age-related molecular changes identified across mammalian tissues in our previous studies.
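      The statistical bias described above can be checked directly. If A, B and C are independent with equal variance, Cov(B - A, C - B) = -Var(B), so the correlation between the two differences is -1/2 even though nothing links aging to treatment. In the simulation below, A, B and C stand in for, e.g., per-gene mean log-expression in young untreated, old untreated and old treated fibroblasts (an illustrative labeling); the values are random draws, not study data, and the function names are hypothetical.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shared_baseline_correlation(n_genes=20000, seed=0):
    """Correlation between (B - A) and (C - B) when A, B, C are independent
    standard-normal draws per gene; B is the shared (reused) group."""
    rng = random.Random(seed)
    aging_lfc, treatment_lfc = [], []
    for _ in range(n_genes):
        a, b, c = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        aging_lfc.append(b - a)      # "old vs young" difference
        treatment_lfc.append(c - b)  # "treated vs untreated" difference
    return pearson(aging_lfc, treatment_lfc)

r = shared_baseline_correlation()
print(f"correlation from independent draws: {r:.3f} (theory: -0.5)")
```

      A treatment that merely adds noise would therefore appear to "reverse" aging under this comparison, which is why signatures estimated from independent data avoid the artifact.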

      We would also like to mention that the epigenetic clocks used in this study consistently show that fibroblasts from 20-month-old mice are significantly older than fibroblasts from 4-month-old mice (Fig. 7B). Moreover, we have revised the manuscript to show that these epigenetic differences between young and old untreated fibroblasts are not due to overall changes in mean DNA methylation (Fig. 7 – figure supplement 2). In contrast, in the revised manuscript, we observe that 7c treatment reduces the epigenetic age of cells by decreasing mean DNA methylation levels (Fig. 7 – figure supplement 3).

      (2) Across all datasets, it appears that the global profiles of young vs old mouse fibroblasts are fairly similar compared to treated fibroblasts, suggesting that the chemical cocktails are not reverting the fibroblasts to a younger state but instead driving them to a different cell state. Similarly, in most cases where specific age-related processes/genes are being compared across untreated and treated samples, no significant differences are observed between young and old fibroblasts.

      We agree that our data show that partial chemical reprogramming seems to induce a similar effect on young and old fibroblasts. In Fig. 2 – figure supplement 1B, the Spearman correlation coefficients for the effects on gene expression in young and old fibroblasts are 0.80 and 0.85 for 2c and 7c, respectively. It is important to note that the effect of partial chemical reprogramming is an order of magnitude larger (e.g., in terms of the number of differentially expressed genes) than the effect of aging in the untreated fibroblasts. We believe that partial chemical reprogramming with 7c pushes the cells to a younger state as a byproduct of producing a different cellular metabolic state with a strong increase in OXPHOS capacity.

      (3) Functional validation experiments to confirm that specific changes observed after partial reprogramming indeed reduce biological age are limited.

      Functional validation of rejuvenating interventions is limited in vitro, as cells do not fully maintain their “aged” phenotype once isolated and cultured, and pursuing partial chemical reprogramming in vivo in naturally aged mice was beyond the scope of this study. Among the best reporters of biological age that are preserved in primary cells in vitro are epigenetic and transcriptomic clocks, both of which were used in this manuscript to show that 7c treatment, but not 2c, reduces biological age. We show that splicing-related damage is marginally elevated in old fibroblasts compared to young, and that 7c reduces splicing damage by reducing intron retention. Moreover, the epigenetic clocks used in this study show that the 20-month-old fibroblasts are significantly older than the 4-month-old fibroblasts, indicating that the “aged” phenotype is at least partially preserved. Furthermore, according to previous studies (PMIDs: 37269831, 31353263), one of the strongest functional biomarkers of aging is downregulation of mitochondrial function and energy metabolism, including oxidative phosphorylation, while upregulation of these functions is usually associated with extended lifespan in mice. For this reason, we have focused on these pathways in our study and assessed them with functional assays.

      (4) Partial reprogramming appears to substantially reduce biological age of the young (4 month) fibroblasts based on the aging signatures used. It is unclear how this result should be interpreted.

      This is a caveat of all reprogramming strategies and “anti-aging” interventions developed and tested to date. Currently, there are no genetic or pharmacological methods that target only the “aged” state and not the “young” state as well (i.e., an intervention that would change only old cells and revert them to a younger state). However, the “young” cells in our study and in many other studies are still cells of intermediate age, as aging appears to begin early during development. Therefore, perhaps unsurprisingly, partial chemical reprogramming seemed to have similar effects on fibroblasts isolated from young and old mice, in line with OSK/OSKM reprogramming. These results should be interpreted as follows: partial chemical reprogramming does not depend on the epigenetic state (biological age) of adult cells to induce rejuvenation. We have updated the discussion section of our manuscript accordingly.

      Recommendations for the authors:

      Reviewer #1:

      (1) How was the PCA conducted for RNA-seq data? Were the raw or normalized counts used for PCA?

      Normalized counts were used for PCA of the RNA-seq data.
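      For readers unfamiliar with the procedure, a PCA on normalized RNA-seq counts can be sketched as below. The normalization here (counts per million followed by log2(CPM + 1)) is an assumption chosen for illustration; the exact normalization used in the manuscript is described in its methods, not here. The toy count matrix is random and carries no biological signal.

```python
import numpy as np

def pca_on_normalized_counts(counts: np.ndarray, n_components: int = 2):
    """PCA of an RNA-seq count matrix (genes x samples).

    Assumed normalization: counts per million, then log2(CPM + 1).
    Returns sample scores (samples x n_components) and the fraction of
    variance explained by each returned component.
    """
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    log_cpm = np.log2(cpm + 1.0)
    # Samples become rows; centre each gene (column) across samples.
    x = log_cpm.T - log_cpm.T.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, explained

rng = np.random.default_rng(1)
toy_counts = rng.poisson(50, size=(500, 8)).astype(float)  # 500 genes, 8 samples
scores, explained = pca_on_normalized_counts(toy_counts)
print(scores.shape, explained.round(3))
```

      Using normalized rather than raw counts keeps differences in sequencing depth from dominating the first principal components.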

      (2) Supplementary Fig 3c, why was the correlation between the red rows and red columns low? Was the color of group messed up? Why was the Pearson correlation used instead of Spearman correlation? Most of the correlation analyses in the manuscript used Spearman correlation.

      We thank the reviewer for noticing this mistake. The colors of the groups have now been corrected. Furthermore, to be consistent with the rest of the manuscript, we have performed a Spearman correlation analysis on the normalized proteomics data to evaluate sample-to-sample similarities and updated Fig. 3 – figure supplement 1 accordingly. Overall, the results are similar to those obtained by Pearson correlation.
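      A sample-to-sample Spearman correlation matrix is the Pearson correlation of within-sample ranks. The sketch below assumes a features x samples matrix of normalized protein abundances with no missing values and assigns tied values their average rank; it is illustrative rather than the exact code behind Fig. 3 – figure supplement 1.

```python
import numpy as np

def average_ranks(values: np.ndarray) -> np.ndarray:
    """Rank a 1-D array from 1..n, giving tied values their average rank."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def spearman_sample_matrix(data: np.ndarray) -> np.ndarray:
    """Sample-to-sample Spearman correlations for a features x samples matrix."""
    ranked = np.column_stack([average_ranks(data[:, k]) for k in range(data.shape[1])])
    return np.corrcoef(ranked, rowvar=False)

# Toy matrix: 4 proteins x 3 samples (hypothetical values).
toy = np.array([[1.0, 10.0, 5.0],
                [2.0, 20.0, 4.0],
                [3.0, 30.0, 3.0],
                [4.0, 45.0, 1.0]])
print(spearman_sample_matrix(toy).round(2))
```

      Because only ranks enter the calculation, the result is insensitive to monotonic transformations of abundance, which is one reason it tends to agree with Pearson correlation on well-normalized data.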

      (3) Were the significant metabolites tested by one-way ANOVA adjusted for family-wise type I error rate? It is surprising that over 50% metabolites were significant.

      Yes, the significant metabolites were adjusted for family-wise type I error rate (with a 5% significance threshold) in Fig. 6B.
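      Family-wise error rate control can be implemented in several ways (Bonferroni, Holm, and others); the specific procedure behind Fig. 6B is not stated in this response. As an illustration only, the sketch below shows the Holm step-down adjustment, one common FWER-controlling method, applied to made-up p-values.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values controlling the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical ANOVA p-values
adjusted = holm_adjust(raw)       # approximately [0.03, 0.06, 0.06, 0.02]
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

      With strong treatment effects, many metabolites can remain significant even after such an adjustment, so a high fraction of significant hits is not by itself evidence of missing correction.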

      (4) Missing full names of several abbreviations, such as NIA, RLE, PSI, etc.

      Thank you for noticing the missing abbreviations. We have corrected this by writing out the full term in the first instance in which each abbreviation appears.

      (5) Methods section may be too long. Some paragraphs could be moved to supplementary text.

      eLife does not limit the number of figures or the amount of text. Therefore, we have kept the methods section largely unaltered, as we feel it will be helpful to the scientific community.

      Reviewer #2:

      (1) As discussed in the public review, I would recommend first establishing what differences exist between 4 month and 20 month fibroblasts to identify potential age-related changes in these fibroblasts.

      We thank the reviewer for this suggestion. We have now thoroughly characterized the molecular differences between fibroblasts taken from young and old mice at the epigenome, transcriptome, proteome, and metabolome levels. Please refer to previous responses for more specific details.

      We have also attempted to establish aging-related differences at the phosphoproteome level, particularly with regard to mitochondrial processes (see figure below), but only GOcc: mitochondrion and GObp: mitochondrial transport come close to statistical significance (raw p-values of 0.05 and 0.08, respectively) in the control comparison.

      Author response image 1.

      (2) While the global changes currently highlighted in the study are informative and should remain in the revised manuscript, additional analyses to show which age-related changes identified in point 1 are reverted upon 2c or 7c treatment would better address the question of whether these cocktails revert age-related changes seen in fibroblasts. These analyses should be performed for each dataset (i.e transcriptomic, proteomic, epigenomic, metabolomic) generated.

      Thank you for this comment. We have now evaluated the effects of partial chemical reprogramming on the specific molecular differences between fibroblasts isolated from young and old mice (see updated Fig. 2 – figure supplement 1, Fig. 6 – figure supplement 1, Fig. 7 – figure supplement 2, and Fig. 7 – figure supplement 3). For functional enrichment of aged fibroblasts at the gene and protein level, please refer to updated Fig. 3D.

      (3) Comparisons between partial reprogramming and OSKM reprogramming signatures are repeatedly made in the paper, but it is not clear from the text whether similarity to OSKM reprogramming signatures is a desired or undesired feature. Since there are likely both rejuvenating and oncogenic aspects of the OSKM signatures, it is unclear what conclusions can be made from these comparisons.

      Two central questions of this study were (1) if partial chemical reprogramming could induce cellular rejuvenation, and (2) if so, would it do so by merely chemically activating expression of Yamanaka factors. In this study, we find that 7c, the cocktail that demonstrated the most profound effect on biological age, only slightly upregulates Klf4, downregulates c-Myc, and has no effect on Sox2 or Oct4 expression. Thus, partial chemical reprogramming seems to operate through a mechanism independent of upregulating OSK/OSKM gene expression. This is crucial as it suggests that there are other transcription factors outside of OSKM that can be targeted to induce cellular rejuvenation and reversal of biological age. However, the direct transcriptional targets of partial chemical reprogramming are currently unknown and require further investigation.

      Partial reprogramming with OSK/OSKM has several limitations, including low efficiency, oncogenic risk, and differences in the speed of reprogramming according to cell/tissue type. These risks could be inherently tied to the transcription factors OSKM themselves; thus, partial chemical reprogramming, by avoiding strong activation of these genes, could potentially avoid these risks and provide a safer means for reversing biological age in vivo. However, extensive follow-up studies beyond the scope of this manuscript are certainly required to determine this.

      We have addressed this comment by modifying the discussion to include these points.

      (4) When analyzing the phospho-proteomics data, results are discussed as general changes in phosphorylation of proteins involved in different cellular processes. However, phosphorylation can either activate or inhibit a specific protein, and can depend on the specific residue in a protein that is modified. Different proteins in a cellular process can also respond in opposite directions to phosphorylation. Treating activating and inactivating phosphorylation events separately in describing these results would be more informative.

      We agree that an analysis that considers for each specific phosphosite whether it activates or inactivates a particular pathway would in principle be preferable over our current enrichment analysis that only accounts for the increase or decrease in phosphorylation of each site without knowing its biological meaning. However, unfortunately, we think it is currently practically not possible to conduct such an analysis. The proposed analysis would require a database with information on which residues are (de-)phosphorylated when a certain pathway is activated. However, as far as we know, there are currently no databases that link activation or inactivation of specific phosphosites to pathways in repositories like KEGG, HALLMARK, GObp, GOcc, GOmf, Reactome, etc.

      Some databases link phosphosites to drugs, diseases and kinases (e.g. PTMsigDB (PMID: 30563849)). However, these authors explicitly state: “We note that we do not capture functional annotations of PTM sites in PTMsigDB, such as activating or inactivating effect on the modified protein.” Furthermore, even in these databases, for the vast majority of the registered phosphosites, the responsible kinases are unknown, especially in mice. In our work, we made use of PhosphoSitePlus for kinase substrate enrichment analysis (see Fig. 5B). Such analyses, where kinase activity is inferred based on activated phosphosites are indeed commonly performed (see PMIDs: 34663829, 37269289, 37585503).

      In the absence of a repository that assigns activity to phosphosites, it is standard practice to perform pathway enrichment analysis without accounting for whether phosphosites are activating or inactivating (see PMID: 34663829), as we have done in our manuscript (Fig. 5A).

      Despite these drawbacks, we believe our analysis is relevant, as it demonstrates important biological activity in these pathways upon 2c/7c treatment compared with controls. For example, the observed increase in abundance of mitochondrial OXPHOS complexes (Fig. 3E), combined with an increase in general phosphorylation of mitochondrial proteins (Fig. 5A), likely points to an increase in mitochondrial activity, although we cannot exclude that some individual phosphorylation events have inhibitory effects on certain mitochondrial proteins while others indicate increased activity.

      (5) For the transcriptomic and epigenetic aging clocks used in Fig 7, significance tests need to be included for untreated 4 month vs 20 month fibroblasts. Particularly for the transcriptional clock, the differences are small and suggest that it may not be a strong aging signature.

      We have updated our clock analysis with the most recent versions of the clocks and added significance testing for the comparison between 4-month-old and 20-month-old untreated fibroblasts (Fig. 7A). The difference is statistically significant for the chronological clock. However, no significant difference was observed with the lifespan-adjusted clock, suggesting that 20-month-old fibroblasts do not exhibit substantial changes in gene expression associated with decreased healthspan and increased mortality.
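      The statistical test used for Fig. 7A is not specified in this response. As one plausible choice for comparing clock predictions between two small groups with possibly unequal variances, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom; the input values are toy numbers, not data from the study, and converting t to a p-value would additionally require a t-distribution (e.g. from a statistics library).

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and Welch–Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of each group mean (sample variance / n).
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical clock outputs for 4 young and 4 old untreated samples.
young = [1.0, 2.0, 3.0, 4.0]
old = [2.0, 4.0, 6.0, 8.0]
t, df = welch_t(young, old)
print(t, df)
```

      With only four samples per group, a modest clock difference can easily fall short of significance, which is consistent with the contrast between the chronological and lifespan-adjusted clocks.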

      (6) For heatmaps shown in Figure 3D and Figure 4, please include untreated 4 month and 20 month fibroblasts as well to determine if pathways being compared are different between young and old fibroblasts.

      We have updated Figure 3D with functional enrichment results for aged fibroblasts at the gene and protein expression levels, as requested. As for Fig. 4, we explained in our reply to point 1 of Reviewer #2 in the public review why adding aged fibroblasts there would introduce bias. Instead, we have performed GSEA-based association analysis for changes observed in aged fibroblasts and signatures of aging (Fig. 2 – figure supplement 1), confirming that our signatures are overall consistent with patterns of 20-month-old fibroblasts from the current study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009-2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition. This work was supported by a subset of mouse sera. Clustering analysis determined 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely close to the catalytic site. Antigenic distances were calculated and a random forest model was used to determine potential important sites.

      This has the potential to be a very interesting piece of work. At present, there are inconsistencies in the methods, results and presentation that limit its impact. In particular, there are weaknesses in some of the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      Weaknesses

      (1) Inconsistency in experimental methods

      Two ferret sera were boosted with H1N2, while recombinant NA protein for the others. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Nevertheless, these results are included in the analysis when it would be better to exclude them (Figure 2 shows much lower titres to their own group than other sera).

      As an exercise, we excluded the H1N2-boosted ferret sera and observed no major impact on the antigenic grouping (see Author response image 1a). Another way to control for differences in immunogenicity is to normalize the NAI values to the homologous ELISA titers for each antigen. Clustering based on these ELISA-normalized NAI titers reveals the same 4 distinct antigenic groups, with one change: Kan17 shifts from group 1 to group 2 (Author response image 1b). Note that a homologous ELISA titer is not available for A/West-Virginia/17/2012, so this serum sample is not included in Author response image 1b.

      Author response image 1.

      Antigenic and phylogenetic relatedness of N2 NAs. Phylogenetic tree based on the N2 NA head domain amino acid sequences and heat map representing the average normalized neuraminidase inhibition titer per H6N2 [log2 (max NAI/NAI)] determined in ferret sera after the boost (listed vertically). The red-to-blue scale indicates high-to-low NAI observed in ELLA against the H6N2 reassortants (listed at the bottom). UPGMA clustering of H6N2 inhibition profiles is shown above the heat map and colored according to the phylogenetic groups. (a) Based on the ferret sera, excluding the sera obtained following prime-boost by infection with H1N2 (A/Estonia/91625/2015 and A/Stockholm/15/2014). (b) Based on serum NAI titers normalized to the homologous ELISA titer.
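      The two normalizations discussed above can be sketched as follows: log2(max NAI/NAI), where the maximum is taken over all sera tested against a given H6N2, and division of each serum's NAI titers by its homologous ELISA titer. This is a minimal illustration with hypothetical serum and antigen labels; averaging over duplicate ferrets and the subsequent UPGMA clustering are not shown.

```python
import math

def log2_fold_below_max(nai):
    """Per-antigen normalization log2(max NAI / NAI).

    `nai` maps serum -> {antigen: NAI titer}. For every antigen the
    strongest-inhibiting serum scores 0 and weaker sera score > 0.
    """
    antigens = {a for titers in nai.values() for a in titers}
    max_per_antigen = {a: max(t[a] for t in nai.values() if a in t) for a in antigens}
    return {
        serum: {a: math.log2(max_per_antigen[a] / titer) for a, titer in titers.items()}
        for serum, titers in nai.items()
    }

def normalize_by_homologous_elisa(nai, elisa):
    """Alternative: divide each serum's NAI titers by its homologous ELISA titer."""
    return {
        serum: {a: titer / elisa[serum] for a, titer in titers.items()}
        for serum, titers in nai.items()
        if serum in elisa  # sera without a homologous ELISA titer are dropped
    }

# Hypothetical titers for two sera against two H6N2 antigens.
nai = {"s1": {"v1": 160, "v2": 40}, "s2": {"v1": 40, "v2": 80}}
print(log2_fold_below_max(nai))
print(normalize_by_homologous_elisa(nai, {"s1": 320}))
```

      The second function mirrors why a serum lacking a homologous ELISA titer (here "s2") drops out of the ELISA-normalized analysis.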

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper. Further investigation of this inconsistency is required to determine whether this has a genetic basis or is an experimental issue. It is difficult to trust the remaining data while this issue is unresolved.

      We understand the reviewer’s concern. It is important to keep in mind that discrete grouping of antigens allows major antigenic drifts to be visualized. However, within closely related groups, the cross-reactivity of antisera is more likely distributed along a spectrum. When we constructed an antigenic map using the antigenic cartography algorithm (as described by Smith D. et al., 2004), Kansas17, Wis15, and Ala15 were positioned closer to antigenic group 1 than most of the other antigens classified as group 2 (Author response image 2a). Similar results were obtained when individual ferret sera from the biological duplicates were used (Author response image 2b). This antigenic cartography map has now been added to the revised manuscript as Figure 2 – figure supplement 3.

      Author response image 2.

      The antigenic cartography was constructed using averaged data from pairs of ferrets (a). Similar analysis was performed on individual ferrets sera (b).

      (3) Inconsistency in group labelling

      A/Hatay/4990/2016 & A/New Caledonia/23/2016 are in phylogenetic group 1 in Figure 2 and phylogenetic group 1 in Figure 5 - figure supplement 1 panel a.

      Our apologies: there was indeed a mistake in the labeling of Figure 5. A new antigenic cartography map was constructed and included in the revised manuscript. As a result, Figure 5 – figure supplement 1 became redundant and was removed from the manuscript.

      A/Kansas/14/2017 is selected as a representative of antigenic group 2, when in Figure 2 it is labelled as AC1 (although Figure 2 - supplement 4 which the text is referring to shows data for A/Singapore/Infimh-16-0019/2016 as the representative of AC2). A/Kansas/14/2017 is coloured and labelled as AC2 in Figure 2 - supplement 5.

      Thank you for pointing out this inconsistency. Kan17 clustered antigenically in group 1 based on the NAI values that were normalized relative to the serum with the maximal NAI value against the H6N2 virus tested. When using NAI titers normalized to the homologous ELISA titer, Kan17 is positioned in group 2. Likewise, antigenic cartography positions Kan17 in group 2. Therefore, we conclude that A/Kansas/14/2017 NA is a representative of group 2.

      The colouring is changed for Figure 3a at the bottom. A/Heilongjiang-Xiangyang/1134/2011 is coloured the same as AC4 viruses when it is AC1 in Figure 2. This lack of consistency makes the figures misleading.

      We apologize for this mistake. The coloring in Figure 3a has been corrected.

      (4) Data not presented, without explanation

      The paper states that 44 sera and 27 H6N2 viruses were used (line 158). However, the results for the Kansas/14/2017 sera do not appear to be presented in any of the figures (e.g. Figure 2 phylogenetic tree, Figure 5 - figure supplement 1). It is not obvious why these data were not presented. The exclusion of this serum could affect the results as often the homologous titre is the highest and several heatmaps show the fold down from the highest titre.

      Serum against A/Kansas/14/2017 was not prepared, so it is not included in the analysis. We agree that such a homologous serum ideally should have been included and in the NAI assay would likely have given a high, if not the highest, titer. However, we noticed that homologous sera did not always have the highest titers, especially in panels like ours where some antigens are closely related. The highest titer obtained against Kan17 H6N2 was from A/Bris/16 serum: 1/104, a titer in the range of the other homologous titers observed in the panel (Table S3). The Bris16 and Kan17 NAs differ by five amino acids. In summary, inclusion of a Kan17 homologous serum would likely not affect the analysis and interpretation of the results, because there are multiple highly cross-inhibiting heterologous serum samples against Kan17.

      (5) The cMDS plot does not have sufficient quality assurance

      A cMDS plot is shown in Figure 5 - figure supplement 1, generated using classical MDS. The following support for the appropriateness of this visualisation is not given. a. Goodness of fit of the cMDS projection, including per point and per titre. b. Testing of the appropriate number of dimensions (the two sera from phylogenetic group 3 are clustered with phylogenetic group 2; additional dimensions might separate these groups). c. A measure of uncertainty in positioning, e.g. bootstrapping. d. A sensitivity analysis of the assumption about titres below the level of detection (i.e. that <20 = 10). Without this information, it is difficult to judge if the projection is reliable.

      We agree with these comments. We have removed Figure 5 – figure supplement 1 and added a new Figure 2 – figure supplement 3 (antigenic cartography) instead.

      (6) Choice of antigenic distance measure

      The measure of antigenic distance used here is the average difference between titres for two sera. This is dependent on which viruses have been included in the analysis and will be biased by the unbalanced number of viruses in the different clusters (12, 8, 2, 5).

      To verify the impact of the number of antigens on our analysis, the matrix of differences was generated with only 4 H6N2s, one representing each phylogenetic group (Per09, Sin16, Hel823 and Ind11) (Author response image 3a). This matrix is very similar to the one calculated from all 27 antigens (Author response image 3b). The matrix obtained from the 4 antigens (Author response image 3a) was used in the random forest to model antigenic distances, and the predictions were plotted against the actual differences calculated from the full data. The correlation coefficient (R2) of predicted vs observed values dropped from 0.81 to 0.71, suggesting that the number of antigens tested does not drastically affect the antigenic differences calculated from serum values (Author response image 3e). Importantly, the amino acid substitutions potentially associated with increased antigenic distances are similarly identified (Author response image 3c, d and f).

      Author response image 3.

      Matrix of differences calculated using only 4 H6N2 antigens (a) or the full panel (b). The matrices from (c) 4 or (d) 27 antigens were used in random forest modeling to estimate the impact of amino acid changes. The rf modeling data generated from only 4 H6N2s were plotted and correlated with values calculated from the full panel of 27 H6N2s (e). The multi-way importance plot indicates in red that 7 of the 10 most important substitutions were identified by the analysis using only 4 H6N2s (f).

      Interestingly, when the matrix of differences is calculated from only 4 H6N2s but without the representative of antigenic group 1 or group 2, the correlation coefficient between the predicted values and the values obtained from the full panel is dramatically reduced (R2 drops from 0.81 to 0.5 and 0.57, respectively). It is important to note that most of the sera were also raised against antigens from phylogenetic groups 1 and 2. As a consequence, poorer prediction for those antigens would affect the correlation more drastically. No drastic drop was observed when the representative H6N2s from group 3 or 4 were excluded from the data (from 0.81 to 0.75 and 0.73, Author response image 4c and d).

      Author response image 4.

      Random forest analysis was repeated using only 4 antigens, but excluding representatives of one of the phylogenetic groups (a) no group 1, (b) no group 2, (c) no group 3, and (d) no group 4.

      We also used Euclidean distances as a measure of differences (Author response image 5). The predictive values obtained in rf have a slightly reduced R2 compared to the values obtained using average of differences.

      In conclusion, neither the unbalanced number of antigens per group nor the choice of distance metric seems, per se, to substantially affect our analysis.

      Author response image 5.

      Antigenic distances were calculated as Euclidean distances between pairs of sera. These antigenic distances were used in rf to estimate antigenic distance and the importance of each amino acid substitution.
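      The two serum-to-serum distance measures compared above can be sketched as follows. This assumes each serum is represented by its log2 NAI titers against the same ordered antigen panel and that the "average difference" means the mean absolute difference; both are assumptions for illustration. Restricting the antigen lists to a subset (e.g. 4 H6N2s) reproduces the kind of sensitivity check described in the response.

```python
import math

def pairwise_serum_distances(titers):
    """Distances between sera from their log2 titer profiles.

    `titers` maps serum -> list of log2 NAI titers against the same ordered
    antigen panel. Returns two dicts keyed by (serum_i, serum_j): the mean
    absolute difference and the Euclidean distance.
    """
    sera = sorted(titers)
    mean_abs, euclid = {}, {}
    for i, s in enumerate(sera):
        for r in sera[i + 1:]:
            diffs = [x - y for x, y in zip(titers[s], titers[r])]
            mean_abs[(s, r)] = sum(abs(d) for d in diffs) / len(diffs)
            euclid[(s, r)] = math.sqrt(sum(d * d for d in diffs))
    return mean_abs, euclid

# Hypothetical log2 titers of two sera against four antigens.
toy = {"serum_a": [0.0, 0.0, 0.0, 0.0], "serum_b": [3.0, 4.0, 0.0, 0.0]}
mean_abs, euclid = pairwise_serum_distances(toy)
print(mean_abs, euclid)
```

      The mean absolute difference scales with the number of antigens only through averaging, whereas the Euclidean distance grows with panel size, which is one reason the two metrics can give slightly different model fits.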

      (7) Association analysis does not account for correlations

      For each H6N2 virus and position, significance was calculated by comparing the titres between sera that did or did not have a change at that position. This does not take into account the correlations between positions. For haemagglutinin, it can be impossible to determine the true antigenic effects of such correlated substitutions with mutagenesis studies.

      Most of the potential correlated effects cannot be addressed with the panel of N2s, except for combinations of substitutions that are included in the panel, such as 245/247 with or without 468. Only mutagenesis studies would shed light on epistatic effects. However, it is important to keep in mind that individual substitutions introduced in such studies likely do not reflect the natural evolution of N2 (cf. the importance of the NA charge balance (Wang et al., 2021: 10.7554/eLife.72516)).

      (8) Random forest method

      25 features are used to classify 43 sera, which seems high (p/3 is typical for classification). By only considering mismatches, rather than the specific amino acid changes, some signals may be lost (for example, at a given position, one amino acid change might be neutral while another has a large antigenic effect). Features may be highly, or perfectly correlated, which will give them a lower reported importance and skew the results.

      The number of features was optimized in the range from 5 to 80, with 25 being optimal (best R-value for predicted vs observed antigenic distances). These features refer to the number of amino acid substitutions used in each tree. The number of trees was also optimized in the range of 100 to 2000.

      In the random forest, the matrix of differences is built from positional mismatches only, not from the type of substitution, in each pair of NAs. Indeed, substitutions with distinct effects at the same position may skew the results by lowering the reported importance of that position.

      We have highlighted such potential bias in our discussion:

      “Also, our modelling does not consider that substitution by other amino acids can have a distinct impact on the antigenic distance. As a consequence, predictions based on the model could underestimate or overestimate the importance of a particular amino acid residue substitution in some cases.”
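      The limitation quoted above follows from how position-based features are encoded. The sketch below, assuming aligned NA sequences and 1-based alignment positions (both illustrative assumptions, with hypothetical sequences), builds the binary mismatch vector for one pair of NAs; the random forest itself is not shown.

```python
def mismatch_features(seq_a: str, seq_b: str, positions):
    """Binary per-position mismatch vector for a pair of aligned NA sequences.

    Only whether residues differ is recorded, not which substitution
    occurred, so two different substitutions at the same site produce the
    same feature value.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions are treated as 1-based alignment indices in this sketch.
    return [int(seq_a[p - 1] != seq_b[p - 1]) for p in positions]

# Hypothetical 4-residue fragments: L->E and L->N at site 2 look identical.
print(mismatch_features("KLMN", "KEMN", [1, 2, 3, 4]))
print(mismatch_features("KLMN", "KNMN", [1, 2, 3, 4]))
```

      Encoding the substituted residue as well (e.g. one feature per position and amino acid) would separate such cases, at the cost of many more, sparser features for a panel of this size.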

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mice immune sera. Four antigenic groups were identified, which correlated with their respective phylogenic/ genetic groups. Among 102 amino acids differed by the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that provides antigenic distance estimation. Three recent A(H3N2) vaccine strains were tested in the model but there was no experimental data to confirm the model prediction results.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 44 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. A comprehensive N2 phylogenetic tree of human A(H3N2) viruses from 2009-2017, with the selected 44 strains labeled in the tree, would be helpful to assess the representativeness of the strains included in the study.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method calculates MinMax distances to identify a central representative among distinct clusters.
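      In the minimax approach of Bien and Tibshirani, each cluster is summarized by a prototype: the member whose largest distance to any other member is smallest. The sketch below shows only this prototype-selection step for one already-formed cluster, using Hamming distance between aligned sequences as an assumed distance; the hierarchical minimax linkage that forms the clusters is not shown.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def minimax_prototype(members, dist=hamming):
    """Member whose maximum distance to the other members is smallest."""
    return min(
        members,
        key=lambda m: max((dist(m, o) for o in members if o != m), default=0),
    )

# Hypothetical cluster of three aligned fragments.
cluster = ["AAAA", "AAAB", "AABB"]
print(minimax_prototype(cluster))  # "AAAB" is at most 1 substitution from each other member
```

      Choosing the minimax member, rather than a mean or a random strain, guarantees that every member of the cluster lies within a known distance of its representative.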

      To facilitate visualization in a phylogenetic tree, 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). These 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al, Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al, Mol. Biol. Evol 33(6):1635-38, 2016).
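      Selecting 20 strains per year over the nine years 2009-2017 gives 9 x 20 = 180 background sequences. The stratified random selection can be sketched as below; the strain list and seed are hypothetical, and the actual sampling procedure may have differed in detail.

```python
import random
from collections import defaultdict

def sample_per_year(strains, per_year=20, seed=0):
    """Randomly pick up to `per_year` strains from each collection year.

    `strains` is a list of (name, year) tuples.
    """
    by_year = defaultdict(list)
    for name, year in strains:
        by_year[year].append(name)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = []
    for year in sorted(by_year):
        names = by_year[year]
        picked.extend(rng.sample(names, min(per_year, len(names))))
    return picked

# Hypothetical pool: 50 strains per year for 2009-2017.
pool = [(f"strain_{y}_{i}", y) for y in range(2009, 2018) for i in range(50)]
selection = sample_per_year(pool)
print(len(selection))
```

      Stratifying by year keeps seasons with many deposited sequences from dominating the background tree.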

      Author response image 6.

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double immune ferret sera showed the highest reactivity not against its own homologous virus but to heterologous strains. Although the authors used the post-infection ferret sera to characterize 5 viruses (Figure 2, Figure Supplement 4), the patterns did not correlate well. If the authors repeat the NA antigenic analysis using the post-infection ferret sera with lower cross-reactivity, will the authors be able to identify more antigenic groups instead of 4 groups?

      This is a very valuable remark. In their paper, Kosikova et al. (CID 2018) report that repeated infection of ferrets with antigenically slightly different H3N2 viruses results in a broader anti-HA response than a primary infection of an influenza-naïve ferret, which results in a narrower anti-HA response. In our ferret immunizations, the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus used for priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2-infected and subsequently NA protein-boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Author response image 7). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      Author response image 7.

      Correlation obtained on NAI data from ferrets at day 14 after infection vs data from day 42 after boost.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there is no experimental data to validate their prediction (eg. if these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors).

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Obtaining experimental values for these strains would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we have rerun the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively.
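      The hold-out exercise can be sketched as follows: every antigen pair involving the held-out strain is removed before fitting, and the fitted model then predicts those pairs from sequence alone. To keep the sketch short and dependency-free, a one-variable least-squares fit of antigenic distance on the number of amino acid differences stands in for the random forest; it is not the model used in the study. R² is computed here as the squared Pearson correlation between predicted and observed values, which is an assumption about how the reported values were obtained. All sequences, strain labels and distances are hypothetical.

```python
def hamming(s, t):
    """Number of amino acid differences between two aligned sequences."""
    return sum(a != b for a, b in zip(s, t))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def squared_pearson(xs, ys):
    """Squared Pearson correlation coefficient between two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def hold_out(seqs, distances, held_out):
    """Fit on pairs without `held_out`, then predict the pairs that include it.

    seqs: strain -> aligned sequence; distances: (strain, strain) -> distance.
    Returns (predicted, observed) lists for the held-out pairs.
    """
    train = [(pair, d) for pair, d in distances.items() if held_out not in pair]
    test = [(pair, d) for pair, d in distances.items() if held_out in pair]
    a, b = fit_line([hamming(seqs[i], seqs[j]) for (i, j), _ in train],
                    [d for _, d in train])
    predicted = [a + b * hamming(seqs[i], seqs[j]) for (i, j), _ in test]
    observed = [d for _, d in test]
    return predicted, observed


# Hypothetical strains; toy distances were generated as 5 + 10 * differences.
seqs = {"S1": "AAAA", "S2": "AAAT", "S3": "AATT", "S4": "ATTT"}
distances = {("S1", "S2"): 15, ("S1", "S3"): 25, ("S1", "S4"): 35,
             ("S2", "S3"): 15, ("S2", "S4"): 25, ("S3", "S4"): 15}
predicted, observed = hold_out(seqs, distances, "S1")
r2 = squared_pearson(predicted, observed)
```

      Because the toy distances are an exact linear function of sequence differences, the held-out predictions here are perfect; with real NAI-derived distances the held-out R² would fall below 1.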

      Author response image 8.

      Antigenic distances for Swe17 and HK17 calculated using the random forest model trained without experimental data from Swe17 and HK17. The predicted distances are plotted side by side with the experimental distances in (a), and correlations are shown in (b).

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera.

      Weaknesses:

      • Approach for assessing the influence of individual polymorphisms on antigenicity does not account for the potential effects of epistasis.

      Indeed, possible epistatic effects between individual polymorphisms were not assessed; this is a limitation imposed by the nature of the panel of N2s selected for the study. We now emphasize this in the discussion as follows:

      “Also, our modelling does not consider that substitutions by different amino acids can have distinct impacts on antigenic distance. As a consequence, predictions based on the model could, in some cases, underestimate the importance of a particular amino acid residue substitution.”

      • Machine learning analyses were neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      This is a valid remark, and indeed we have found a clear correlation between NAI cross-reactivity and phylogenetic relatedness. However, besides achieving good prediction of the experimental data (as shown in Figure 5 and in Figure R7), machine learning analysis has the potential to rank or flag major antigenic divergences from available sequences before they have consolidated into a new clade. Machine learning can also support the selection and design of more broadly reactive antigens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major corrections

      No major corrections, beyond the issues I touched on in the public review, for which I give a little more detail below:

      Point 2. If there's not a putative genetic basis for the unexpected clustering seen in the NAI, then reiterating a small subset of the data would show the reliability of the experimental methods and substantiate this unexpected finding.

      We thank the reviewer for this pertinent point and suggestion. We have modified our analysis by repeating it with individual ferret data normalized to the homologous ELISA titers. This reiteration is shown in Figure R1b. In this case, both Kan17 and Wis15 switch to antigenic group 2. The serum inhibition profiles against these two strains, which shift from antigenic cluster 1 to 2, are clearly intermediate between the profiles observed in those two groups. Considering that antigenic evolution occurs gradually, it is not unexpected that such intermediate profiles swing from one side to the other when forced into a discrete classification. Antigenic cartography, as in Smith et al. (2004), also indicated that these H6N2s are located closer to G1 than the bulk of the G2 antigens. The raw data distribution (maximum and minimum EC50) also does not indicate a potential bias in the analysis.

      Point 5. If you want to use antigenic cartography (Smith et al 2004), there is the R CRAN package (https://CRAN.R-project.org/package=Racmacs) which can handle threshold titres (like <20) and has functions for the diagnostic tools I describe, in order to quality assure the resulting plot. It does use a different antigenic distance metric than the paper currently uses, so you might not want to take that route.

      Thank you for this suggestion. We have performed antigenic cartography using the methodology described by Smith et al. (2004), as made accessible by Sam Wilks. The outcome of this analysis has been added to the manuscript as Figure 2 – Figure supplement 3.

      Point 6. More robust measures of antigenic distance take into account the homologous titre, homologous and heterologous titres (Archetti & Horsfall, 1950) or use the highest observed titre for a serum (Smith et al 2004). A limitation of the first two is that the antigenic distance can only be calculated when you have the homologous titre, which will limit you as you only have this for 26/43 sera. They may give similar results to your average antigenic distance, in which case your analysis still stands. Calculating antigenic distance using the homologous or maximum titre only gives the antigenic distance between the antigen and the serum. If you want the distance between all the sera, then further analysis is required (making an antigenic map and outputting the serum-serum distances, see the point above).

      We thank the reviewer for these suggestions. A complete set of 43 H6N2 viruses matching all 43 sera would have been ideal. This would require the generation of 17 additional H6N2 viruses and their testing in ELLA, a significant investment of time and resources. Instead, we have generated an antigenic map of the 27 antigens and homologous sera (cf. our response to point 5 above). Despite the different methods, the outcome, four major antigenic groups, is consistent.
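      The two distance measures mentioned by the reviewer can be sketched as below. The table distance follows the Smith et al. (2004) convention of subtracting an antigen's log2 titre from the log2 of the highest titre observed for that serum. The Archetti and Horsfall (1950) ratio combines the two heterologous titres, each relative to its serum's homologous titre, and equals 1 for antigenically identical strains. In this sketch, below-threshold entries such as "<20" are left undefined rather than replaced by a number. The titre table, strain names and serum names are hypothetical.

```python
from math import log2, sqrt


def parse_titre(value):
    """Return a titre as a float, or None for a below-threshold entry like '<20'."""
    if isinstance(value, str) and value.strip().startswith("<"):
        return None
    return float(value)


def table_distances(titres):
    """Smith et al. (2004)-style distances from titres[antigen][serum].

    distance = log2(highest titre of the serum) - log2(titre of the antigen);
    below-threshold entries give None. A serum with no measurable titre at all
    raises ValueError because its highest titre is undefined.
    """
    sera = {serum for row in titres.values() for serum in row}
    col_max = {}
    for serum in sera:
        vals = [parse_titre(row[serum]) for row in titres.values() if serum in row]
        col_max[serum] = max(v for v in vals if v is not None)
    out = {}
    for antigen, row in titres.items():
        for serum, raw in row.items():
            t = parse_titre(raw)
            out[(antigen, serum)] = None if t is None else log2(col_max[serum]) - log2(t)
    return out


def archetti_horsfall_r(titres, a, b, serum_a, serum_b):
    """Archetti-Horsfall ratio between antigens a and b.

    serum_a / serum_b are the homologous sera raised against a / b; the ratio is
    sqrt((H(b, serum_a) / H(a, serum_a)) * (H(a, serum_b) / H(b, serum_b))).
    """
    needed = [titres[b][serum_a], titres[a][serum_a], titres[a][serum_b], titres[b][serum_b]]
    h_b_sa, h_a_sa, h_a_sb, h_b_sb = (parse_titre(v) for v in needed)
    if None in (h_b_sa, h_a_sa, h_a_sb, h_b_sb):
        raise ValueError("all four titres must be above the detection threshold")
    return sqrt((h_b_sa / h_a_sa) * (h_a_sb / h_b_sb))


# Hypothetical NAI titres: titres[antigen][serum].
titres = {
    "V1": {"S1": 640, "S2": 80},
    "V2": {"S1": 160, "S2": 320},
    "V3": {"S1": "<20", "S2": 40},
}
dists = table_distances(titres)
r = archetti_horsfall_r(titres, "V1", "V2", "S1", "S2")
print(r)  # 0.25
```

      In this toy table, V3 is 3 log2 units (eightfold) below the best titre of serum S2, and its entry against S1 stays undefined because the titre is below the threshold.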

      Minor corrections

      Table S1

      A/New_Castle/67/2016 should be A/Newcastle/67/2016

      A/Gambia/2012 is not the full virus name

      Corrected.

      Table S3 has multiple values of exactly 10.0. I think these should be <20 as they are below the threshold of detection for the assay.

      All the values lower than 20 in Table S3 were replaced by “< 20”.

      Line 376: A/Sidney/5/1997 should be A/Sydney/5/1997

      Corrected.

      Line 338: "25 randomly sampled data" is a bit vague, "25 randomly sampled features" would be better

      Corrected.

      Include RMSE of the random forest model.

      The RMSE (19.6) and the RMSE/mean ratio (0.207) are now reported in the manuscript.
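      For reference, the two error measures can be computed as below. This sketch assumes that "mean" refers to the mean of the observed antigenic distances; under that assumption, the reported values together imply a mean observed distance of about 19.6 / 0.207 ≈ 94.7. The example numbers are hypothetical.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean squared error between paired predictions and observations."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def rmse_over_mean(predicted, observed):
    """RMSE divided by the mean of the observed values (a normalized RMSE)."""
    return rmse(predicted, observed) / (sum(observed) / len(observed))


# Hypothetical predicted and observed antigenic distances.
predicted = [90.0, 110.0]
observed = [100.0, 100.0]
print(rmse(predicted, observed))            # 10.0
print(rmse_over_mean(predicted, observed))  # 0.1
```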

      Figure 5 - supplement 1: These plots are difficult to interpret as the aspect ratio is not 1:1, and panels a & b are difficult to compare as they have not been aligned (using a Procrustes analysis). It would be neater if they were labelled with short names.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Line 562: 98 variable residues, where it is 102 elsewhere in the text.

      Four of the variable residues lie near the end of the NA stalk domain and are not resolved in the N2 structure. Amino acid distances to these residues therefore cannot be calculated, which is why 98 rather than 102 variable residues are used in this analysis.

      No data availability statement. Some of the raw data is available in Table S3 and there is no link to the code.

      The data and code used to generate the random forest model have been uploaded to GitHub and made publicly available. The following statement has been added to the manuscript: “The data and code used for the generation of the rf model is available at https://github.com/SaelensLAB/RF..”

      Reviewer #2 (Recommendations For The Authors):

      (1) More than 42,000 NA sequences are available for the mentioned period on GISAID, it is therefore important to understand the selection criteria for the 44 strains and if these strains represent the overall genetic diversity of N2 of human A(H3N2) viruses. To demonstrate the representativeness of the 44 selected strains, please construct a representative N2 phylogenetic tree for human A(H3N2) viruses circulated in 2009-2017 and label the 44 selected strains on the tree.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method uses MinMax distances to identify a central representative among distinct clusters.

      To facilitate visualization in a phylogenetic tree, 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). These 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al, Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al, Mol. Biol. Evol 33(6):1635-38, 2016).

      Author response image 9.

      (2) Double immune ferret sera may increase antibody binding affinity and cross-reactivity against heterologous strains. Using single-infection ferret sera may yield different antigenic grouping results (eg. may identify more antigenic groups). Can the authors repeat the NA antigenic grouping using single-infection ferret sera? Although data from a subset of 5 strains was presented (Figure 2, Figure Supplement 4), the information was not sufficient to support if the use of single-infection or double immune ferret sera will yield similar antigenic grouping results.

      In our ferret immunizations, the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus used for priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2-infected and subsequently NA protein-boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Figure R6). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      (3) NA antigenicity data are presented in heat maps, and the authors often describe matches between heat map patterns without further explanation. In lines 234-235, the heat map of mouse sera (Figure 2. Figure supplement 5) is described as matching the results of ferret sera (Figure 2), but this tends to be subjective. A correlation analysis of 7 selected antigens showed a positive correlation; what about the other 37 antigens?

      The interpretation of heat maps is indeed very subjective; for this reason, the correlation for the 7 selected antigens was also provided. The other 37 antigens were not tested. Based on the results with post-boost sera, a random forest simulation indicates that data from one antigen of each antigenic group are sufficient to achieve a reliable predictive output (R² = 0.71) (Figure R3 of this rebuttal).

      (4) Can the authors explain in more detail how data in Figure 4a was generated? According to the authors, residues close to the catalytic pocket are more likely to impact NAI. Can the authors explain how they define if a residue is close to the catalytic pocket?

      The correlation between amino acid residue distances and significance values is explained as follows. Consider 7 distinct elements distributed horizontally, as shown by the squares in the figure below (Author response image 10a). The elements highlighted in yellow have a numerical property (for the N2 neuraminidase, this was the significance value obtained in the association study). Taking P1 as the reference, we can calculate the distances (red arrows) between P1 and P2, P4 and P7; these distances can then be correlated with the intrinsic values of P2, P4 and P7, which yields the correlation coefficient Tau. The same process is repeated for each position (i.e., each amino acid residue), so that a correlation coefficient is calculated for every position (Author response image 10b). These correlation coefficients can be represented as a heat map on the surface of N2.

      Author response image 10.

      The 2D scheme represents the strategy used to calculate the correlation (i.e. the Tau values) between distances and p-values. Tau values can then be presented in a heat map.
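      A minimal sketch of the scheme described above: for every position, the distances to the positions that carry a significance value are rank-correlated with those values using Kendall's tau (here the tau-b variant, which tolerates ties). The coordinates, p-values, names and the choice of tau-b are hypothetical stand-ins chosen to mirror the seven-element 2D scheme, not the study's data or code; with 3D atomic coordinates the same functions apply unchanged.

```python
from itertools import combinations
from math import dist, sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences (handles ties)."""
    num = nx = ny = 0
    for i, j in combinations(range(len(xs)), 2):
        sx = (xs[i] > xs[j]) - (xs[i] < xs[j])  # sign of the x difference
        sy = (ys[i] > ys[j]) - (ys[i] < ys[j])  # sign of the y difference
        num += sx * sy
        nx += sx * sx
        ny += sy * sy
    return num / sqrt(nx * ny)


def tau_map(coords, scores):
    """For each position, correlate its distances to the scored positions with their scores.

    coords: position -> coordinate tuple; scores: position -> value (e.g. a p-value),
    given only for the positions that carry one. A position is never compared with itself.
    """
    taus = {}
    for ref, xyz in coords.items():
        others = [p for p in scores if p != ref]
        distances = [dist(xyz, coords[p]) for p in others]
        values = [scores[p] for p in others]
        taus[ref] = kendall_tau_b(distances, values)
    return taus


# Seven hypothetical positions on a line; P2, P4 and P7 carry p-values.
coords = {f"P{i}": (float(i), 0.0, 0.0) for i in range(1, 8)}
pvalues = {"P2": 0.01, "P4": 0.2, "P7": 0.5}
taus = tau_map(coords, pvalues)
print(taus["P1"], taus["P7"])  # 1.0 -1.0
```

      With p-values as scores, a positive tau in this sketch means that the smallest p-values (strongest associations) lie nearest to that position. Tau is undefined (division by zero) if all distances or all values for a position are tied.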

      (5) Can the authors provide experimental data using the three recent A(H3N2) viruses as antigens and perform an NAI assay to confirm whether they all deviate antigenically from group 2 viruses?

      Obtaining experimental values for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021 would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we have rerun the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      (6) According to Ge et al. 2022 (PMID: 35387078), N2 NAs before 2014 (2007-2013) showed a 329-N-glycosylation and E344, and they were subsequently replaced by H3N2 viruses with E344K and 329 non-glycosylation, which changed the NI reactivity in ferret antisera towards later strains. Were these residues also predicted to be important to N2 antigenicity by your machine-learning method?

      Three of the N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of these mutations is among the lowest predicted by our modeling. However, the differences in NAI reported by Ge et al. are small (less than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI. We have added the following to the discussion in our revised manuscript:

      “It has been reported that an N-glycosylation site at position 329 combined with E344 in NA from human H3N2 viruses from 2007 to 2013 was gradually lost in later H3N2 viruses (Ge et al., 2022). This loss of an N-glycosylation site at position 329 combined with an E344K substitution was associated with a change in NAI reactivity in ferret sera. Three N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest predicted by our modeling. However, the differences in NAI reported by Ge et al. are very modest (lower than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI.”
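      Presence or absence of the motif can be checked with the standard N-X-S/T sequon rule (X being any residue except proline). The sketch below is an illustration only. It reports 1-based positions in whatever sequence it is given, so mapping those indices to the N2 numbering used for residues 329 and 344 is assumed to have been done beforehand, and a sequon does not by itself guarantee that the site is glycosylated. The function names and the example sequence are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """1-based positions of each asparagine that starts an N-X-[ST] motif, X != P."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def residue_at(seq, position):
    """Residue at a 1-based position of the given sequence."""
    return seq[position - 1]


example = "MNASNPSQNGT"  # hypothetical fragment, not an N2 sequence
print(sequon_positions(example))  # [2, 9]
```

      The asparagine at position 5 of the example is not reported because it is followed by a proline.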

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions:

      Line 132: Did the authors confirm the absence of compensatory mutations due to a heterologous H6 background that could potentially confound downstream NAI results?

      The NA genes of all rescued H6N2 viruses were fully sequenced and found to be identical to the expected NA sequences, with one exception: for A/Tasmania/1018/2015, a mixed population of wild-type and M467I NA was found. This substitution is located on the surface, at the top of the NA head domain, and could thus potentially affect NA antigenicity. However, the A/Tasmania/1018/2015 H6N2 virus had an inhibition profile similar to that of the other H6N2s in phylogenetic and antigenic group 1. This indicates that, at least in this mixed population, antigenicity was not drastically affected by the M467I substitution.

      Line 96: how do these data rule out variation in the fraction of properly folded protein across NAs? They certainly show that properly folded NA protein is present, but not whether amounts vary between the different NAs.

      SEC-MALS (size-exclusion chromatography with multi-angle light scattering) data and enzymatic activity were considered a proxy for correctly folded NA. Although the specific activity of the recombinant N2 NAs is expressed per mass unit (microgram), we cannot exclude that the fraction of properly folded protein varies across the different recombinant NAs.

      Lines 262-269: this analysis approach (based on my reading) seems to consider each polymorphism in isolation and thus does not seem well suited for accounting for epistatic interactions within the NA. For example, the effect of a substitution on NAI may be contingent upon other alleles within NA that are not cleanly segregated between the two serum comparator groups. Can the authors address the potential of epistasis within NA to confound the results shown in Figure 3?

      Unfortunately, epistatic interactions cannot be resolved using the panel of N2s selected for the study. This limitation is mentioned in our discussion:

      “It is important to highlight that co-occurring substitutions in our panel (the ones present in the main branches of the phylogenetic tree) cannot be individually assessed by association analysis or the random forest model. The individual weight of those mutations on NA drift thus remains to be experimentally demonstrated.”

      Line 331: is there a way to visualize and/or quantify how these two plots (F5 supplement 1a/b) reflect each other or not? Without this, it is hard to ascertain how they relate to each other.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Figure 4B structural images are not well labelled.

      The active site in 1 of the protomers is now indicated with an arrow in the top and side views of the NA tetramer.

      Lines 339-359: the ML predictions are just predictions and kind of meaningless without experimental validation of the predicted antigenic differences between recent NAs. This section would also be strengthened by an assessment of whether the ML approach obtains more accurate results than simply using phylogeny to predict antigenic relationships.

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Obtaining experimental values for these strains would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we have rerun the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Figure R7 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively. A major advantage of antigenic modeling is its potential to rank or flag major antigenic divergences from available sequences before they have consolidated into a new clade. Support for selecting or designing more broadly reactive antigens is another advantage of machine learning analysis.

      Lines 416-421: appreciate the direct comparison of results obtained from ferrets versus mice.

      We thank the reviewer for expressing this appreciation.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths:

      The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing and provide new insights into the role of conserved side chains within the SLC18 members. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      We thank reviewer #1 for their constructive comments and input which we feel has greatly improved the manuscript.

      Reviewer #2 (Public Review):

      This public review is the same review that was posted earlier and has not been updated in response to our comments or to the revised manuscript. Please see our earlier response to these comments. We thank reviewer #2 for their input and we have incorporated many of these suggestions into our revised manuscript. With regard to the question of ‘how TBZ got there’, we have revised this sentence in the discussion to be more speculative. As pointed out earlier, our interpretation of the structure is based on a wealth of experimental and structural data which support our interpretations. Thus, our conclusions have not been overstated. This has been explained in our earlier public response and these key studies have been cited throughout the manuscript. We also note that reviewer #3 found the AlphaFold comparisons to be quite meaningful.

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose involvement of these networks and hydrophobic residues in coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.
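      The distinction drawn here can be illustrated with textbook steady-state rate equations, treating transport as simple Michaelis-Menten kinetics, which is a simplification for a transporter such as VMAT2. In the competitive form, the inhibitor only raises the apparent Km, so saturating substrate restores Vmax; in the pure non-competitive form (inhibitor binding free and substrate-bound states with the same Ki), Vmax itself is reduced. The parameter values below are arbitrary.

```python
def rate_competitive(s, i, vmax, km, ki):
    """Michaelis-Menten rate with a competitive inhibitor (apparent Km rises)."""
    return vmax * s / (km * (1 + i / ki) + s)


def rate_noncompetitive(s, i, vmax, km, ki):
    """Michaelis-Menten rate with a pure non-competitive inhibitor (Vmax falls)."""
    return vmax * s / ((km + s) * (1 + i / ki))


# Arbitrary units: inhibitor at a concentration equal to Ki, substrate far above Km.
vmax, km, ki, inhibitor = 1.0, 1.0, 1.0, 1.0
s_high = 1e6
v_comp = rate_competitive(s_high, inhibitor, vmax, km, ki)
v_noncomp = rate_noncompetitive(s_high, inhibitor, vmax, km, ki)
print(round(v_comp, 3), round(v_noncomp, 3))  # 1.0 0.5
```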

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasm-open or extracellular/luminal-open conformations is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4, would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently those show the entire protein, and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the AlphaFold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. I will comment on some minor points below, but there is one problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not of high enough resolution to see the details.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. To date, our understanding of monoamine processing, and our ability to target that process therapeutically, have come from structural modeling and extensive biochemical studies. However, structural data are required to establish these findings more firmly.

      Strengths:

      Dalton et al. resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of protein domains fused to its N- and C-terminal ends, including one fluorescent protein that facilitated protein screening and purification. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data, and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds. The authors provide additional biochemical analyses that further support their findings. The comparison with AlphaFold models is enlightening.

      We appreciate this summary and thank reviewer #3 for their helpful suggestions to improve the manuscript.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations of the tetrabenazine-bound state, and test several protonation states of acidic residues in the binding pocket, but not all possible combinations; thus, it is not clear the extent to which tetrabenazine rearrangements observed in these simulations are meaningful. Additional simulations of the substrate dopamine docked into this structure were also carried out, although it is unclear whether this "dead-end" occluded state is a relevant state for dopamine binding. The authors report release of dopamine during these simulations, but it is notable that this only occurs when all four acidic binding site residues were protonated and when an enhanced sampling approach was applied.

      As an occluded neurotransmitter-bound structure has yet to be solved experimentally, it is not possible to determine whether this state resembles the docked dopamine structure. However, it is reasonable to hypothesize that this is a relevant state for dopamine binding, and if so, these simulations would be of great interest. The choice of MD simulations we performed was logical, based on the calculated pKa values of the residues and the known pH of the vesicle lumen (5.5). Note that we have carried out a total of more than 2 microseconds of simulations, which required a significant allocation of computing time and memory for the current runs in explicit water and membrane. Investigating all possible combinations would require at least 16 independent simulations, each performed in duplicate, to vary the protonation status of the four highlighted acidic residues alone, not including proper experimental replicates. We do not believe this to be feasible, nor necessary, given that the selected combinations were based on a rational evaluation of on-path amino acids that were assessed to be potentially protonated.
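      The figure of 16 simulations follows from treating each of the four acidic residues as either protonated or deprotonated, which gives 2^4 states. The short Python sketch below enumerates them, assuming only two protonation states per residue; the residue labels are hypothetical placeholders because the specific residues are not named in this passage.

```python
from itertools import product

# Hypothetical placeholder labels for the four acidic binding-site residues.
RESIDUES = ["acidic_1", "acidic_2", "acidic_3", "acidic_4"]
REPLICATES = 2  # each protonation state simulated in duplicate


def protonation_states(residues):
    """Enumerate every protonated (True) / deprotonated (False) assignment."""
    return [
        dict(zip(residues, combo))
        for combo in product([False, True], repeat=len(residues))
    ]


states = protonation_states(RESIDUES)
n_runs = len(states) * REPLICATES
print(len(states), n_runs)  # prints "16 32"
```

      Running each state in duplicate doubles the count to 32 simulations, before any additional replicates.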

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor for organizing the review of our manuscript. We have carefully read and analyzed the reviewers’ comments, addressed each criticism point-by-point as outlined below, and modified the manuscript and figures accordingly. In this regard, we would also like to take the opportunity to thank both reviewers for their thoughtful suggestions for improvement of our manuscript. We believe that our manuscript has improved as a result, and hope that it is now suitable for publication.

      Public Reviews:

      Reviewer #1 (Public Review):

      Addressing the problem that Staphylococcus aureus can cause apoptosis of macrophages, the authors found and verified that the drug (R)-DI-87 can inhibit mammalian deoxycytidine kinase (dCK), weaken the killing effect of Staphylococcus aureus on macrophages, and reduce macrophage apoptosis. It also increases the infiltration of macrophages into the abscess, thus limiting the damage Staphylococcus aureus causes to the host. This work provides new insights and ideas for understanding the effects of Staphylococcus aureus infection on host immunity and for discovering corresponding therapeutic interventions.

      The logic of the study is commendable, and the design is reasonable.

      Some data related to the conclusion of the paper need to be supplemented, and some experimental details need to be described.

      Response: We thank the reviewer for the positive feedback along with the detailed and knowledgeable analysis of this paper. Specific details and comments on all raised concerns can be found below.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Winstel and colleagues test if the deoxycytidine kinase inhibitor, (R)-DI-87, provides therapeutic benefit during infection with Staphylococcus aureus. The premise behind the current work is a series of prior studies that found that S. aureus can disable functional immune clearance by generating NET-derived deoxyribonucleosides to induce macrophage apoptosis via purine salvage. Here, the authors use in vitro and in vivo experiments with (R)-DI-87 to demonstrate that inhibition of deoxycytidine kinase prevents S. aureus-induced deoxyribonucleoside-mediated macrophage cell death, to bolster immune cell function and promote more effective clearance during infection. The authors conclude that (R)-DI-87 represents a potentially important Host-Directed Therapy (HDT) with good potential to promote natural clearance of infection without targeting the bacterium. Overall, the study represents an important next step in the exploration of purine salvage and deoxyribonucleoside toxicity as a targetable pathway to bolster infection clearance and provides early-stage evidence of the therapeutic potential of (R)-DI-87 during S. aureus infection.

      Response: We thank the reviewer for the thoughtful suggestions for improvement of our manuscript. Specific details and comments on all raised concerns can be found below.

      Strengths:

      The study has several strengths that support its conclusions:

      (1) Well-controlled in vitro studies that firmly establish (R)-DI-87 is capable of blocking deoxyribonucleoside-mediated apoptosis of immune cell lines and primary cells.

      (2) Solid evidence to support that administration of (R)-DI-87 can have therapeutic benefits during infection (reduced number of abscesses and reduced CFU).

      (3) Controls included to ascertain the degree to which (R)-DI-87 might have secondary effects on immune cell distribution.

      (4) Controls included to ascertain whether or not (R)-DI-87 has intrinsic antibacterial properties.

      Weaknesses:

      However, there are several important weaknesses related to the rigor of the research and the conclusions drawn. The most relevant weaknesses noted by this reviewer are:

      (1) Drawing firm conclusions about the therapeutic potential of (R)-DI-87 using only S. aureus strain Newman, a methicillin-susceptible S. aureus that, while a clinical isolate, is not clearly representative of the strains of S. aureus causing infection in hospitals and communities. Newman also harbors an unusual mutation in a regulator that dramatically changes virulence factor gene expression. While the data with Newman remain valuable, the absence of consideration of other strains, including MRSA, makes it more difficult to support the relatively broad conclusions about therapeutic potential made by the authors.

      Response: We assume that this is a misunderstanding. S. aureus Newman is a patient-derived isolate and not a regulator mutant and/or laboratory strain (Duthie and Lorenz 1952, J Gen Microbiol 6(1-2):95-107). Its genome is fully sequenced (Baba et al. 2008, J Bacteriol 190(1):300-10) and it is highly virulent in mouse or human ex vivo models (e.g. Alonzo 3rd et al. 2013, Nature 493(7430):51-5; DuMont et al. 2011, Mol Microbiol 79(3):814-25; Skaar et al. 2004, Science 305(5690):1626-8). Moreover, S. aureus Newman has served as a gold standard to study abscess formation in the past (e.g. Thammavongsa et al. 2013, Science 342(6160):863-6; Cheng et al. 2009, FASEB J 23(10):3393-404; Corbin et al. 2008, Science 319(5865):962-5) and has also been used multiple times to test the therapeutic efficacy of antimicrobial or anti-infective agents in various animal models of infectious disease (e.g. Buckley et al. 2023, Cell Host Microbe 31(5):751-765.e11; Zhang et al. 2014, PNAS 111(37):13517-22; Richter et al. 2013, PNAS 110(9):3531-6). Apart from this, it is crucial to note that methicillin-sensitive isolates such as S. aureus Newman are typically isolated more frequently in hospitals than MRSA. Specifically, public health system- and population-based surveillance studies clearly indicate that annual incidence rates for MSSA infections are dominant over those associated with MRSA infections (e.g. Gagliotti et al. 2021, Euro Surveill 26(46):2002094; Jackson et al. 2020, Clin Infect Dis 70(6):1021-1028; Laupland et al. 2013, Clin Microbiol Infect 19(5):465-71), even in groups at elevated risk (e.g. McMullan et al. 2016, JAMA Pediatr 170(10):979-986; Ericson et al. 2015, JAMA Pediatr 169(12):1105-11). Although we understand and agree with the reviewer that certain MRSA clones can be a dominant cause of staphylococcal disease in specific geographic areas, we believe that S. aureus Newman adequately reflects staphylococcal isolates that cause the majority of infections in humans. In this regard, we would also like to highlight once more that (R)-DI-87 targets host dCK and not the bacterium. Accordingly, the antibiotic resistance status of S. aureus is not expected to impact our main findings and conclusions, as (R)-DI-87 exclusively inhibits dCK, a key element of the mammalian purine salvage pathway.

      (2) In vitro (R)-DI-87 efficacy studies with dAdo and dGuo are strong; however, the authors do not test the in vitro efficacy of (R)-DI-87 using S. aureus. They have done this type of work in prior studies (see doi: 10.1073/pnas.1805622115, Figure 5). If included, it would greatly strengthen their argument that (R)-DI-87 is directly affecting the S. aureus --> Nuclease --> AdsA macrophage-killing pathway. Without it, the evidence provided remains indirect, and several conclusions may be overstated.

      Response: We highly appreciate this comment and agree with the reviewer that such an experiment would support our main findings. Thus, we have performed additional experiments, taking advantage of a previously described approach (Tantawy et al. 2022, Front Immunol 13:847171), to demonstrate that (R)-DI-87-mediated inhibition of host dCK enhances macrophage survival upon treatment with culture media that had been conditioned by incubation with adsA-proficient or adsA-deficient staphylococci in the presence or absence of purine deoxyribonucleoside monophosphates. Our findings are described in the main text and in a new figure (Fig. 2K-L). Based on these new findings, together with our rAdsA-based approach (Fig. 2I-J), we are confident that (R)-DI-87 represents a suitable small-molecule inhibitor of host dCK that can prevent host immune cell death induced by toxigenic products associated with the S. aureus Nuc/AdsA pathway.

      (3) Caspase-3 immunoblot experiments seem to suggest an alternative conclusion to what was made by the authors. They point out that Caspase-3 cleavage does not occur upon treatment with (R)-DI-87. However, the data seem to argue that there is almost no caspase-3 present in (R)-DI-87 treated cells (cleaved or uncleaved). Might this suggest that caspase-3 is not even produced when cells are not experiencing deoxyribonucleoside toxicity? Perhaps the authors could reconsider the interpretation of this data.

      Response: We believe that this is a misunderstanding. Our immunoblots (Fig. 3E-F) show only the processed forms of caspase-3. The antibody we have used can recognize full-length caspase-3 along with the p17 and p19 subunits that can result from cleavage. To clarify this point, we have slightly modified our main figure and provide the full immunoblots (Source data file) which clearly demonstrate that unprocessed caspase-3 (pro-caspase-3) is present in all samples. In this regard, we further note that caspase-3 can also form heterocomplexes with other proteins, presumably explaining some of the unknown bands in samples obtained from cells that have been exposed to death-effector deoxyribonucleosides. Additional bands are probably a result of cross-reactivity of the antibody and/or unspecific degradation of pro-caspase in cellular lysates.

      (4) There are some concerns over experimental rigor and clarity of the experimental design in the methods. The most important points noted by this reviewer are included here. (a.) There is no description of the number of replicates or representation of the Western blots and no uncropped blots are provided. (b.) the methods describing the treatment conditions for in vivo studies are not sufficiently clear. For example, it is hard to tell when (R)-DI-87 is first administered to mice. Is it immediately before the infection, immediately after, or at the same time? This has important implications for interpreting the results in terms of therapeutic potential. (c.) There are several statements made that (R)-DI-87 does not have a negative impact on the mice however, it is not sufficiently clear that the studies conducted are sufficient to make this broader claim that (R)-DI-87 has no impact on the animal, except as it relates to the distribution of immune cells, which is directly tested. (d.) there are no quantitative measures of apoptosis or macrophage infiltration, which impacts the rigor of these imaging experiments. (e.) only female mice are used in the in vivo studies. There is no justification provided for this choice; however, the rigor of the study design and the ability to draw conclusions about therapeutic potential is impacted in the absence of consideration of both sexes.

      Response: Thank you for raising these points. (a) We have modified our figure legend and provide the full immunoblots (Source data file) to clarify this point. (b) Moreover, we now provide more experimental details on the treatment conditions that were used to administer (R)-DI-87 to mice (Methods section). (c) Furthermore, we have conducted new experiments to demonstrate that administration of (R)-DI-87 has no impact on laboratory animals. Specifically, we provide new data along with additional text on organ cellularity following long-term exposure of mice to (R)-DI-87. In this regard, we have also applied our immuno-phenotyping approach to spleen tissue samples derived from mice that received (R)-DI-87 or vehicle. As outlined in our new results, neither developmental errors nor differences in lymphocyte development were observed (new Fig. 4B-C; new supplementary Fig. 3). Together with our data on mouse body weight, our immuno-phenotyping of blood cells (Fig. 4A and 4D), and the fact that (R)-DI-87 is extremely well tolerated in humans (personal communication; Kenneth A. Schultz, Trethera Corporation, Los Angeles, CA, USA), these results make us very confident that application of (R)-DI-87 is safe and has no detrimental impact on the host. (d) We would also like to point out that, due to the densely packed and extremely sticky cuff of immune cells within staphylococcal abscesses, it is technically not possible to extract enough abscess material for a reliable quantification of apoptotic macrophages within infectious foci. Such an analysis would also not allow us to differentiate between lesion-infiltrating macrophages and macrophages that may reside at the periphery of the abscess. For these reasons, we have established a fluorescence microscopy-based approach to demonstrate increased macrophage infiltration rates into abscesses formed in organs of mice that have been treated with the dCK-specific inhibitor (R)-DI-87 (Fig. 5A-P). Nonetheless, we have slightly modified this figure and its legend to help readers localize S. aureus-derived tissue lesions and the periphery of abscesses in these images. (e) Finally, publicly available databases indicate that dCK is equally well expressed in various tissues in both sexes. Moreover, dCK is not encoded on a sex chromosome in either mice or humans. Thus, we believe that it is justified to test the in vivo efficacy of (R)-DI-87 in female mice. Nonetheless, we have conducted additional in vitro experiments to test whether (R)-DI-87 can protect BMDMs derived from male animals from death-effector deoxyribonucleosides in a manner similar to cells derived from female mice. As expected, we did not observe a sex-specific effect (new supplementary Fig. 5), and we hope that this adequately addresses this point.

      (5) Animal studies show significant disease burden (CFU) even after administration of (R)-DI-87. Given the absence of robust clearance of infection, the authors' claims read as an overstatement of the data. The authors may wish to reframe their conclusions to better highlight the potential benefit of this therapy in reducing severe disease, but also to point out relevant limitations, especially considering that it does not lead to clearance in this model. In general, a consideration of the limitations of the proposed therapeutic approach, as uncovered by the data, is missing. A more nuanced consideration of the data and its interpretations, including both strengths and limitations, would greatly help to frame the study.

      Response: Thank you for raising this point. To highlight the limitations of our approach, we have modified several passages in the main text and adjusted our Discussion section accordingly.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the in vivo experiments, the dose given to mice was 75 mg/kg. How did the authors determine the dose of this drug?

      Response: We thank the reviewer for this question, which gives us the chance to clarify this point. The experimental condition used to block host dCK in mice has been adopted from a previous publication (Chen et al. 2023, Immunology 168(1):152-169). To improve the overall quality of our current manuscript, we now included more background information addressing this point. Specifically, we have added additional in vivo and biochemical data along with more conclusive text to our results section to better explain the reason for the dose given to mice (new Fig. 4E).

      (2) The authors established a mouse model of Staphylococcus aureus bloodstream infection in vivo and divided the mice into four groups for related experiments. It is suggested that the authors report the survival rate of mice in each group so that readers can understand the effect of the drug on the survival of mice with bloodstream infection.

      Response: While this is an interesting suggestion by the reviewer, we believe that this is beyond the scope of our study. In particular, the current study focused on analyzing the capacity of the dCK-specific inhibitor (R)-DI-87 to improve macrophage survival during staphylococcal abscess formation in an effort to lower bacterial loads in infected organ tissues. However, we agree with the reviewer that (R)-DI-87 might also help to improve further clinical syndromes of staphylococcal infections, including lethal bloodstream infection. We therefore modified parts of our discussion to address this point.

      (3) In the in vivo experiment, the authors administered the drug intragastrically, but the treatment targeted Staphylococcus aureus bloodstream infection, so the authors need to determine the actual effective concentration of the drug in the blood of mice.

      Response: We thank the reviewer for this comment and agree that the inclusion of more background information and data would be a valuable addition to our manuscript. As outlined above, we have designed our in vivo experiments based on the methodology of a previous publication (Chen et al. 2023, Immunology 168(1):152-169). Similar to Chen and colleagues, we used a dose of 75 mg/kg of (R)-DI-87, which allows complete inhibition of host dCK in vivo. In this regard, we have now performed additional in vivo experiments to address this point. More precisely, we took advantage of a highly sensitive LC-MS/MS-based method to measure accumulation of deoxycytidine, the natural substrate of host dCK, in mouse plasma upon administration of the dCK-specific inhibitor. As shown in our new Fig. 4E, administration of (R)-DI-87 at a dose of 75 mg/kg strongly increased deoxycytidine levels in mouse plasma, thereby indicating that host dCK activity is completely blocked under these experimental conditions.

      (5) This work aims to reduce macrophage apoptosis through drug inhibition of dCK, rather than by directly inhibiting the virulence of Staphylococcus aureus. Therefore, it is suggested that the authors modify the title to summarize the whole paper more accurately.

      Response: We agree with the reviewer that our manuscript’s title might be a bit misleading as (R)-DI-87 does not directly target the bacterium or staphylococcal virulence factors. Thus, we have modified the title of our revised manuscript to: “Targeting host deoxycytidine kinase mitigates Staphylococcus aureus abscess formation”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Adelus and colleagues investigates the snRNA sequencing of endothelial cells isolated from deceased heart donor aortic trimmings. From n=6 donors, the authors have identified 5 distinct endothelial cell (EC) populations. The expression levels of a set of genes are different among the different donors and different EC clusters. Furthermore, treatment with IL1B, TGFB, or ERGsi decreased the proportion of some of these clusters and increased others, with some migratory and ECM-producing capacity. Another interesting observation in this study is that IL-1B alone induces a shift in the clusters and that is different from the TGFB-induced cells. However, ex vivo analyses showed most of the TGFB-induced population matched the in vitro observations. Another interesting finding of the work is that the authors detected SNPs linked to chromatin accessibility to the set of genes identified within these EC populations.

      Strengths:

      Overall, the work is intriguing and has some novel aspects to it, especially the link between EC-derived EndMT in culture and comparing that with ex vivo atherosclerotic samples.

      In summary, we thank Reviewer #1 for raising questions that prompted new speculations and clarifications of our data. We hope this Reviewer will now find our revised manuscript suitable for publication.

      Weaknesses:

      The experiments are lacking in controls, the purity of the isolation, and the use of multiple donors (deceased hearts) to draw conclusions. The lack of validation of the work is a concern.

      We thank Reviewer #1 for raising these concerns. Controls were not available in the public in vivo data, likely due to the systemic nature of coronary artery disease (CAD) and the logistical difficulty in obtaining arterial samples from healthy participants. With respect to our in vitro data, controls were included in the design. We agree that it is critical to validate functions of endothelial cell (EC) populations with functional studies, and this is the subject of ongoing and future work. Regarding asymmetry of donors, we aimed to have at least three replicate donors per condition. In the study design, we had to load genetically different donors per 10x lane, which is why we utilized different donors for each condition. We address the purity of isolation in our response to Reviewer #2 below.

      Reviewer #2 (Public Review):

      This study by Adelus et al. profiled the transcriptome and chromatin accessibility in cultured human aortic endothelial cells (ECs) at single-cell resolution. They also stimulated these cells with EC-activating agents, such as IL1b and TGFB2, or used si-ERG to knock down this master transcription factor in ECs. The results show a subpopulation, EC3, with the highest plasticity and sensitivity to perturbations. The authors also reviewed and meta-analyzed three independent publicly available scRNA-seq datasets, identifying two distinct EC subpopulations. Additionally, they aligned CAD-related SNPs with open chromatin regions in EC subpopulations. This study provides fundamental evidence to enrich our understanding of vascular ECs and highlights potential subpopulations that may contribute to health and disease. The work has potential impact in the field. While the manuscript is comprehensive, there are some concerns that should be addressed.

      (1) My major concern is whether EC4 is derived from ECs. It seems that EC4 showed a lesser reaction to those perturbations and had lower expression levels of EC marker genes. Did the authors evaluate the purity of their isolated HAECs? Please discuss the potential cell lineage mapping of EC4.

      We thank Reviewer #2 for raising the question on the purity of isolation. We have now included this in the Discussion:

      “A major question raised by this work is the origin of cells in the mesenchymal cluster EC4. We originally hypothesized this cluster was the result of EndMT, which led to our investigations as to whether we could leverage EndMT-promoting exposures (IL1B, TGFB2, siERG) in vitro to observe an expansion of treated cells in the EC4 population. To our surprise, the EC4 population did not expand. If anything, these exposures reduced the proportion of cells in ECs (Figure 4). Nonetheless, it remains a possibility that EC4 represents cells that had undergone EndMT in vivo prior to culture and that the exposures we presented in vitro were not sufficient to elicit a complete EndMT transition. Another viable hypothesis is that cells in EC4 are of SMC origin and have persisted in culture alongside their EC counterparts. Cells used in this study were isolated by luminal collagenase digestion of explanted aortic segments and were tested at early passage for EC phenotypic markers including VWF expression, cobblestone morphology, and uptake of acetylated LDL. Notably, these rigorous checks to ensure pure EC isolation were performed prior to our group’s studies. In addition, if some of the isolated cells had undergone EndMT in vivo prior to isolation, it would be nearly impossible to distinguish their cell of origin after isolation, since their collective molecular phenotypes would appear as those of an SMC. Without lineage tracing, which is currently not possible in human tissue explants, it would not be possible to distinguish cell origin. Nonetheless, this remains an important issue that is the subject of ongoing investigations. What we can confidently discern from these data is that these distinct cell subpopulations respond differently to the disease-relevant exposures of IL1B, TGFB2, and ERG depletion.”

      (2) Although all the donors are de-identified, is there any information about the severity of their vascular impairment, particularly in the case of patient 5, who exhibits the unique EC5?

      All donors are de-identified, and we only have access to their genotypes. We have now clarified this in Methods, “Tissue Procurement and Cell Culture”: “Primary HAECs were isolated from eight de-identified deceased heart donor aortic trimmings (belonging to three females and five males of Admixed American, European, and East Asian ancestries) at the University of California Los Angeles Hospital as described previously (42) (Table S7 in the Data Supplement). The only clinically relevant information collected for each donor was their genotype (Methods, “Genotyping and Multiplexing Cell Barcodes for Donor Identification”).”

      (3) The meta-analysis of the published datasets is comprehensive. The identified EC heterogeneity corresponds to their in vitro data. I am wondering, in terms of transcriptome, is there any similarity between endo1 and EC1/EC2, and also endo2 and EC3/EC4?

      This was addressed in Results, “Ex Vivo-derived Module Score Analysis Reveals Differences among In Vitro EC Subtypes and EndMT Stimuli”: “Cells scoring high for Endo1 are concentrated in the in vitro EC1 cluster, while cells scoring high in Endo2 are concentrated to the in vitro EC3 locale (Figure S7B-E in the Data Supplement).”

      (4) The in vitro data indicates that EC3 shows the highest plasticity and sensitivity to perturbations, which may act as the major subtype of ECs responding to risk factors. It's very interesting that CAD-related SNPs do not seem to be enriched in EC3. Please discuss this discrepancy.

      We thank Reviewer #2 for bringing up this interesting point, which we have now included in our Discussion: “While EC3 was found to be more sensitive to perturbations in our in vitro experiments, we did not expect to see CAD-related SNPs enriched in EC3 because plasticity does not necessarily imply a pathological process. Moreover, while EC3 and EC4 both have mesenchymal phenotypes, EC3 may represent a reversible state that is lacking in EC4. This hypothesis would explain the enrichment of EC4, but not EC3, in CAD-related SNPs.”

      (5) The last sentence in the legend of Figure 1 seems incomplete: 'Module scores are generated for each cell barcode with Seurat function AddModuleScore().'

      We have made changes to this sentence so that it now reads: “Module scores are generated for each cell barcode with the Seurat function AddModuleScore().”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript by Adelus and colleagues investigates the snRNA sequencing of endothelial cells isolated from deceased heart donor aortic trimmings. From n=6 donors, the authors have identified 5 distinct endothelial cell (EC) populations. The expression levels of a set of genes are different among the different donors and different EC clusters. Furthermore, treatment with IL1B, TGFB, or ERGsi decreased the proportion of some of these clusters and increased others, with some migratory and ECM-producing capacity. Another interesting observation in this study is that IL-1B alone induces a shift in the clusters that is different from the TGFB-induced cells. However, ex vivo analyses showed that most of the TGFB-induced population matched the in vitro observations. Another interesting finding of the work is that the authors detected SNPs linked to chromatin accessibility to the set of genes identified within these EC populations. Overall, the work is intriguing and has some novel aspects to it, especially the link between EC-derived EndMT in culture and comparing that with ex vivo atherosclerotic samples. However, the experiments are lacking in controls, the purity of the isolation, and the use of multiple donors (deceased hearts) to draw conclusions. The lack of validation for the work is a huge concern. Additional major and minor concerns are:

      Major concerns:

      (1) Abstract: line 15: ECs are a major cell type in atherosclerosis progression - That is a bold statement: What about macrophages and VSMCs?

      We have made changes to this sentence so that it now reads: “Endothelial cells (ECs), macrophages, and vascular smooth muscle cells (VSMCs) are major cell types in atherosclerosis progression, and heterogeneity in EC sub-phenotypes are becoming increasingly appreciated.”

      (2) Methods: The cells were isolated from the deceased heart by a device? What kind of device? Is it a standard method? Please show a figure or data indicating the purity of the isolates. Also, the authors mentioned that they assessed EC function, but no figure shows this. Why were the cells treated with fibronectin?

      We thank Reviewer #1 for bringing this to our attention. We did not isolate and identify the cells ourselves. This was done in a prior study as described in reference 41. The only function of the device was to hold the aortic explanted tissue in place so the luminal surface of the ECs could be digested with collagenase. We have made edits to clarify these points in Methods, “Tissue Procurement and Cell Culture”: “HAECs were isolated from the luminal surface of the aortic trimmings using collagenase, and identified by Navab et al. using their typical cobblestone morphology, presence of Factor VIII-related antigen, and uptake of acetylated LDL labeled with 1,1’-dioctadecyl-1-3,3,3’,3’-tetramethyl-indo-carbocyan-ine perchlorate (Di-acyetl-LD) (42).”

      (3) Why did the authors elect to treat each donor cell with different treatment types and different concentrations, also why 1ng/ml of IL-1B?

      We have addressed the study design asymmetry above. We chose the treatments because we questioned whether HAECs responded heterogeneously to these stimuli. We were interested in using these stimuli, because they have previously been used in vitro to induce EndMT and/or inflammation, two major pathophysiological processes in CAD. This is outlined in the Introduction: “We also quantified single cell responses to three perturbations known to be important in EC biology and atherosclerosis. The first was activation of transforming growth factor beta (TGFB) signaling, which is a hallmark of phenotypic transition and a regulator of EC heterogeneity (20, 30). The second was stimulation with the pro-inflammatory cytokine interleukin-1 beta (IL1B), which has been shown to model inflammation and EndMT in vitro (31-35), and whose inhibition reduced adverse cardiovascular events in a large clinical trial (36). The third perturbation utilized in our study was knock-down of the ETS related gene (ERG), which encodes a transcription factor of critical importance for EC fate specification and homeostasis (37-41).”

      (4) The justification for comparing the EC population in ERGsi is unclear? This was detected as the highest in EC2 but EC2 is not the main cell type across the donors.

      We include a justification for comparing the EC populations with siERG in the Introduction:

      “There are notable benefits and limitations for studying heterogeneity using in vitro and in vivo approaches in atherosclerosis research. In vitro approaches provide unique opportunities for interrogating consequences of genetic and chemical perturbations in highly controlled environments and are adept at identifying mechanistic relationships on accelerated timelines.”

      …and…

      “We… quantified single cell responses to three perturbations known to be important in EC biology and atherosclerosis…The third perturbation utilized in our study was knock-down of the ETS related gene (ERG), which encodes a transcription factor of critical importance for EC fate specification and homeostasis (37-41).”

      Notably, we found the highest proportion of cells in EC3 with siERG, not EC2:

      The one cluster exhibiting increased proportions of cells upon EndMT perturbations was EC3, with 3 of 4 EC IL1B-exposed donors having increased proportions in EC3 (p = 0.08 by 2-sided paired t-test; Figure 3A), 4 of 5 TGFB2-exposed donors having increased proportions (p = 0.04 by 2-sided paired t-test; Figure 3A), and 3 of 3 donors having increased EC3 proportions upon ERG knock-down (Figure 3B).

      (5) The different proportions of clusters per donor and their responses are different. These donors are from deceased hearts, could the postmortem induce changes in the ECs? The presence of SMC pathways in their analysis may indicate SMC contamination within the isolation rather than EndMT?

      We have now included the possibility of postmortem effects in the Discussion:

      “We cannot exclude the possibility that EC3 is an EndMT cluster, although we would have expected more significant deviation from clusters EC1 and EC2. It is also possible that the postmortem could induce changes in the ECs, or that the duration and doses of perturbations chosen were not sufficient to elicit complete EndMT.”

      As mentioned above, we have addressed the purity of the isolation in the Discussion.

      (6) Figure 4A is confusing, what do the dots indicate and the intersection size mean? What is the difference between Figure 4 C and 4 E?

      We have added a description of rows and columns to the legend for Figure 4A:

      “(A), Upset plots of up- and down-regulated DEGs across EC subtypes with siERG (grey), IL1B (pink), and TGFB2 (blue). Upset plots visualize intersections between sets in a matrix, where the columns of the matrix correspond to the sets, and the rows correspond to the intersections. Intersection size represents the number of genes at each intersection.”

      Figure 4E depicts up- and down-regulated DEGs that are mutually exclusive and shared between IL1B and siERG in EC3, whereas Figure 4C depicts up- and down-regulated DEGs with IL1B alone compared to siSCR in EC2, EC3, and EC4. This is described in the legends for Figures 4C and 4E:

      “C), PEA for EC2-4 up- and down-regulated DEGs with IL1B compared to control media… (E), PEA comparing up- and down-regulated DEGs that are mutually exclusive and shared between IL1B and siERG in EC3.”

      (7) VSMCS 5 in Figure 5 is interesting, but it could be contaminated with SMCs in your EC population and they are SMCs indeed with some mesenchymal transdifferentiation?

      As mentioned above, we have addressed the purity of the isolation in the Discussion.

      Minor concerns:

      (1) All growth supplements, kits, and reagents should be provided with their sources and catalogue numbers.

      Sources and catalogue numbers have now been added to the following Methods sections:

      “Tissue Procurement and Cell Culture”: “Cells were grown in culture in M-199 (ThermoFisher Scientific, Waltham, MA, MT-10-060-CV) supplemented with 1.2% sodium pyruvate (ThermoFisher Scientific, cat. no. 11360070), 1% 100X Pen Strep Glutamine (ThermoFisher Scientific, cat. no. 10378016), 20% fetal bovine serum (FBS, GE Healthcare, Hyclone, Pittsburgh, PA), 1.6% Endothelial Cell Growth Serum (Corning, Corning, NY, cat. no. 356006), 1.6% heparin, and 10µL/50 mL Amphotericin B (ThermoFisher Scientific, cat. no. 15290018). HAECs at low passage (passage 3-6) were treated prior to harvest every 2 days for 7 days with either 10 ng/mL TGFB2 (ThermoFisher Scientific, cat. no. 302B2002CF), IL1B (ThermoFisher Scientific, cat. no. 201LB005CF), or no additional protein, or two doses of small interfering RNA for ERG locus (siERG; Table S18 in the Data Supplement), or randomized siRNA (siSCR; Table S18 in the Data Supplement).”

      …and…

      “siRNA Knock-down, qPCR, and Western Blotting”: “Knockdown of ERG was performed as previously described (40) using 1 nM siRNA oligonucleotides in OptiMEM (ThermoFisher Scientific, cat. no. 11058021) with Lipofectamine 2000 (ThermoFisher Scientific, cat. no. 11668030).”

      (2) How were the western blots quantified?

      Methods, “siRNA Knock-down, qPCR, and Western Blotting” now reads: “Western blots were quantified using ImageJ (76).”

      (3) All the supplemental figures are listed incorrectly in the manuscript. For example, the authors refer to Figure S11B which should be S10. Please review the manuscript throughout to refer to the correct figures.

      We thank Reviewer #1 for bringing this to our attention. Figure S4 was missing, leading to incorrectly listed supplemental figures for Figures S4-S12. Figure S4 has now been included, and Figures S4-S12 are now listed correctly within the manuscript text.

      (4) Please refer to IL-1B as IL-1beta, same with TGFB.

      We have left the terms as is, since it is also routine to refer to IL-1beta as IL1B, and TGFbeta as TGFB.

      (5) There are typos throughout the manuscript, such as 4C, VW Fexpression, VWFand VCAM-1.

      We could not locate the typos “VW Fexpression” or “VWFand VCAM-1”. We do not consider “4C” a typo, as it refers to the temperature to which the centrifuge was set in Methods, “Nuclear Dissociation and Library Preparation”: “Samples were centrifuged at 500 rcf for 5 minutes at 4C…”

      (6) Please define the abbreviations in line 69, and cite the source for the use of aSMA/PECAM1 as a marker of EndMT.

      We have now included abbreviation definitions and the cited source for ECs that co-express aSMA/PECAM-1 in atherosclerotic lesions within the Introduction: “These studies have described an unexpectedly large number of cells co-expressing pairs of endothelial and mesenchymal proteins, including fibroblast activating protein/von Willebrand factor (FAP/VWF), fibroblast-specific protein-1/VWF (FSP-1/VWF), FAP/platelet-endothelial cell adhesion molecule-1 (CD31 or PECAM-1), FSP-1/CD31 (20), phosphorylation of TGFB signaling intermediary SMAD2/FGF receptor 1 (p-SMAD2/FGFR1) (22), and α-smooth muscle actin (αSMA)/PECAM-1 (23).”

      (7) The changes in % cells in cluster per donor per condition in Figure 3 are interesting, have the authors observed a change of one cluster at the expense of another i.e. do they transdifferentiate into another with different treatments?

      Figure 3 shows that as the percentage of cells in EC3 increases with TGFB or IL1B, the percentage in EC4 decreases with these treatments. This has been added to the Discussion: “Moreover, as the percent of cells in EC3 go up with TGFB or IL1B, they go down in EC4, suggesting trans-differentiation from EC4 into EC3 with these perturbations.”

      (8) Functional analysis of these clusters with and without treatment is required to confirm the EndMT.

      We do not claim that the cells underwent EndMT. Rather, we use pro-EndMT perturbations previously described in the literature to test whether ECs respond heterogeneously to stimuli which are relevant to CAD. We found that EC subtype was a greater determinant of cell state than treatment.

      (9) No blank line at 266. The break is in the middle of the sentence, also cytoplasmic cytoplasmic ribosomal proteins (typo?).

      We have revised these sentences to read: “Shared IL1B- and siERG-upregulated genes were enriched in COVID-19 adverse outcome pathway (WP4891; p-value 1.9×10⁻⁹) (52). Shared IL1B- and siERG-attenuated genes are enriched in several processes involving ribosomal proteins, including ribosome, cytoplasmic (CORUM:306; p-value 3.3×10⁻⁷), cytoplasmic ribosomal proteins (WP477; p-value 5.3×10⁻⁷), and peptide chain elongation (R-HSA-156902; p-value 5.9×10⁻⁷) (Figure 4E).”

      (10) The sentence in line 321, "These observations support ....of human", seems incomplete.

      We revised these sentences to read: “Expected pathway enrichments are observed for annotated cell types, including NABA CORE MATRISOME (M5884; p-value 4.8×10⁻⁴¹) for fibroblasts, blood vessel development (GO:0001568; p-value 5.6×10⁻³³) for ECs, and actin cytoskeleton organization (GO:0030036; p-value 1.3×10⁻¹⁵) for VSMCs (Figure S5D-G in the Data Supplement). These observations support the diverse composition of human atherosclerotic lesions.”

      (11) What do the authors mean by (at least partially) line 444?

      We revised this sentence to read: “In fact, the limited correlation with ex vivo data supports this interpretation.”

      (12) Some unrelated data in the paper, like supplemental figure 10B and supplemental figure 11?

      These data are relevant to the Methods and have been kept.

      Reviewer #2 (Recommendations For The Authors):

      We need this work to expand our knowledge of endothelial biology. Please address my concerns to further strengthen this work.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Revised manuscript

      The authors have addressed most of my points, but I still have one outstanding concern about the statistics:

      My Original Question:

      I have a few concerns and questions that I would like to see addressed: 1) Figure 1L - the statistics are a little unusual here as the errors are across visual areas rather than across mice or hemispheres. This isn't ideal as ideally, we want to generalize the results across animals, not areas, and the results seem to be driven mostly by V1/RSC. I would like to see comparisons using mice as the statistical unit either in an ANOVA with areas as factors or post-hoc comparisons per area.

      Author Reply:

      Based on the assumption that visual cortex should respond to visual stimuli, we would have expected to find a difference between closed and open loop locomotion onset responses in all cell types in visual areas of cortex (a closed loop locomotion onset being the combination of locomotion and visual flow onset, while an open loop locomotion onset lacks the visual flow component). Thus, the first surprise was that in most cell types we found very little difference between these two locomotion onset types. Conversely, in Tlx3-positive L5 IT neurons the difference was apparent well outside of the visual areas of cortex (even though the difference was indeed strongest in V1/RSC). To quantify the extent to which closed and open loop locomotion onsets result in different activity patterns across dorsal cortex we performed the analyses shown in Figures 1L and 2. To make the point that the effect was observable on average across cortical areas, we used cortical area as a unit in Figure 1L. We have added the analysis shown in Figure 1L with mice as the statistical unit as Figure S4J and have added the ANOVA information to Table S1, as suggested.

      My revised question:

      The authors have only partially addressed my concerns here. I disagree with the authors that they were making a point about the effect being observable across visual areas. The primary statistical statement they are trying to make is that the similarity between open and closed-loop stimulation is different for Tlx mice, e.g. Line 122: "However, comparing locomotion onsets in mice that expressed GCaMP6 only in Tlx3 positive L5 IT neurons, we found that the activation pattern was strikingly different between closed and open loop conditions" and Line 172-3: "Thus, excitatory neurons of deep cortical layers exhibited the strongest differences between closed and open loop locomotion related activation". These statements are not correctly supported by the statistical analysis as presented in Figure 1L as it is the variability across mice that is relevant to draw this conclusion.

      In the example "However, comparing locomotion onsets in mice that expressed GCaMP6 only in Tlx3 positive L5 IT neurons, we found that the activation pattern was strikingly different between closed and open loop conditions (Figure 1D)" we talk about the example mouse shown. We have not changed phrasing here.

      We have, however, changed the way we talk about Figure 1L and S4J (the second example given by the reviewer), and have rephrased much of this paragraph. Please note, we have also changed Figure S4J to quantify the difference only for V1.

      This is partially addressed by Figure S4J where the authors show standard-errors across mice and report statistics across mice. In Table S1 the statistical test is reported to be a bootstrap test with mice as the statistical unit, however, according to line 985 this was a non-hierarchical bootstrap test. Does this mean that the authors resampled onsets without regard to which mouse they came from to regenerate the response-curves and recalculate the correlation coefficient? Or did they directly resample from the distribution of correlation coefficient values? I suspect the latter, but for some comparisons (e.g. Tlx3 vs PV) there are only two mice in one group, yielding two correlation coefficients, and resampling 2 values 10,000 times would lead to very biased statistics. Either way the approach is far from ideal. There is also no protection against multiple-comparisons in these tests.

      We have adapted Figure S4J to include only V1, where we find the largest effect (the text is adapted to reflect this), and have added individual data points as suggested in the following comment. The reviewer is correct that we created a bootstrap distribution by resampling correlation values. This means we are resampling 2, 3, 4, 6, 7, or 14 values depending on the comparison. This should now be clearer in the text. We agree that this is not ideal, but when using mice as a statistical unit, analysis is almost always underpowered. To the best of our knowledge, bootstrap resampling is the best approach to alleviate this problem. Regarding the concern about multiple comparisons, we have now adjusted the significance threshold in Figures 1L and S4J by dividing it by the number of groups (here: 9 genotypes).
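      For readers less familiar with the procedure discussed in this exchange, the following Python sketch illustrates a non-hierarchical bootstrap on per-mouse correlation values together with a Bonferroni-style threshold (α divided by 9 genotypes). It is not the authors' code: the function name, the two-sided p-value rule (doubling the smaller tail of the bootstrap distribution of the difference in means), α = 0.05, and the example correlation values are all assumptions chosen for illustration. The last line reproduces the reviewer's point that, with only two mice, a resample can take just three distinct means.

```python
import itertools
import random
from statistics import fmean

ALPHA = 0.05                           # assumed nominal significance level
N_GENOTYPES = 9                        # number of groups, as stated in the response
BONFERRONI_ALPHA = ALPHA / N_GENOTYPES


def bootstrap_diff_pvalue(group_a, group_b, n_boot=10_000, seed=0):
    """Hypothetical two-sided bootstrap test on per-mouse correlation values.

    Each mouse contributes one value; values are resampled with replacement
    within each group (non-hierarchical), and the difference in means is
    recorded for every iteration.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = [rng.choice(group_a) for _ in group_a]
        b = [rng.choice(group_b) for _ in group_b]
        diffs.append(fmean(a) - fmean(b))
    # Fraction of bootstrap differences on each side of zero; double the smaller tail.
    frac_le_zero = sum(d <= 0 for d in diffs) / n_boot
    frac_ge_zero = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(frac_le_zero, frac_ge_zero))


# Illustrative (invented) correlation values for two genotypes.
group_2_mice = [0.20, 0.30]
group_3_mice = [0.80, 0.85, 0.90]
p = bootstrap_diff_pvalue(group_2_mice, group_3_mice)
print(p, p < BONFERRONI_ALPHA)

# With two mice, a resample of size 2 can only take three distinct means.
print(len({fmean(s) for s in itertools.product(group_2_mice, repeat=2)}))
```

      A reported p-value of 0 here only means p < 1/n_boot; the small number of distinct resampled means is why such tests can be poorly calibrated when a group contains two or three mice.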

      The ANOVA reported in Table S1 for Figure S4J isn't described in the methods so I can't say what they did and it doesn't seem to be referred to in the text and is non-significant in any case. Figure S4J also only shows summary statistics whereas individual mice should be plotted. The correct statistical test is either a one-way ANOVA with one factor (genotype) with post-hoc tests between the Tlx3 genotype and the others with suitable multiple-comparisons corrections (this may be the non-significant test in table S1). Alternatively, a linear mixed effects model with Genotype as a fixed effect and Mouse as a random intercept term. This approach is more powerful as it would allow them to use data from all locomotion onsets, but it may struggle to fit datasets with only 2 members for certain genotypes. If they wish to make the more extended point that the pattern across visual areas differs between Tlx3 and other mice they could include 'Area' as another (fixed) factor in the design and look for an interaction with Genotype.

      The ANOVA was indeed a one-way ANOVA with one factor. We have added this information to the methods. As suggested, we have added individual data points to Figure S4J.
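      As a hedged illustration of what a one-way ANOVA with genotype as its single factor computes, the sketch below derives the F statistic from per-mouse values. The function name and input format are hypothetical; converting F into a p-value requires the F distribution with (k − 1, n − k) degrees of freedom (for example via SciPy's `scipy.stats.f_oneway`), which is not reproduced here.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA with a single factor (e.g. genotype).

    Each group is a list of per-mouse values (hypothetical input format).
    Requires at least two groups and more values than groups.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)


# Invented example: two groups of three mice each.
print(one_way_anova_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```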

      I also agree with the other reviewers that the presentation of standard-errors in Figures 1F-K and elsewhere is somewhat misleading as these are s.e.m. across onsets without taking into account the hierarchical nature of the data. Across mice s.e.m. would give a more accurate view of the variability in the data across the population. I also understand that first averaging across onsets within mice before taking a grand-average throws away a lot of data and s.e.m.s will be considerably larger. The authors should consider linear mixed effects models as an optimal solution for estimating s.e.m. If this is not feasible then the authors could consider showing data from individual mice in a supplementary figure or at least reporting the number of onsets that came from each mouse.

      We have now changed all plots in which we show time course data of widefield calcium imaging to show a hierarchical bootstrap estimate of mean and 90% confidence interval of the mean estimate.
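      The hierarchical bootstrap mentioned above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it uses a two-level design (mice, then locomotion onsets within each resampled mouse) and pools the resampled onsets before averaging, which is one common variant. The authors' actual levels (for example, recording sessions within mice), pooling rule, and number of iterations may differ, and the function name, input format, and example traces are hypothetical.

```python
import random
from statistics import fmean, quantiles


def hierarchical_bootstrap(traces_by_mouse, n_boot=1000, seed=0):
    """Hypothetical hierarchical bootstrap of a mean time course.

    traces_by_mouse maps a mouse ID to a list of onset-aligned traces
    (lists of equal length). Returns (estimate, lower, upper) per time point,
    where lower/upper bound a 90% percentile confidence interval.
    """
    rng = random.Random(seed)
    mice = list(traces_by_mouse)
    n_t = len(traces_by_mouse[mice[0]][0])
    boot_means = []  # one grand-mean trace per bootstrap iteration
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample mice with replacement.
        for _slot in range(len(mice)):
            traces = traces_by_mouse[rng.choice(mice)]
            # Level 2: resample onsets with replacement within the chosen mouse.
            pooled.extend(rng.choice(traces) for _ in traces)
        boot_means.append([fmean(tr[t] for tr in pooled) for t in range(n_t)])

    estimate, lower, upper = [], [], []
    for t in range(n_t):
        column = [bm[t] for bm in boot_means]
        cuts = quantiles(column, n=20, method="inclusive")  # 5%, 10%, ..., 95%
        estimate.append(fmean(column))
        lower.append(cuts[0])   # 5th percentile
        upper.append(cuts[-1])  # 95th percentile
    return estimate, lower, upper


# Invented example: two mice with different numbers of onsets, 3 time points each.
example = {
    "mouse_1": [[0.0, 1.0, 0.5], [0.1, 1.2, 0.4]],
    "mouse_2": [[0.0, 0.8, 0.3]],
}
est, lo, hi = hierarchical_bootstrap(example)
print([round(x, 3) for x in est])
```

      Because mice are resampled first, between-animal variability propagates into the interval, which is why such intervals are typically wider than an s.e.m. computed over pooled onsets.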

      Reviewer #2 (Recommendations For The Authors):

      Congratulations to the authors on the revision! The revised article has substantially improved, and I have no further comments. I am particularly reassured by the new hierarchical bootstrap analyses as well as by the new analysis with mouse as a statistical unit that reproduces the key finding from the analyses with region as a statistical unit. Moreover, the authors added a vehicle control condition which does not yield any results. Therefore, I have no further methodological concerns and removed my mention of this previous weakness from my public review. Also, the readability of the manuscript has much improved in the revised version. Congratulations again on this important work!

      We thank the reviewer for the help in improving the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Comments on rebuttal:

      (1) It is greatly appreciated that the authors have improved aspects of their statistics, I have revised my comments accordingly.

      We are happy to hear.

      (2) However, I should clarify my comments regarding statistical concerns were not merely pertaining to a given Figure (e.g. Figure 1) I was only using it as an example. The authors have redone aspects of their analysis using N = number of mice (for statistics/trace figures), but is there a reason they cannot do this for other problematic figures/traces in the manuscript?

      Prompted also by reviewer 1, we have changed all time course plots in the manuscript to show a hierarchical bootstrap estimate of mean and 90% confidence interval of mean.

      Using mice as a statistical unit throughout the manuscript unfortunately is not viable in most cases, as we simply do not have enough mice in our dataset and statistical tests based on mice would be underpowered. The manuscript currently contains data from 77 mice, and we would likely need multiples of that to do statistics over mice.

      For Figure 1 - I do take the point why regions are being used as the independent N (though the authors justification should be made more clearly in the manuscript) making an N of 12 (though I am less clear why the same region across 2 hemispheres is counted as 2 Ns instead of 1; are they really independent?). However, I am less clear as to the choice in N in other figures. Could the authors clarify this more explicitly in the manuscript.

      We use regions as a statistical unit in Figures 6, 7, and S6-S8. Regarding the independence of hemispheres, this depends on cell type and region. For example, activity in left V1 exhibits a higher correlation with activity in left V2am than with activity in right V2 (see Figure 5). On average, callosal pairs exhibit correlation levels comparable to those of nearby cortical neighbors. See also other work on the topic, for example Calhoun et al. (2023).

      Regarding choice of N in other figures, this is either “recording session” or “pairs of regions”. We have made this clearer in the figure legends. In the case of testing using recording sessions, the idea is that each recording session constitutes a measurement. Measurements in the same mouse are not independent, and hence we use hierarchical bootstrap for all testing on recording sessions. The choice of “pairs of regions” for the correlation analysis follows from the use of regions as a statistical unit.

      (3) Using N = locomotion onsets (or any definition other than N = mice) when deriving trace averages/SEMs (for example, as in Figure 1) is visually misleading for the reader, as it masks the true variability of the data, and even more misleading given that the authors do necessarily use that definition of N in their statistical tests associated with the data (as the authors commented). Whilst the authors have shown some traces with N = mice for some data, is there a reason they cannot do this for all figures in the manuscript? At the very least, the practice of using other definitions of N for the purpose of showing trace averages/SEMs should be justified in the MS.

      We have replaced all time course plots that used SEM over events (for example locomotion onsets or visual stimuli) with a hierarchical bootstrap estimate of mean and 90% confidence interval of the mean throughout the manuscript. See also response to comment 2 above, and to reviewer 1, comment 4.

      References

      Calhoun, G., Chen, C.-T., Kanold, P.O., 2023. Bilateral widefield calcium imaging reveals circuit asymmetries and lateralized functional activation of the mouse auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 120, e2219340120. https://doi.org/10.1073/pnas.2219340120

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The modeling approaches are very sophisticated, and clearly demonstrate the selective nature of acute ketamine to reduce the impact of trial losses on subsequent performance, relative to neutral or gain outcomes. The authors then, not unreasonably, suggest that this effect is important in the context of the negative bias in interpreting events that is prominent in depression, in that if ketamine reduces the ability of negative outcomes to alter behavior, this may be a mechanism for its rapid acting antidepressant effects.

      However, there is a very strong assumption in this regard, as shown by the first sentence of the discussion, which implies this is a systematic study of ketamine's acute antidepressant effects. In actuality, this is a study of the acute effects of ketamine on reinforcement learning (RL) modeled parameters. A primary concern here is that an effect presented as a "robust antidepressant-like behavioral effect" should be more enduring than just an alteration during the acute administration. As it is, the link to an "anti-depressant effect" is based solely on the selective effects on losses. This is not to say this is not an interesting observation, worthy of exploration. It is noted that a similar lack of enduring effects on outcome evaluation is observed in humans, as shown in supplemental fig. S4, but there is no accompanying citation for the human work.

      We agree with the reviewer that the way we linked the study results to ketamine’s antidepressant action can be misleading and based on a rather strong assumption which was not systematically tested in the study. We made the following changes to the manuscript:

      (1) These results constitute a rare report of a robust antidepressant-like behavioral effect produced by therapeutic doses of ketamine during acute phase (<1 hour) after injection (Introduction, 3rd paragraph, line 8-9 in the original manuscript).

      Changed to: These results constitute a rare report of an acute effect of therapeutic dose of ketamine on the processing of affectively negative events during dynamic decision-making.

      (2) We clarified in the Discussion that our study is to gain insights into, but not a systematic investigation of ketamine’s antidepressant action as follows:

      (2.1) A sentence was added (1st paragraph of Discussion): Using a token-based decision task and extensive computational modeling, we examined the behavioral modulation induced by therapeutic doses of ketamine to gain insights into possible early signs of ketamine’s antidepressant activity.

      (2.2) Consistent with the findings from humans, ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4) (Discussion, 2nd paragraph, line 6-7 in the original manuscript).

      Changed to: While ketamine’s antidepressant effect is reported to be sustained over a week of period (5), ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4). This discrepancy might be attributable to the possible differences in the state of brain network between healthy subjects and those with depression as well as the type of measures taken to assess ketamine’s effect.

      (2.3) A sentence was added (Discussion, last sentence of the 2nd paragraph): Nevertheless, systematic studies are required to understand whether the reduced aversiveness to loss in our task might share the same mechanisms that underlie ketamine’s antidepressant action.

      One question that comes to mind in terms of the selectivity observed is whether similar work has been done to examine the acute effects of any other drugs. If ketamine is unique in this regard, that would be quite interesting.

      We think this is an interesting idea. However, comparing ketamine’s effect to that of other drugs is beyond the scope of the current study. We hope that we will be able to answer this question with future studies.

      Reviewer #2 (Public Review):

      Oemisch and Seo set out to examine the effects of low-dose ketamine on reinforcement learning, with the idea that alterations in reinforcement learning and/or motivation might inform our understanding of what alterations co-occur with potential antidepressant effects. Macaques performed a reinforced/punished matching pennies task while under effects of saline or ketamine administration and the data were fit to a series of reinforcement learning models to determine which model described behavior under saline most closely and then what parameters of this best-fitting model were altered by ketamine. They found a mixed effect, with two out of three macaques primarily exhibiting an effect of ketamine on processing of losses and one out of three macaques exhibiting an effect of ketamine on processing of losses and perseveration. They found that these effects of ketamine appeared to be dissociable from the nystagmus effects of the ketamine.

      The findings are novel and the data suggesting that ketamine is primarily having its effects on processing of losses (under the procedures used) are solid. However, it is unclear whether the connection between processing of losses and the antidepressant effects of ketamine is justified and the current findings may be more useful for those studying reinforcement learning than those studying depression and antidepressant effects. In addition, the co-occurrence of different behavioral procedures with different patterns of ketamine effects, with one macaque tested with different parameters than the other two exhibiting effects of ketamine that were best fit with a different model than the other two macaques, suggests that there may be difficulty in generalizing these findings to reinforcement learning more generally.

      (1) First, the authors should be more explicit and careful in the connection they are trying to make about the link between loss processing and depression. The authors call their effect a "robust antidepressant-like behavioral effect" but there are no references to support this or discussion of how the altered loss processing would relate directly to the antidepressant effects.

      We agree with the reviewer’s point on the way we made the connection between the study results and ketamine’s antidepressant action. This concern overlaps with Reviewer #1’s concern. Please refer to our responses (2), (2.1), (2.2), and (2.3) above.

      (2) It appears that the monkey P was given smaller rewards and punishers than the other two monkeys and this monkey had an effect of ketamine on perseveration that was not observed in the other two monkeys. Is this believed to be due to the different task, or was this animal given a different task because of some behavioral differences that preceded the experiment? The authors should also discuss what these differences may mean for the generality of their findings. For example, might there be some set of parameters where ketamine would only alter perseveration and not processing of losses?

      Although the best-fitting ketamine model for monkey P includes an additional element – perseveration, we believe that monkey P’s baseline behavior and ketamine’s effect are not significantly different from the other two monkeys for the following reasons.

      First, monkey P was the first animal in which we tested ketamine’s effect, and we therefore aimed to match the other two monkeys’ baseline behavior to that of monkey P in order to reduce variability in ketamine’s effect potentially attributable to differences in baseline behavior before pharmacological manipulation. We had to adjust the payoff matrix for the subsequent animals (Y and B) because these monkeys were more sensitive to loss and seldom chose the “risky” target (yielding loss). To make the other two monkeys’ behavior similar to that of monkey P, we adjusted the asymmetry between the risky and the safe target so that loss (neutral) outcomes occurred from the safe (risky) target as well. Eventually, this adjustment made the baseline behavior similar across all three monkeys. The goal of the study was to reliably measure ketamine’s effect, not to study individual differences that can naturally occur with the same task parameters. Therefore, we believe that the adjustment of the payoff matrix helped to reliably detect ketamine’s effect starting from a common baseline behavior.

      Second, the best-fitting model for monkey P (K-model 7) and that for the other two monkeys (K-model 4) make very similar predictions, both qualitatively and quantitatively, as seen in the revised Figure 4, and the outcome-value parameters estimated from these two models in monkey P are very similar (revised Table 3). In addition, the BIC difference between the model that includes only perseveration modulation (K-model 6) and the model that also incorporates outcome value modulation (K-model 7) is 441, whereas the BIC difference between K-model 7 and the model that includes only outcome value modulation (K-model 4) is as small as 4. These BIC results indicate that, in monkey P, ketamine’s modulation of outcome evaluation explains remarkably more variability than its modulation of perseveration.
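      For readers less familiar with the criterion, the sketch below shows how BIC values and the differences quoted above relate, assuming the standard definition BIC = k ln(n) - 2 ln(L), with k free parameters, n trials and L the maximized likelihood. The function names, log-likelihoods, parameter counts and trial number are hypothetical placeholders, not values from this study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(bics: dict) -> dict:
    """Difference of each model's BIC from the lowest (best) BIC."""
    best = min(bics.values())
    return {name: value - best for name, value in bics.items()}


# Hypothetical fits: placeholder numbers, not the study's data.
n_trials = 5000
fits = {
    "K-model 4": bic(log_likelihood=-2000.0, n_params=6, n_obs=n_trials),
    "K-model 6": bic(log_likelihood=-2220.0, n_params=5, n_obs=n_trials),
    "K-model 7": bic(log_likelihood=-1995.0, n_params=7, n_obs=n_trials),
}
for name, d in sorted(delta_bic(fits).items(), key=lambda kv: kv[1]):
    print(f"{name}: dBIC = {d:.2f}")
```

      By the commonly cited Kass and Raftery guidelines, BIC differences of 2 to 6 count as positive but not strong evidence, whereas differences above 10 count as very strong evidence, which is why a dBIC of about 4 and one of 441 are weighed so differently.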

      Therefore, we conclude that ketamine’s effect was not significantly different between monkey P and the other two monkeys. We clarified this in the revised manuscript by adding the following paragraph to the Results section:

      “Unlike monkeys Y and B, the best-fitting model for monkey P indicated that ketamine increased the overall tendency to switch choices, in addition to the outcome-dependent modulation of outcome evaluation. However, BIC differed only slightly (dBIC = 3.99) between the best-fitting model (K-model 7) and the second-best model (K-model 4), and the two models’ predictions for choice behavior were very similar both qualitatively and quantitatively (Table 3, Figure 4). We conclude that the behavioral effects of ketamine were consistent across all three monkeys.”

      (3) The authors should discuss whether the plasma ketamine levels they observed are similar to, higher than, or lower than those associated with ketamine’s rapid antidepressant effects.

      We added the following sentence, with a reference, to the first paragraph of the Results section:

      “Plasma concentration and its time course over 60 minutes were also comparable to those measured after 0.5 mg/kg in human subjects (35).”

      (35) Zarate CA, Brutsche N, Laje G, Luckenbaugh DA, Venkata SLV, Ramamoorthy A, et al (2012): Relationship of ketamine’s plasma metabolites with response, diagnosis, and side effects in major depression. Biol Psychiatry, 72: 331-338.

      (4) For Figure 4 or S3, the authors should show the data fitted to model 7, which was the best for one of the animals.

      We added the parameters and model predictions from both K-model 7 and K-model 4 for monkey P to Table 3 and Figure 4 to facilitate comparison between the two models. The revised Table 3 and Figure 4 are as follows:

      Author response table 1.

      Maximum likelihood parameter estimates of the best models for saline and ketamine sessions.

      In all three animals, the model incorporating a valence-dependent change in outcome evaluation best fit the choice data from ketamine sessions, either with (K-model 7, shown in parentheses for monkey P) or without (K-model 4, monkeys P, Y and B) an additional change in the tendency toward choice perseveration (Figure 3, Table 3).

      Author response image 1.

      Ketamine-induced behavioral modulation simulated with the differential forgetting model (saline sessions) and the best-fitting K-model (ketamine sessions).

    1. Author Response

      The following is the authors’ response to the original reviews.

      To Reviewer #1

      We sincerely appreciate the constructive and insightful comments provided by the reviewer. We have carefully considered their valuable suggestions, which led to comprehensive modifications of the article.

      In addition, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Public Review

      (1) We have expanded the Materials and Methods section describing our neuronal migration analysis. We apologize for the initial lack of detail, including the omission of information on the sinuosity index and the directionality radar. Regarding speed, we clarify that it does include pausing time: speed is calculated by dividing the total distance traveled by the cell by the total time over which it migrated, pauses included.
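      The speed definition above can be illustrated with a minimal sketch on a tracked path. Because time spent paused is counted in the denominator, pauses lower the computed speed. The function names and the track are hypothetical, and the sinuosity index is assumed here to be path length divided by net start-to-end displacement, a common convention that may differ from the exact definition used in the manuscript.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of the cell, e.g. in micrometres


def path_length(track: List[Point]) -> float:
    """Total distance travelled along successive tracked positions."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def mean_speed(track: List[Point], frame_interval: float) -> float:
    """Total distance divided by total tracked time, so pauses lower the speed."""
    total_time = frame_interval * (len(track) - 1)
    return path_length(track) / total_time


def sinuosity_index(track: List[Point]) -> float:
    """Assumed definition: path length divided by net displacement (1.0 = straight)."""
    return path_length(track) / math.dist(track[0], track[-1])


# Hypothetical track sampled every 5 min; the repeated position is a pause.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
print(mean_speed(track, frame_interval=5.0))
print(sinuosity_index(track))
```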

      (2) We would like to clarify the statistical analysis in our figures. The figures now represent the median, and the legends indicate the median along with the interquartile range, in line with our use of non-parametric analysis for variables that do not follow a normal distribution. In the previous version, the figure legends incorrectly stated the mean and standard error of the mean instead of the median. We sincerely apologize for any confusion this may have caused. The corrected legends now accurately reflect the statistical measures used in the analysis.

      The global Kruskal-Wallis analysis, followed by Dunn’s post hoc test, does indicate that Fmr1 KD globally replicates the Fmr1-null phenotype. However, we concur with the reviewer's point regarding directionality, and we apologize for the lack of precision in the initial version. Upon further analysis, we identified a significant difference in directionality (Fisher's test, p < 0.001). This more pronounced directionality defect in the KD could indicate a lack of compensation, a factor that may not be at play in the Fmr1-null context. We appreciate the opportunity to address this issue, and the revised version includes the details needed to convey these findings accurately.
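      The median and interquartile-range summary described in point (2) can be computed with Python's standard library, as in the minimal sketch below. The helper name and sample values are hypothetical, and the "inclusive" quartile method is an assumption: statistical packages use several quartile conventions, so the exact Q1 and Q3 values may differ slightly from those produced by the software used for the figures.

```python
import statistics
from typing import Sequence, Tuple


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3); quartiles use the 'inclusive' convention (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical migration speeds for one condition.
speeds = [12.0, 15.5, 9.8, 20.1, 14.2]
med, q1, q3 = median_iqr(speeds)
print(f"median {med} [IQR {q1}-{q3}]")
```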

      (3) We appreciate the referee's agreement with our perspective.

      (4) In response to the recommendations from all referees, we have expanded both the introduction and discussion sections of our manuscript. The initial brevity of these sections was due to the short format we had initially chosen. We believe that these expansions contribute to a more comprehensive and nuanced presentation of our work, addressing the concerns raised by the referees.

      Recommendations for the authors

      The time stamp and scale bars were added.

      The median versus mean issue is addressed above.

      Figure numbering has been corrected (we apologize for the mistake). The efficiency of CK is defined in the Materials and Methods section.

      To Reviewer #2

      Public review

      We express our gratitude to the referee for their positive appreciation of our work. We have carefully considered their suggestions and have modified the article accordingly.

      In addition, as said to Referee #1, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Recommendations for the authors

      (1) In light of the referee's recommendation, we performed higher-resolution FMRP staining in SVZ neurons cultured in Matrigel, providing a more precise depiction of its subcellular localization (see Figure 1). We have also removed the sentence referring to growth cone staining, as such staining was not visible in cultured neurons. We appreciate the referee's guidance in refining our study.

      (2) We have also added a new Figure 4 with better MAP1B staining in the RMS, as well as higher-resolution MAP1B staining in cultured neurons.

      With all due respect, we maintain that the western blots, performed in three independent experiments, unequivocally support a 1.6-fold increase in MAP1B in the RMS of Fmr1-null mutants, a trend also observed in other systems.

      Following the referee's suggestion, we attempted to quantify the RMS immunostainings. Unfortunately, the results were inconclusive. This is not entirely unexpected, as immunostainings are notoriously difficult to quantify, and neonate perfusion introduces additional complexity that contributes to the marked interindividual variability we observed.

      (3) The efficiency of the two interfering RNAs is now documented in the text. Regarding the directionality radar, as noted in our reply to Reviewer #1 (public review, point 2), we acknowledge that, while Fmr1 KD generally recapitulates the migratory phenotype of the Fmr1 mutants, a more precise statistical analysis reveals differences in directionality, which are now documented. We apologize for the previous lack of precision.

      (4) The suggested overexpression experiment is interesting, but we faced challenges in executing it. Attempts to overexpress MAP1B by intraventricular electroporation of a CMV-MAP1B plasmid immobilized the transfected cells in the SVZ, preventing further analysis of migration. We hypothesize that this outcome reflects a mismatch between the MAP1B dosage achieved by overexpression and the actual dosage in the mutants.

      (5) Concerning this point, and as mentioned above, we have incorporated a crucial piece of information into the manuscript, presented in Figure 6. The data reveal a severe disruption in the microtubular cage surrounding the nucleus of migrating neurons in Fmr1 mutants, a phenomenon rescued by MAP1B knock-down. Based on these findings, we believe we can confidently conclude that the microtubule-dependent functions of MAP1B play a role in the migratory phenotype of Fmr1 mutants. We consider this experiment to be a highly valuable addition to our work, shedding light on the underlying molecular mechanisms.

      To Reviewer #3

      We thank the referee for their insightful comments, which we have given careful consideration.

      In addition and as said above, we want to stress that we have implemented a significant additional modification by introducing a new figure (Fig. 6). This figure highlights the collaborative impact of FMRP and Map1B on the microtubular structure of migrating neurons. We firmly believe that this molecular elucidation of the migration phenotype constitutes a noteworthy addition to our work.

      Public review

      With regard to the perceived 'incompleteness' of our work, we believe that the addition of Figure 6, illustrating the molecular underpinnings of the Fmr1 mutation on the microtubular cytoskeleton and its rescue in the MAP1B KD, significantly enhances the completeness of our study.

      In response to the comment on the introduction and discussion sections, we acknowledge that their brevity was due to the Short Format initially chosen. We have since expanded these sections, incorporating additional information about FMRP and MAP1B and their influences on migration.

      Regarding the La Fata article, as highlighted in our discussion, it's important to note that while the study did not strongly indicate an impact on radial locomotion per se, drawing conclusive results is challenging due to the relatively low number of analyzed neurons. Consequently, we do not believe that it poses a challenge to our findings.

      With respect to MAP1B overexpression, as mentioned in our response to Reviewer #2 (point 4), our attempts resulted in the inhibition of migration, potentially due to an overdosage of the protein.

      In terms of anatomical consequences, as highlighted in our discussion, our neurons experience a delay in migration but eventually reach their destination. Although a migration delay may not directly result in significant anatomical anomalies, we acknowledge that the timing of differentiation can be crucial: as noted by Bocchi et al. (2017), a delay in the differentiation of neurons reaching their target could have notable functional consequences. In any case, we have toned down all references to implications for the pathology.

      Recommendation for the authors

      • The size of the figures has been modified

      • The pausing time and sinuosity are now defined

      • The centrin-RFP labeling was indeed too weak in the previous version, which we corrected. We apologize for this.

      • Fig S3 has been revised to address concerns. Notably, the decision to present the two bands for Vinculin and MAP1B separately is intentional. The blot is cut to allow independent development due to the substantial difference in their development times. We believe this approach provides a more accurate representation of the data.

      • The numbering of the figures has been corrected. We apologize for the initial mistake.

      • The Materials and Methods section has been corrected. Please note that we did not use any culture insert in this study.

      • The title has been modified.

      • Comments about the MAP1B overexpression experiment are given above and in our replies to Reviewer #2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared the encoding of odor identity and value in calcium signals from neurons in the ventral pallidum (VP) with that in D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not to the VTA or other examined regions, in contrast to accumbal D1 neurons, which project strongly to the VTA as well as the VP. They examine single-unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two chemically distinct odor cues for each outcome (sucrose, neutral, and air puff). They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor.

      Weaknesses:

      The conclusions of the data are mostly well supported by the analyses, but the statistical analysis is somewhat limited and needs to be clarified and extended.

      (1) The manuscript includes limited direct statistical comparison of the neural populations, and many of the comparisons between the subregions are descriptive, including descriptions of the percentage of neurons having specific response types, or differences in effect sizes or differing "levels" of significance. An additional direct comparison of data from each subpopulation would help to confirm whether the differences reported are statistically meaningful.

      Response: We thank the reviewer for their helpful suggestions. As the reviewer noted, the first version of our manuscript had limited direct comparisons of single-neuron metrics across subpopulations. These analyses were also limited to the supplementary figures: 1) {SK vs. XK} and {SK vs. ST} decoder auROC (S10F), 2) Valence scores (S10G), and 3) S-cue confusion after MNR classification (S11D). We have now included the following statistical comparisons of single-neuron metrics across subpopulation: 1) % of neurons that respond to both S cues (Tables S10, S11), 2) % of neurons that have auROC >0.75 for {SK vs. XK}, {SK vs. PK}, and {SK vs. ST} (Tables S12-S17), 3) response magnitudes to S cues (Table S38), and 4) valence scores (Tables S44-46).

      (2) When hypothesis tests are conducted between the neural populations, it is not clear whether the authors have accounted for the random effect of the subject, or whether individual units were treated as fully independent. For instance, pairwise differences are reported in Figures 4I, 5G/I/L, and others, but the statistical methods are unclear. Assessment of the statistics is further limited by the lack of reporting of degrees of freedom. If the individual neurons are treated as independent in these analyses, it could increase the likelihood of

      Response: We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added multiple supplementary tables that better describe the statistical tests used and the relevant outputs.

      Reviewer #2 (Public Review):

      Summary:

      This work is interesting since the authors provide an in vivo analysis of how odor associations may change as represented at the level of the olfactory tubercle (presynaptic) and next at the level of the ventral pallidum (postsynaptic). The authors first present a seemingly careful characterization of the anterograde and retrograde connectivity of dopamine 1 receptor (D1) and dopamine 2 receptor (D2) expressing medium spiny neurons in the olfactory tubercle and neurons in the ventral pallidum. From this work they claim that, regardless of D1 or D2 expression, tubercle neurons mainly project to the lateral portion of the ventral pallidum. Next, to compare how odor-associated neuronal activity in the ventral pallidum and the olfactory tubercle (D1 vs D2 MSNs) transforms across association learning, the authors performed two-photon calcium imaging while mice engaged in a lick/no-lick task wherein two odors are associated with reward, two odors are associated with no outcome, and two odors are associated with an air puff.

      This manuscript builds on prior work by several groups indicating that olfactory tubercle neurons form flexible learned associations to odors, by looking at outputs into the pallidum (though, I should highlight, without looking specifically at pallidal neurons that truly receive input from the tubercle), and in that respect this work is novel. We appreciated the use of a straightforward odor-outcome behavioral paradigm and the careful computational methods and analyses used to disentangle the contributions of single-neuron vs population-level responses to behavior. With one exception from the Murthy lab, 2P imaging in the tubercle is a new frontier and that is appreciated, as is the 2P imaging in the pallidum, which was well supported by the histology. The anatomical work is also well presented.

      Overall the approach and methods are superb. The issues come when considering how the authors present the story and what conclusions are made from these data. Several key points before going into specifics about each are: 1) The authors can not conclude that their results are contradictory to prior results, 2) The authors over-interpret the results and do not discuss several key methodological issues. We were concerned with the ability to make strong claims regarding the circuitry presented, especially given how much the presented claims contradict prior work. There were also issues with the interpretability of neuronal encoding of value vs valence based on the present behavior (in which a distinction between the air puff and neutral trial types was not clear) and the imaging methodology (in which the neuronal populations analyzed were not clearly defined). In addition to toning down and rectifying some of the language and interpretations, we suggest including a study limitations section where these methodological and interpretation issues are discussed. Over-interpreting and playing up the significance of this work is unnecessary, especially given eLife's new review and publication policy. Readers should be given a sufficiently detailed and nuanced presentation of these thought-provoking results, and from there allowed to interpret the results as they want.

      Strengths:

      State-of-the-art approaches (as detailed above)

      Possible conceptual innovation in terms of looking into output from the olfactory tubercle which has yet to be investigated in this avenue.

      Weaknesses:

      On the first point, regarding the authors' repeated and unsupported claims that their results are contradictory: there are papers by numerous groups, in respected journals including this one, which together used 5 different methods (c-Fos, photometry, 2P, units, fMRI), in species ranging from humans to mice, and which support that tubercle neurons reflect the emotional association of an odor, whether spontaneous or learned. Given this, the authors should not claim that their results contradict these papers as if the other papers were suspect; instead, from our standpoint, it is on the authors to explain how and why their results differ from these other papers rather than simply saying they found something different (which at present is framed in a way that is 'correct' due to primacy if nothing else).

      Response: We acknowledge that the first version of the manuscript contained unnecessarily contradictory language. We do not think that our results are broadly in disagreement with the existing literature, but we do come to different conclusions about what the OT is representing. Namely, our comparison of valence encoding in the OT to that in the VP strongly indicates that the anteromedial OT has a less robust representation of valence, and we argue that this reflects either an intermediate form of valence representation or that the anteromedial OT might not be important for valence representation at all. We have toned down our conclusions, made clear that we are only recording from one domain of the OT, limited our speculation to the discussion, and added a “speculations” section.

      Second, onto the points of interpretation of results, there are several specific areas where this should be rectified. As is, the authors overinterpret their results and draw too far-reaching conclusions. This needs to be corrected.

      In particular, the claims that D1 and D2 neurons of the olfactory tubercle nearly exclusively send projections to the ventral pallidum must be interpreted with caution given that the authors injected an anterograde AAV into the anteromedial olfactory tubercle, and did not examine the projections from either the posterior or lateral portions of the olfactory tubercle. This is especially significant since the retrograde tracing performed from the ventral pallidum indicates that the lateral olfactory tubercle, not the medial olfactory tubercle, primarily projects to the ventral pallidum (Fig 1D-F), however this may be due to leakage into the nucleus accumbens, as seen in the supplementary figure, S1G.

      Response: We thank the reviewer for the point of caution. We have now made it clear that our conclusions are limited to the anteromedial portion of the OT, and other areas may have other projections.

      The same caution must be advised when interpreting the retrograde tracing performed in Fig 1G-I, since the neuronal tracer used and the laterality and rostral-caudal injection site within the VTA could result in different projection patterns and under- or over-labelling. Additionally, the metric used, %Fiber Density (Figure 1C), as in the percentage of 16-bit pixels within the region of interest with an intensity greater than 200, is semi-quantitative, and is more applicable for examining axonal fibers that pass through a region rather than the synaptic terminals (like with a synaptophysin fusion protein-based tracing paradigm) found within a region (puncta). The statements made in contrast to prior studies should therefore be softened, and these concerns should be addressed in the introduction, discussion, and the limitations section if added.

      Response: We have added statements to address these limitations.
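      The %Fiber Density metric described by the reviewer above (the percentage of 16-bit pixels in a region of interest with intensity greater than 200) can be sketched as follows. The function name and pixel values are hypothetical. The sketch also illustrates the reviewer's caveat: the count depends only on how many pixels cross the threshold, so it cannot tell fibers of passage apart from synaptic terminals.

```python
from typing import Iterable


def percent_fiber_density(roi_pixels: Iterable[int], threshold: int = 200) -> float:
    """Percentage of ROI pixels whose 16-bit intensity exceeds `threshold`."""
    values = list(roi_pixels)
    if not values:
        raise ValueError("ROI contains no pixels")
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


# Hypothetical ROI intensities in the 16-bit range 0-65535.
print(percent_fiber_density([0, 150, 199, 200, 201, 4000, 65535, 10]))
```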

      The other major concern is whether the behavioral data generated are indicative of the full spectrum of valence. The authors appropriately state that the mice "perceive" the air puff, yet based on their data the mice did not clearly experience the puff-associated odor as emotionally aversive (viz., negative valence). The way the authors describe these results, it seems they agree with this. Given that, the authors cannot say the puff is aversive without data to show it; that is an assumption which, while seemingly intuitive, is unfortunately not supported by the data. To elaborate, since this is important to the messaging of the paper: the authors utilized a simple behavioral design, wherein two molecular classes of odors were included in either a sucrose-rewarded, neutral no-outcome, or air puff-punished trial type. The odor-outcome pairs were switched after three days, allowing the authors to compare neuronal responses on the basis of odor identity and the later associated outcome. While the mice showed clear learning of the rewarded trial types by an increase in anticipatory licking during the odor, they did not show any significant changes in behavior that indicated learning of the air puff trial type (change in running velocity or % maximal eye size), especially in contrast to the neutral trial type. This raises the concern that either the odor-air puff aversive associations were not learned, or that the neutral trial types, in which a reward was omitted, were just as aversive as the air puff to the rear despite the lack of a startle response, perhaps due to stimulus generalization between the neutral and air puff odors. The possibility of a lack of learning is addressed in the paragraph starting at line 578, but that paragraph does not account for the possibility that the lack of reward is also sufficiently punishing.
      The authors also address the possibility that laterality in the VP contributed to the lack of neural responsivity observed, but should also include a statement regarding laterality in the olfactory tubercle, as described in https://doi.org/10.7554/eLife.25423 and https://doi.org/10.1523/JNEUROSCI.0073-15.2015, since the effects of modulating the lateral portion of the olfactory tubercle are not yet reported. Lastly, the term "reward processing" should be avoided, since the authors did not specifically study the processing of reinforcers.

      Response: As the reviewer points out, we tried to be cautious in interpreting the “aversive” odor response and focused mainly on the reward association; this is addressed in the discussion, and we do not see the need to add a redundant statement to a “limitations” section. We have also added a note about the previously identified laterality of the OT, which might account for the lack of aversive-responsive neurons in the OT. The reviewer makes an interesting suggestion that behavioral responses to air puff-associated odors do not differ significantly from those to unassociated odors because the lack of reward in this context is already aversive. We note that walking velocity is significantly different between reward- and puff-associated odors, but not for the unassociated odors. This is in agreement with the suggestion, and we have added a statement to reflect this.

      Also, I would appreciate justification of the term "value". How specifically does the assay used assess value, as opposed to a simpler learned association that influences the perceived hedonics or valence of the odors?

      Response: We have removed the term “value” with the exception of areas where we cite the work of others. We acknowledge that the word value is complicated in the incentive learning field and appreciate the suggestion. Our experimental design was meant to investigate learned association for positive and negative stimuli, thus valence is more appropriate and we have used this term.

      More information is needed regarding how neurons are identified day-to-day, both in textual additions to the Methods and also in terms of elaborating more in the results and/or figure legends about what neurons are included:

      (a) The ROI maps for identifying/indicating cells in the FOVs are nice to see, but at the same time they raise some concerns about how cells are identified and/or how the borders of specific ROIs are drawn. For instance, in Figure 4, A & D, ROI #13 (cell #13) is VERY different in shape/size between the two panels. Also see ROIs 15 and 4. Why was an ROI map not made on day 1 and then that same map applied and registered to frames from consecutive imaging days in the same mouse? As it is, new ROIs are drawn, smaller for some "cells" and larger for others, and in the case of ROI #13 one ROI is about twice as large as the other. This inconsistency in the workflow and in the definition of the ROIs needs to be addressed in the Methods. The authors should also address whether and how this could affect their results.

      Response: We have added details and clarified the Methods section to make this clearer. We note that we extracted calcium transients from the raw data with the widely used Constrained Nonnegative Matrix Factorization (CNMF) algorithm. This processing algorithm simultaneously identifies spatial and temporal components using modeled kinetics of calcium transients and pre-trained CNN classifiers. Because the optical section in two-photon microscopy is thin in z, the extracted components of a neuron may not always look like whole “neurons”, but all ROIs were confirmed manually to ensure they were not artifacts.

      (b) Also, more details are needed in the results and/or figure legends regarding the changes in cell numbers over days that are directly compared in the results. On some days there are 10% more or fewer cells. Why? It is not the same population being compared in this case, so some discussion of this is needed.

      Response: The shapes of the spatial components can vary across days due to nonrigid motion in the brain and/or miniscule differences in the imaging angle across days. Although we visually verified that we are imaging approximately the same z plane across days, we cannot (and do not) claim to image identical populations of neurons across days.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows significant modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT is not reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The authors' interpretation of the physiologic results - that a novel framework is needed to interpret the OT's role - requires more careful treatment.

      Response: We thank the reviewer for their recommendation. We have toned down the conclusiveness of our language in the discussion. Additionally, we have removed several speculative sentences from the concluding paragraph.

      Reviewer recommendations for Authors:

      We thank the reviewers for this helpful list of recommended changes to the manuscript. Regrettably, a few of the recommendations were overlooked in the revision, as indicated below. We do agree with the suggestions and plan to make the appropriate changes in the version of record.

      Reviewer #1 (Recommendations For The Authors):

      If the comparisons mentioned in point 2 in the public review do not account for the lack of independence of individual neurons, I suggest the authors do so by either running linear mixed effects models with a random effect for subject, or one-way ANOVAs with a random effect of subject, where appropriate. The authors could also run analyses on summarized individual subject data (averages, % of neurons, etc.), though the authors would lose substantial power when assessing whether average changes differ between subjects in each recording group.

      We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added supplementary tables for every statistical test that better describe the parameters used and the relevant outputs.

      Reviewer #2 (Recommendations For The Authors):

      Of minor note, there are some symbols/special characters that did not translate in the figure caption for Figure 6C, repeated text between lines 700-705 and 707-712, and some other small grammatical errors. Additionally, the source of the anterograde tracing virus (AAV9-phSyn1-FLEX-tdTomato-T2A-SypEGFP-WPRE) needs to be stated.

      Thank you for pointing these out. We have added a description to the figure legend, deleted the repeated lines, and fixed grammatical errors. During the revision, we regrettably overlooked the request to provide the source for the AAV9-phSyn1-FLEX-tdTomato-T2A-SypEGFP-WPRE. We agree that this small detail is important and will add it before publication of the version of record. This viral vector was purchased from The Salk Institute GT3 Core.

      Reviewer #3 (Recommendations For The Authors):

      The authors' interpretation of the physiologic results - that a novel framework is needed to interpret the OT's role - requires more careful treatment. As the authors note, there is reward-contingency modulation in OT, especially when D1 neurons are compared against D2, as shown in Fig. 3D,E, Fig. 4I, and Fig. F,J. Though small in effect size, these modulations presumably cannot be explained by odor identity. These observations, to this reviewer, suggest that the D1 neurons of OT have a component of cue-reward representation. In other words, rather than developing an entirely new framework, the alternative possibility that D1 neurons of OT occupy an intermediate stage in associating cues with reward (i.e., under the same framework, but occupying a different position in the emergence of value representation) should be considered.

      We thank the reviewer for this thoughtful comment. We have eliminated the statement that a “novel framework is needed” and have been more conservative in our interpretations. We have also acknowledged that our results are not necessarily in conflict with the existing literature, although we do draw different conclusions: namely, that the anteromedial OT is not a robust valence-encoding population in comparison to the VP. We appreciate the suggestion of the term “intermediate stage” in reward association and have now included it in the discussion. Lastly, we have limited broader speculation to a “speculation” section of the discussion.

      Related to the above point, have the authors analyzed if the similarities in the chemical structures correspond to perceptual and neural similarities? In the data presented in Figure S4, there are greater similarities in the population patterns within the same rewarding condition than within chemical groups. A comparison of the reward vs. chemical group (a simpler version of Fig. 5B) may be beneficial and take full advantage of the experimental design.

      This comparison already exists in Fig. 5B and in lines 285-289 of the Results. In VP populations, the distribution was structured such that intervalence pairwise comparisons between sucrose-paired and not sucrose-paired odors (e.g. ||SK-PK|| and ||SK-XK||) were larger than intravalence pairwise comparisons (e.g. ||SK-ST||, or ||XK-XT||). OTD1 populations showed an intermediate trend, where most intravalence pairwise distances were smaller than intervalence pairwise distances, with the exception of ||SK-ST||.

      Related to the point about chemical similarities - is the smaller effect size (amount of modulation associated with reward contingency) in this study, compared to the study by Martiros et al, explained by the similarities of odorants used?

      This is an interesting point. Although the odorants we use are different from those in Martiros et al, we think this is unlikely to be the basis of the smaller effect size of reward modulation. If OT represents odor in a population code, whereby identity is encoded in unique ensembles of activity, then variation in the expression of D1R between OT neurons could account for different effects in different ensembles. However, there is no evidence for such varied expression, and it does not seem like an ideal mechanism for the OT to broadly associate odor with reward. Moreover, we do not observe any differences in the effect size of reward association between the different odorants used in our study. Rather, we think the difference between our findings more likely results from recording in different populations of neurons, which is addressed in lines 522-535.

      Regarding the data presented in Fig. 3I - the rewarded odor responses (Sk) are compared against neutral ones (Xk responses), but an S vs. P comparison may be informative, too. Even though the authors mention that the effect of air puff is subtle, the behavioral data presented in Fig. 2F and G suggest that these serve as aversive stimuli. For example, on day 4, the first day after the reward contingency switch, the licking levels seem the lowest for the P odors.

      We have added the S vs P comparison. We had originally omitted it because the neural and behavioral responses to puff cues were not robust. This is addressed in the Discussion (lines 563-579), and our conclusions about aversive conditioning are cautious.

      Regarding the data presented in Fig. 4G: it is difficult to interpret the data when the data for day 1 reward period and day 3 reward cue period are combined. Or do the authors mean day 1 S cue and day 3 S cue?

      These data were based on an observation that some neurons in the VP only responded to sucrose (not odor) on day 1, but later became responsive to the associated odor on day 4. To quantify this, Fig. 4G shows the percentage of these neurons by reporting the percentage that were both responsive to sucrose (not odor) on day 1 and also rewarded odor on day 3. This is described in lines 260-274.

      Figure 6 presentation would benefit from a revision. For example, it is unclear if the water port becomes available for the "N" odors with 100% or 50% chance of reward delivery, and if so, how that happens. There are some errors e.g., colormap used for panel G; odors listed may be wrong in line 752 etc. It was unfortunately not possible to understand what was presented.

      We have added a schematic (Fig 6B) to better describe the movement of the port and details to the methods. The color scale was indeed inverted in panel G (now H), and it has been corrected. We have verified that the odors listed in the methods are correct. Although not included in the revision, in the version of record we will also add corresponding descriptors (e.g., LHi & Lx) to the odors in the methods for easier comparison.

      Minor comments

      For Figure 2H, an alternative description in the legend may be beneficial, as the phrasing is not intuitive. A suggested alternative is "licks in response to sugar-associated odors expressed as fraction of all odors".

      We appreciate the suggestion and have changed this to “licks during either sucrose cue expressed as a fraction of all licks during any odor.”

      Figure 2H: please explain the color code for crosses in the legend and the statistical comparison shown in the figure.

      We have added a legend to explain the color code and included a statement about the statistics in the legend with a link to a supplemental table for statistical parameters.

      Figure 3D: may contain mislabeling in the legend - the legend for 3D does not match the plot (legend refers to bar graph while plot shows line graphs)

      Unclear what is meant. The 3D legend says: “Percentage of total neurons that were significantly excited or inhibited by each odor (Bonferroni-adjusted FDR < 0.05) as a function of time relative to odor. Lines represent the mean across biological replicates and the shaded area reflects the mean ± SEM.” This is not a bar plot and is not referred to as one. 3E does show bar plots and is correctly described in the legend.

      Figure 3M: uses letters to refer to cell populations that are identical to the roman numerals used in Fig 3 A-C as well as colours similar to the ones in Fig 3C. However, the cell groups are unrelated; splitting the figures or using a different nomenclature might help

      We have adopted a different color code that we think makes this more distinct.

      Figure 4I: statistical comparison shown in figure not explained (neither in main text nor legend)

      We have added a statement about the statistical comparison and referenced a supplementary table.

      Figure 5 D: color code appears to have a different range than the values shown (i.e. lower limit is 0.7 while the plot shows values below 0.7)

      We confirm that this is not a mistake but a stylistic choice. The displayed color scale shows values only down to 0.7, whereas the lowest plotted value is 0.67. Although the color for 0.67 is not shown in the scale, it is approximately the same as the color at the lower limit. The values are reported for full transparency and accuracy.

      Figure 5 G, I, & L: statistical comparison shown in figure not explained

      The comparisons have been explained in supplemental tables (S22-29) and referenced in the legend.

      Figure 5 I: meaning of symbols overlayed over bars not explained

      “Markers represent the mean across biological replicates” has been added.

      Figure 5 J&K: please state if error bars show SEM or SD; also please describe individual thinner lines in the legend

      This has been added to describe 5I. The same format applies to J&K.

      Figure 5L: please describe the individual crosses overlayed over bars in the legend

      Described in 5I.

      Figure S6A-C: please mention the odors used.

      S6A-C shows kinetics for the odor a-terpinene, which is now indicated in the legend.

      Line 129: mentions a 70 psi airpuff but methods say 75 psi - please clarify

      This has been corrected. 70 psi is the correct value.

      Line 134 typo: SP should be PK

      This has been corrected.

      Line 428: typo; should be cluster 3, not 2

      This has been corrected.

      Line 474 (and figure 6O): please explain what "P" is

      “P” is probability, used as P(S), as in probability of sucrose. This is defined in line 466.

      Line 692: please describe the staining protocol in the methods (rather than just listing the antibodies and concentrations)

      We have added more details (lines 692-699).

      Line 707-712: duplicate text (identical to Line 700-705)

      This has been deleted.

    1. Author Response

      Comments on eLife Reviews

      We thank the reviewers for their positive comments and constructive feedback following their thorough reading of the manuscript. In this provisional reply we briefly address the reviewers’ comments and suggestions point by point. In the forthcoming revised manuscript, we will address the reviewers’ comments more thoroughly and provide additional supporting data.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail, given that in its current form it risks implying that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only later be mapped using those clusters. While this organization might indeed appear randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would otherwise appear 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we will revise the text to clarify that the random clustering is random with respect to any future, novel environment. The cause of clustering could be prior experiences (e.g. Bourjaily M & Miller P, Front. Comput. Neurosci. 5:37, 2011) or developmental programming (e.g. Perin R, Berger TK, & Markram H, Proc. Natl. Acad. Sci. USA 108:5419, 2011).

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (2.1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2.2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Based on our finding that preplay occurs only in networks that sustain cluster activity over multiple decoding time bins (Figure 5d-e), our understanding of the model’s function is consistent with the reviewer’s first explanation. In the forthcoming revised manuscript, we will provide additional analysis to directly test the first explanation, and we will also test the intriguing possibility that the reviewer’s second suggestion contributes to above-chance preplay.
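      Explanation (2.1) can be illustrated with a toy calculation. The sketch below is not the manuscript's preplay analysis. It assumes that each time bin of a PBE has already been decoded to a single position. It then scores sequentiality as the absolute Pearson correlation between bin index and decoded position, a simplification of the weighted-correlation scores commonly used for replay, and compares that score against time-bin shuffles. The two-cluster event and its decoded positions (0.2 and 0.7) are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences (ys must not be constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def sequence_score(decoded):
    """Toy sequentiality score: |r| between time-bin index and decoded position."""
    return abs(pearson(list(range(len(decoded))), decoded))


def time_bin_shuffle_p(decoded, n_shuffles=1000, seed=0):
    """Return the observed score and the fraction of time-bin shuffles scoring at least as high."""
    rng = random.Random(seed)
    observed = sequence_score(decoded)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = decoded[:]  # permute bin order, keeping the same set of decoded positions
        rng.shuffle(shuffled)
        if sequence_score(shuffled) >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return observed, hits / n_shuffles


# Hypothetical PBE: cluster A (decoded near 0.2) stays active for 4 bins,
# then cluster B (decoded near 0.7) for 4 bins.
event = [0.2] * 4 + [0.7] * 4
score, p = time_bin_shuffle_p(event)
print(f"score={score:.3f}, shuffle p={p:.3f}")
```

Only 2·4!·4!/8! ≈ 2.9% of permutations restore the A-then-B (or B-then-A) block order, so the persistent event beats roughly 97% of shuffles. If each cluster occupied only a single bin, every shuffle would also score |r| = 1, and such an event could never beat the shuffle.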

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed that preplay is correlated with SWI, but the correlation may arise because both are correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe this statement is not entirely appropriate: a network similar to a ring lattice can still balance cluster isolation and cluster overlap, yet it will have a small SWI because of long paths between some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We will revise the text and add illustrative figures.

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We will test another type of small-world network if time permits.

      Our discussion of “cluster overlap” is specific to our type of small-world network, in which there is no pre-determined spatial dimension (unlike the ring network of Watts and Strogatz). Because clusters map randomly to locations once a particular spatial context is imposed, the random overlap between clusters produces long-range connections in that context (and in any other context). One can therefore think of the amount of overlap between clusters as the counterpart of the number of long-range connections in a Watts-Strogatz model. We wish to reiterate, however, that such models involve a spatial topology within the network, which ours does not include.
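      The reviewer's suggested illustration (ring lattice vs. random vs. shortcut graphs) can also be sketched numerically. The stdlib-only Python below is a toy example, not the manuscript's network or its exact SWI definition. It assumes the widely used small-world index σ = (C/C_rand)/(L/L_rand), where C is the mean clustering coefficient and L is the mean shortest-path length. It approximates the random reference by fully rewiring the graph (p = 1), which preserves the edge count but not the exact degree sequence. Graph size, degree, rewiring probability, and seed are arbitrary choices.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring lattice: each node is linked to its k // 2 nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """Watts-Strogatz-style rewiring: with probability p, move the far end of each
    original edge (i, j), i < j, to a random node m not already linked to i."""
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    for i in range(n):
        for j in sorted(adj[i]):
            if j > i and rng.random() < p:
                candidates = [m for m in range(n) if m != i and m not in new[i]]
                if candidates:
                    m = rng.choice(candidates)
                    new[i].discard(j)
                    new[j].discard(i)
                    new[i].add(m)
                    new[m].add(i)
    return new


def avg_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_index(adj, rng, n_ref=3):
    """sigma = (C / C_rand) / (L / L_rand), with fully rewired graphs as the reference."""
    refs = [rewire(adj, 1.0, rng) for _ in range(n_ref)]
    c_r = sum(avg_clustering(r) for r in refs) / n_ref
    l_r = sum(avg_path_length(r) for r in refs) / n_ref
    return (avg_clustering(adj) / c_r) / (avg_path_length(adj) / l_r)


rng = random.Random(1)
lattice = ring_lattice(200, 10)
graphs = {
    "ring lattice": lattice,
    "rewired ring (p = 0.1)": rewire(lattice, 0.1, rng),
    "random (p = 1)": rewire(lattice, 1.0, rng),
}
results = {}
for name, g in graphs.items():
    results[name] = (avg_clustering(g), avg_path_length(g), small_world_index(g, rng))
    c, l, s = results[name]
    print(f"{name:24s} C={c:.3f}  L={l:.2f}  SWI={s:.2f}")
```

In this setup the ring lattice is expected to have high C but long paths, and the fully rewired graph short paths but low C. A modest number of shortcuts (p = 0.1) keeps most of the clustering while shortening paths, which is what raises σ.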

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Yes, we are highly confident that the clusters in our network would correspond to the functional assemblies that have been studied through assembly analysis and will present the relevant data in a revision.

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally and will test this if time permits.

      Reviewer # 2

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We agree that this is an important question, and we plan to run further simulations where we test the effects of varying the simulated speed. We will present results in the resubmission.

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to different models of feedforward input is important and we plan to do this in our revised manuscript for the linear track and W-track.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 7b the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson RE et al, Hippocampus 6:281, 1996; Pavlides C, et al, Neurobiol Learn Mem 161:122, 2019). Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for a low level of correlation is because cluster membership is combinatorial, whereby cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated. In our revised manuscript we will address this point more carefully and cite the relevant experimental support.

      Reviewer # 3

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input so will present such additional results in our revised version. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      First, we apologize for a lack of clarity if we have caused confusion about the type of inputs (linear and cluster-dependent as we had attempted to portray prominently in Figure 1, where it is described in the caption, l. 156-157, and Results, l. 189-190 & l. 497-499, as well as in the Methods, l. 671-683) and if we implied an absence of spatially-tuned information in the network. In the revision we will clarify that for reliable place fields to appear, the network must receive spatial information and that one point of our paper is that the information need not arrive as peaks of external input already resembling place cells or grid cells. We chose linearly ramping boundary inputs as the minimally place-field like stimulus (that still contains spatial information) but in our revision we will include alternatives. We should note that during sleep, when “preplay” occurs, there is no such spatial bias (which is why preplay can equally correlate with place field sequences in any context). In the revision, we will update Figure 1 to show more clearly the cluster-dependent linearly ramping input received by some specific cells with both similar and different place fields.

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells.

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we will revise the text to specifically address CA3 connectivity (Guzman et al., Science 353 (6304), 1117-1123 2016) and the small-world structure therein due to the presence of “assemblies”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study investigated transcriptional profiles of midbrain dopamine neurons using single nucleus RNA (snRNA) sequencing. The authors found more nuanced subgroups of dopamine neurons than previous studies, and identified some genes that are preferentially expressed in subpopulations that are more vulnerable to neurochemical lesions using 6-hydroxydopamine (6OHDA). The reviewers found the results are solid, and the study is overall valuable, providing critical information on the heterogeneity and vulnerability of dopamine neurons, although the scope is somewhat limited because the result with snRNA is similar to previous results and cell deaths were induced by 6OHDA injections.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study by Yaghmaeian Salmani et al., the authors performed single-nuclei RNA sequencing of a large number of cells (>70,000) in the ventral midbrain. The authors focused on cells in the ventral tegmental area (VTA) and substantia nigra (SN), which contain heterogeneous cell populations comprising dopaminergic, GABAergic, and glutamatergic neurons. Dopamine neurons are known to consist of heterogeneous subtypes, and these cells have been implicated in various neuropsychiatric diseases. Thus, identifying specific marker genes across different dopamine subpopulations may allow researchers in future studies to develop dopamine subtype-specific targeting strategies that could have substantial translational implications for developing more specific therapies for neuropsychiatric diseases.

      A strength of the authors' approach compared to previous work is that a large number of cells were sequenced, which was achieved using snRNA-seq, which the authors found to be superior compared to scRNA-seq for reducing sampling bias. A weakness of the study is that relatively little new information is provided as the results are largely consistent with previous studies (e.g., Poulin et al., 2014). Nevertheless, it should be noted that the authors found some more nuanced subdivisions in several genetically identified DA subtypes.

      On this point we respectfully disagree with the reviewer. In this study, over 30,000 mDA neurons have been analyzed at the genome-wide gene expression level, identifying mDA territories and neighborhoods (that some may call “subtypes”), a description of the mDA neuron diversity that goes far beyond what has been published previously.

      Although several single-cell RNA sequencing studies of mDA neurons have added to our understanding of mDA diversity, they have been limited by the low numbers of sequenced mDA neurons. As the reviewer specifically referred to the study by Poulin et al., 2014, it should be noted that in this report, 159 mDA neurons were analyzed by qPCR – not by RNAseq – of 96 previously identified marker genes. Despite those limitations, this was indeed a highly impressive study, suggesting five different mDA neuron subtypes (as compared to the 16 neighborhoods described here), published before single-cell genome-wide gene expression methods and advanced bioinformatic tools were available. The scRNAseq studies that followed typically captured a few hundred mDA neurons, compared to over 30,000 in this study. None of the studies mentioned in our manuscript came close to capturing the full diversity, and the information on mDA neuron diversity is, for this reason, somewhat fragmented in the scientific literature. Indeed, the seven mDA “subtypes” described in the excellent reviews by Poulin et al., 2020 in Trends in Neurosciences and Garritsen et al., 2023 in Nature Neuroscience are integrated interpretations of the results from numerous independent studies, each methodologically unique. Several previously identified groups, especially Vglut2+ populations in VTA and SNpc, have been considered poorly defined. As mentioned above, our findings in this study could reliably identify, by computational analyses and combinatorial marker expression in situ, 16 different neighborhoods within the mDA population and localize them in the tissue (Figure 4, Supplementary figures 4-1 to 4-3, described further in Supplementary Results). To mention three examples: within Sox6+ SNpc, we identified four different variants (neighborhoods) with partly unique anatomical localization. In addition, the large group of mDA neurons referred to as the Pcsk6 territory has not been clearly defined in earlier studies. We also identified a novel mDA neuron group that is related to the previously well described Vip-expressing mDA neurons. These and other novel features are mentioned in the manuscript and in Supplementary Figure 4-1 to 4-3.

      Although, for reasons of space and readability, we have characterized the 16 neighborhoods with only a few selected key marker genes, we have identified numerous additional novel markers, some of which are shown in dot plots in Figure 3 and Supplementary Figure 3, which can be used to characterize these groups further. We also provide all our sequencing data and our Padlock probe ISS data for anyone to download and analyze further, and we have made a web-based tool, CELLxGENE, available on our group’s website to facilitate exploration of the different aspects of our dataset.

      Lastly, the authors performed molecular analysis of ventral midbrain cells in response to 6-OHDA exposure, which leads to the degeneration of SN dopamine neurons, whereas VTA dopamine neurons are mainly unaffected. Based on this analysis, the authors identified several candidate genes that may be linked to neuronal vulnerability or resilience.

      Overall, the authors present a comprehensive mouse brain atlas detailing gene expression profiles of ventral midbrain cell populations, which will be important to guide future studies that focus on understanding dopamine heterogeneity in health and disease.

      We thank the reviewer for pointing this out.

      Reviewer #2 (Public Review):

      In the manuscript by Salmani et al., the authors explore the transcriptomic characterization of dopamine neurons in order to explore which neurons are particularly vulnerable to 6-OHDA-induced toxicity. To do this they perform single nucleus RNA sequencing of a large number of cells in the mouse midbrain in control animals and those exposed to 6-OHDA. This manuscript provides a detailed atlas of the transcriptome of various types of ventral midbrain cells - though the focus here is on dopaminergic cells, the data can be mined by other groups interested in other cell types as well.

      The results in terms of cell type classification are largely consistent with previous studies, though a more nuanced picture of cellular subtypes is portrayed here, a unique advantage of the large dataset obtained. The major advance here is exploring the transcriptional profile in the ventral midbrain of animals treated with 6-OHDA, highlighting potential candidate genes that may influence vulnerability. This approach could be generalizable to investigate how various experiences and insults alter unique cell subtypes in the midbrain, providing valuable information about how these stimuli impact DA cell biology and which cells may be the most strongly affected.

      We appreciate these comments. We want to state that the study not only gives a more nuanced picture but goes far beyond previously published studies and provides a highly resolved and detailed atlas of mDA neurons. Thus, it clarifies poorly described diversity and identifies entirely novel groups of diverse mDA neurons at the genome-wide gene expression level.

      Overall, the manuscript is relatively heavy on characterization and comparatively light on functional interpretation of findings. This limits the impact of the proposed work. It also isn't clear what the vulnerability factors may be in the neurons that die. Beyond the characterization of which neurons die, what is the reason that these neurons are susceptible to lesion? Also, the interpretation of these findings is going to be limited by the fact that 6-OHDA is an injectable, and the effects depend on the accuracy of injection targeting and the equal access of the toxin to access all cell populations. Though the site of injection (MFB) should hit most/all of the forebrain-projecting DA cells, the injection sites for each animal were not characterized (and since the cells from animals were pooled, the effects of injection targeting on the group data would be hard to determine in any case).

      We agree that the results are presented to provide a comprehensive and valuable resource rather than explaining molecular mechanisms. The reviewer points out that “what the vulnerability factors may be in the neurons that die” is unclear. However, our study was designed to answer the question: What genes are enriched in clusters of mDA neurons that are particularly likely to die after toxic stress? Using single-cell analysis, we believe this question had higher priority than attempting to identify gene expression changes occurring during the cell death process. We agree that we cannot answer why neurons are susceptible to lesions, only identify genes that correlate with either high or low sensitivity. Thus, the genes we refer to as “vulnerability genes” and “resilience genes” are candidates for influencing differential vulnerability. Hard evidence for such influence will require additional and extensive functional analysis. As for the variability of injection and the characterization of individual animals, we wish to mention the online interactive explorer available at https://perlmannlab.org/resources/. It allows visualization of nuclei distribution per territory and neighborhood for each mouse, making it easy to determine the cell loss ratio and cell distribution per animal. There is indeed variance in the proportions of intact/lesioned total nuclei per animal. This is also evident from the DAT autoradiographs shown for each lesioned animal and presented in Figure Supplement 5-1 A. Importantly, the relative UMAP distribution of nuclei is quite similar between individual animals. To further investigate this, we used Pearson’s Chi square test of independence with a contingency table for animals, each with two categorical variables as the proportion of nuclei from intact vs lesioned parts of the vMB (see added Supplementary figure 5-1 C). This shows that, while there is a difference in the number of nuclei remaining after lesioning, the relative distribution among clusters and neighborhoods is similar between animals. We have clarified this point in the manuscript (see page 12).
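      A minimal Python sketch can make the logic of such a test concrete. It computes a Pearson chi-square test of independence on a table of nucleus counts, with one row per animal and one column per category (for example, a cluster, a neighborhood, or intact vs lesioned side). The table layout, the function names and the toy counts are illustrative assumptions. This is not the authors' analysis code or data. In practice, a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used.

```python
import math
from typing import List, Tuple


def regularized_lower_gamma(s: float, x: float, tol: float = 1e-12) -> float:
    """Regularized lower incomplete gamma P(s, x), computed from its power series.

    The series converges for all x > 0; it is adequate for the moderate
    statistics in this sketch and is not tuned for extreme values.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / s  # n = 0 term of sum x^n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= x / (s + n)
        total += term
    return math.exp(s * math.log(x) - x - math.lgamma(s)) * total


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Hypothetical layout: rows are animals, columns are nucleus categories.
    Null hypothesis: the column proportions are the same in every row.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    if any(t == 0 for t in row_totals + col_totals):
        raise ValueError("every row and column needs a nonzero total")
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected

    dof = (n_rows - 1) * (n_cols - 1)
    # Upper tail of the chi-square distribution with `dof` degrees of freedom.
    p_value = max(0.0, 1.0 - regularized_lower_gamma(dof / 2.0, statistic / 2.0))
    return statistic, dof, p_value


# Toy counts (not real data): three animals with different totals but the
# same 1:2:3 split across three categories.
stat, dof, p = chi_square_independence([[10, 20, 30], [20, 40, 60], [5, 10, 15]])
print(stat, dof, p)  # → 0.0 4 1.0
```

      Expected counts are scaled by each row's total. If every animal has the same relative distribution, the statistic is therefore zero no matter how many nuclei each animal contributed. Only differences in relative proportions between animals increase it.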

      I am also not clear why the authors don't explore more about what the genes/pathways are that differentiate these conditions and why some cells are particularly vulnerable or resilient. For example, one could run GO analyses, weighted gene co-expression network analysis, or any one of a number of analysis packages to highlight which genes/pathways may give rise to vulnerability or resilience. Since the manuscript is focused on identifying cells and gene expression profiles that define vulnerability and resilience, there is much more that could have been done with this based on the data that the authors collected.

      We performed GO analysis for the genes upregulated and downregulated in the ML clusters (specific to the lesion condition) in the original manuscript (please see figure supplement 7-1 C-E, and the newly added Supplementary file 10), but we agree with the reviewer that we could also have analyzed functional categories of genes correlating with differential vulnerability. Thus, we have used tools recently developed by Morabito et al., Cell Reports Methods (2023), and their hdWGCNA package to address this question. This method is particularly suitable for analyzing high-dimensional transcriptomics data such as single-cell RNA-seq or spatial transcriptomics. We calculated the coexpression network based on the lesioned nuclei of the mDA territories. Of the 9 co-expression modules calculated, one has the highest expression in Sox6 territory and has genes in common with the vulnerability module. Another co-expression module has genes in common with the resilience module and is most highly expressed in Otx2 and Ebf1 territories. We also did GO analysis for these co-expression modules and added additional GO analysis of the ML-enriched genes (see Supplementary Figure 7-1 D,E, the newly added Supplementary Figure 6-3, and the newly added Supplementary file 9). These additional analyses are described on pages 15 and 17.

      In addition, we wish to emphasize our identification of the genes we refer to as vulnerability and resilience modules in the previous version of the manuscript. Several of the genes were discussed in the previous version of the manuscript, but we have now included more information on these genes, based on previously published studies, and discuss their potential functional roles (see pages 22 & 23 in the Discussion).

      Another limitation of this study as presented is the missed opportunity to integrate it with the rich literature on midbrain dopamine (and non-dopamine) neuron subtypes. Many subtypes have been explored, with divergent functions, and can usually be distinguished by either their projection site, neurotransmitter identity, or both. Unfortunately, the projection site does not seem to track particularly well with transcriptomic identities, aside from a few genes such as DAT or the DRD2 receptor. However, this could have been more thoroughly explored in this manuscript, either by introducing AAVretro barcodes through injection into downstream brain sites, or through existing evidence within their sequencing dataset. There are likely clear interpretations from some of that literature, some of which may be more exciting than others. For example, the authors note that vGluT2-expressing cells were part of the resilient territory. This might be because this is expressed in medially-located DA cells and not laterally-located ones, which tends to track which cells die and which don't.

      The manuscript consists of a comprehensive description of transcriptional diversity. Although of clear value, we believe that additional, comprehensive analysis that combines snRNAseq with, e.g., AAVretro barcodes must be done in a separate study. It should also be noted that we describe each territory and neighborhood in further detail in the Supplementary Results, which contains references to the relevant literature. In line with the comments, this section has now been expanded with further references to relevant studies (see Supplementary Results related to Figure 4-figure supplements 1-3).

      It is not immediately clear why the authors used a relaxed gate for mCherry fluorescence in Figure 1. This makes it difficult to definitively isolate dopaminergic neurons - or at least, neurons with a DATCre expression history. While the expression of TH/DAT should be able to give a fairly reliable identification of these cells, the reason for this decision is not made clear in the text.

      We used relaxed gating to ensure that we could capture nuclei expressing low levels of RFP, which we believe could be especially relevant for the lesioned dataset (see page 5). We did not find that it would be advantageous to use more stringent gating that would risk losing all cells expressing no (or very low levels of) RFP. Identifying mDA neurons based on their typical markers is straightforward, as their transcriptional relationship is evident from the expression profile of several markers, including transcription factors such as Nr4a2, Pitx3, and En1. In addition, as pointed out in response to Reviewer #1, point 5, atypical DA neurons expressing Th and other mDA markers with no or low levels of Slc6a3 (DAT) were isolated. We believe the study is more complete by the inclusion of these cells. Moreover, we included a sufficiently large number of cells, which ensured a comprehensive analysis of mDA neurons in relation to other cell types dissected from the ventral midbrain.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors state that a major advantage of their approach is that it prevents biased datasets when compared to methods that rely on capturing certain cell types. I was wondering if the authors could follow up on this topic with a more detailed description of their methodological advantages regarding potential sampling bias. This is somewhat unclear to me, given that the results of the present study are largely consistent with previous work on this topic.

      As expanded on above (see response to the initial comment in the public review), we strongly disagree that there is little novelty in our study. None of the previous studies come close to describing the mDA neuron population with a similar resolution, which is unsurprising given the differences in the number of analyzed mDA neurons in this versus previous reports. We agree with the reviewer that our data is consistent with previous studies, when they are all combined. Thus, we identified mDA neuron groups that correspond (or roughly correspond) to major DA neuron groups identified in previous studies (see pages 8-14 in the Supplementary Results). However, the atlas presented here goes well beyond anything published in scope and resolution. The diversity we define is comparable to findings that, with careful cross-paper analyses, can be stitched together from previous single-cell studies. However, even such a combined analysis does not reach the resolution and diverse categorization demonstrated herein (16 neighborhoods in midbrain dopaminergic territories). Considering the well-established problems of dissociating and isolating whole neurons from adult brain tissue, this is likely due to sampling bias, resulting in an almost complete exclusion of some sub-populations of neurons. We have added text on page 20 to clarify this point.

      (2) In the abstract, the authors state that their "results showed that differences between mDA neuron group could best be understood as a continuum without sharp differences between subtypes". However, I am not sure whether this is the most appropriate description of the authors' results, particularly when looking at the schematic overview shown in Fig. 4F. To me, it seems more likely that genetically-defined DA subtypes overlap with discrete ventral midbrain subnuclei - particularly in the case of Sox6-expressing cells, which are almost exclusively located in the SNc. In the case of genes that are specific for the VTA, there also seems to be a strong bias toward certain VTA subnuclei, although I agree that arguments can be made that there is some topographic organization along a dorso-ventral and medio-lateral gradient, which seems to be largely consistent with the anatomical location of projection-defined dopamine neurons as described previously by Poulin et al., 2018 (Nature Neuroscience).

      What was meant by continuum must be interpreted in the context of the transcriptional landscape of mDA neurons and not their anatomical localization. As stated in the paper, the dendrogram depiction of mDA neurons’ transcriptome can be misinterpreted as an indication of sharp boundaries and discrete groups in transcriptional profiles. In contrast, we assert that differences between developmentally related mDA neurons are better described as a continuum with areas in the gene expression landscape defined by the expression of shared genes but without sharp borders between them. We decided to name different areas within this continuum as “territories” at the higher hierarchical level and “neighborhoods” at the more highly resolved level. Hypothetically, such categorization can be even more fine-grained, but we find it unlikely that a resolution beyond the neighborhood level is biologically relevant. As pointed out, the Sox6 territory is the territory that best qualifies as a distinctive subtype, while mDA neurons in, e.g., the VTA show much greater and more nuanced diversity. Importantly, all mDA neurons are much more related to each other than cell types lacking a common developmental origin, including hypothalamic DA neurons. Thus, our effort to define differences in such a gene expression continuum is, in our opinion, more accurate than conveying the message that the diversity consists of subtypes comparable in difference to other cell types that lack a close developmental relationship with the mDA neuron population. Such distinct neuron types, despite using the same neurotransmitter as hypothalamic DA neurons, appear as distinct islands in the UMAP snRNA-seq landscape and typically harbor hundreds of differentially expressed genes. As pointed out in the Discussion, several other studies have noted similar difficulties in defining different subtypes among related neurons in e.g. the cortex, striatum, and hippocampus (Kozareva et al., 2021; Saunders et al., 2018; Tasic et al., 2018; Yao et al., 2021). For example, Yao et al., 2021, used a similar hierarchical definition to avoid the implication that different groups (“neighborhoods” in this study) should be defined as distinct subtypes of neurons with obvious distinctive functions.

      (3) I recommend that the authors revise the introduction to include more current literature on this topic. The review by Bjoerklund and Dunnet, 2006, is very informative and important, but there is more current literature available that discusses anatomical, molecular, and functional heterogeneity in the ventral midbrain. For example, it would be nice to incorporate recent work from the Awatramani lab on the mapping of the projection of molecularly defined dopamine neurons (Poulin et al., 2018; Nature Neuroscience).

      We deliberately avoided including primary references to previously described diversity in the Introduction since numerous papers are relevant to cite. Instead, we refer to three essential reviews, including the recent articles from Awatramani and Pasterkamp. In the Supplementary Results related to Figure 4 (pages 8-14 in the Supplementary Results), we include many references and the Poulin 2018 paper. We believe that this is the appropriate place for a comprehensive discussion on anatomical, molecular, and functional heterogeneity. In the revised manuscript's main body, we now emphasize that previous literature is discussed in the Supplementary Results (see page 11).

      (4) In Fig. 1C, the authors show a sample image demonstrating overlap between TH and mCherry, but this has not been quantified. Similarly, there seem to be no sample images and quantification for the contralateral side that was exposed to 6-OHDA.

      The mouse lines used here (Dat-Cre and Rpl10a-mCherry) have been characterized before (Toskas et al., Science Advances 2022). The labelling colocalizes nearly fully with TH, with some exceptions (see response below to point #5). We have now complemented this with additional data showing an IHC image of the midbrain of a unilaterally lesioned mouse in Figure Supplement 5-1E.

      (5) The authors state that they focused their analysis on 33,052 nuclei expressing above-threshold levels of either Th OR Slc6a3. However, there seem to be cell populations in the ventral midbrain of mice that express TH mRNA but not TH protein, and these cells do not seem to be bona fide dopamine neurons (see work from the Morales lab). Similarly, not all dopamine neurons may express DAT mRNA. I was wondering how these discrepancies may influence the authors' analysis and interpretation.

      Indeed, the presence of cells lacking TH protein despite Th mRNA being expressed has been previously described. We also detected these cells across SNpc and VTA and now show these data as a newly added supplementary figure 2-1. In our dataset, the Gad2 territory, located in the ventromedial VTA, contains cells that express many typical mDA markers, such as Pitx3, but very low levels of TH protein. We have identified these based on Pitx3-EGFP and Gad2 mRNA co-expression (figure supplement 4-3). In other parts of VTA and SNpc, most cells seem to co-express Th mRNA and protein and are labeled with Dat-Cre. Also scattered in these areas, we could detect some rare mDA cells that lack TH protein. It should be noted that in our mDA territories other typical mDA neuron genes were expressed, such as Slc18a2, Ddc, Nr4a2 and Pitx3, and thus, they were not solely defined by the presence of Th and/or Slc6a3. Cells that do not have a history of DAT-expression, and therefore were not mCherry labelled, were also included in the analysis due to the relaxed gating used during FANS isolation.

      (6) The sex and age of the mice that are used for the experiments are not stated in the Materials and Methods section under "Mouse lines and genotyping".

      Thank you for pointing this out. This information has been added to the methods section of the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript can be significantly improved just by providing deeper analyses of the existing data and linking them to the current state of the art in terms of defining midbrain dopamine neurons (e.g., by projection). The dataset is likely richer than was explored in the manuscript and more valuable insights could be gleaned with a deeper analysis.

      Please see our response to Reviewer #2 (Public Review), regarding WGCNA analysis, and the comments on ML-based GO analysis, as well as the comments on the added sections in the supplementary results file.

    1. Author Response

      eLife assessment

      This study, which seeks to identify factors from the glial niche that support and maintain neural stem cells, unveils a novel role for ferritin in this process. Furthermore, the work shows that defects in larval brain development resulting from ferritin knockdown can be attributed to impaired Fe-S cluster activity and ATP production. These findings will be valuable to both oncologists and neurobiologists, though the supporting evidence is currently incomplete.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      This study unveils a novel role for ferritin in Drosophila larval brain development. Furthermore, it pinpoints that the observed defects in larval brain development resulting from ferritin knockdown are attributed to impaired Fe-S cluster activity and ATP production. In addition, knocking down ferritin genes suppressed the formation of brain tumors induced by brat or numb RNAi in Drosophila larval brains. Similarly, iron deficiency suppressed glioma in a mouse model. Overall, this is a well-conducted and novel study.

      Strengths:

      Thorough analyses with the elucidation of molecular mechanisms.

      Weaknesses:

      Some of the conclusions are not well supported by the results presented.

      We really appreciate your review and positive feedback. As for the weaknesses, we will do our best to strengthen the related conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Zhixin and collaborators have investigated if the molecular pathways present in glia play a role in the proliferation, maintenance, and differentiation of Neural Stem Cells. In this case, Drosophila Neuroblasts are used as models. The authors find that neuronal iron metabolism modulated by glial ferritin is an essential element for Neuroblast proliferation and differentiation. They show that loss of glial ferritin is sufficient to impact on the number of neuroblasts. Remarkably, the authors have identified that ferritin produced in the glia is secreted to be used as an iron source by the neurons. Therefore iron defects in glia have serious consequences in neuroblasts and likely vice versa. Interestingly, preventing iron absorption in the intestine is sufficient to reduce NB number. Furthermore, they have identified Zip13 as another regulator of the process. The evidence presented strongly indicates that loss of neuroblasts is due to premature differentiation rather than cell death.

      Strengths:

      • Comprehensive analysis of the impact of glial iron metabolism on neuroblast behaviour by genetic and drug-based approaches as well as using a second model (mouse) for some validations.

      • Using cutting-edge methods such as RNAseq as well as very elegant and clean approaches such as RNAi-resistant lines or temperature-sensitive tools

      • Goes beyond the state of the art highlighting iron as a key element in neuroblast formation as well as as a target in tumor treatments.

      Weaknesses:

      Although the manuscript has clear strengths, there are also some strong weaknesses that need to be addressed.

      • Some literature is missing

      Thanks for your reminder; we will add the missing references.

      • In general, the authors succeeded, but in some cases the authors' claims are not fully supported by the evidence presented and additional experiments are critical to discriminate among different hypotheses.

      We are very grateful to the reviewer for recognizing our work, and we will support our conclusions with further evidence.

      • Moreover, some potential flaws might be present in the analysis of cell death and mitochondrial iron.

      We used Caspase-3 or TUNEL assays to detect apoptotic signals. Further, we overexpressed the anti-apoptotic gene p35 to inhibit apoptosis and found no rescue of neuroblast number. The results of these experiments are consistent.

      It is difficult to measure mitochondrial iron in neuroblasts, so we used indirect methods, such as TEM and iron (or iron chelator) supplementation, to test for ferroptosis. We will perform more experiments according to the recommendations to determine this.

      Reviewer #3 (Public Review):

      In this manuscript, Ma et al. seek to identify stem cell niche factors. They perform an RNAi screen in glial cells and screen for candidates that support and maintain neuroblasts (NBs) in the developing fly brain. Through this, they identify two subunits of ferritin, which is a conserved protein that can store iron in cells in a non-toxic form and release it in a controlled manner when and where required. They present data to support the conclusion that ferritin produced in glia is released and taken up by NBs where it is utilised by enzymes in the Krebs cycle as well as in the electron transport chain. In its absence from glia, NBs are unable to generate sufficient energy for division and therefore prematurely differentiate via nuclear prospero resulting in small brains. The work will be of interest to those interested in neural stem cells and their non-cell autonomous control by niches.

      The past decade has seen a growing appreciation of how glial cells support and maintain NBs during development.

      The authors' discovery of glial-derived ferritin providing essential iron atoms for energy production is interesting and important. They have employed a variety of genetic tools and assays to uncover how ferritin in glia might support NBs. This is particularly challenging because there are no direct ways of assaying for iron or energy consumption in a cell-specific manner.

      There are however instances where conclusions are drawn to support the story being developed without considering the equally plausible alternative explanations that should ideally be addressed.

      For example, the data supporting the transfer of ferritin from glia to NBs was weak given the misexpression system used; the Shi[ts] experiment was also not convincing (perhaps they have more representative images?).

      Thanks for your comment. We included a negative control, which excludes misexpression. For the Shi[ts] experiment, we will substitute more representative images.

      The iron manipulation experiments are in the whole animal and it is likely that this affects general feeding behaviour, which is known to affect NB exit from quiescence and proliferative capacity. The loss of ferritin in the gut and iron chelators enhancing the NB phenotype are used as evidence that glia provide iron to NB to support their number and proliferation. Since the loss of NB is a phenotype that could result from many possible underlying causes (including low nutrition), this specific conclusion is one of many possibilities.

      Iron chelator (or iron salt) feeding is a common method for investigating metal metabolism in Drosophila[1-3]. Moreover, other metal chelators (such as copper and zinc chelators) do not produce a similar phenotype (data not shown), which partially excludes this possibility. Further, blocking iron absorption by knocking down ferritin only in the iron cell region[1], a small part of the midgut, phenocopied iron chelator feeding, implying that iron deficiency is probably the main cause of the phenotype. More importantly, iron chelator enhances the NB phenotype only in the ferritin knockdown group, not the control group, suggesting that iron deficiency results in the phenotype, which rules out other possibilities.

      Similarly, knockdown of the FeS protein assembly components phenocopies glial ferritin knockdown. Since iron is so important for the TCA cycle and the ETC, this is not surprising, but the similarities between the two phenotypes seem insufficient to conclude that glial ferritin is causing the lack of iron in the NB and therefore the loss of NBs.

      It is hard to reach this conclusion from knockdown of FeS protein assembly components alone, so we used only “implied” to describe this result. However, we combined several results to address this issue, including iron chelator feeding, ferritin knockdown in the midgut, the enhancement of the phenotype by iron chelators, aconitase activity, GO enrichment, KEGG enrichment, and Zip13. Together, these results point to the interpretation that iron deficiency in NBs caused by glial ferritin defects leads to NB loss.

      Pros RNAi will certainly result in an increase in NB numbers because the loss of pros results in an inability of NB progeny to differentiate. This (despite the slight increase in nuclear pros) is not sufficient to infer that glial ferritin knockdown results in premature differentiation of NBs via nuclear pros.

      First, pros RNAi, brat RNAi, or numb RNAi can each result in an inability of NB progeny to differentiate[4-6]. If the rescue of NB number by pros RNAi relied mainly on the differentiation block of NB progeny, brat RNAi or numb RNAi would be expected to rescue NB number similarly. However, our results showed that only pros RNAi could rescue NB number, while brat RNAi and numb RNAi could not.

      Secondly, nuclear Pros represses genes required for self-renewal and is also required to activate genes for terminal differentiation[7]. Thus, in normal NBs Pros is kept in the cytoplasm and remains almost undetectable in the nucleus[8]. However, after glial ferritin knockdown we observed detectable Pros in the nuclei of some NBs, and the number of NBs with detectable nuclear Pros was significantly increased compared to control.

      Altogether, we conclude that NBs tend to undergo premature differentiation after glial ferritin knockdown.

      I recognise these are challenging to prove irrefutably, however, the frequency of such expansive interpretations of data is of concern.

      (1) Tang X, Zhou B. Ferritin is the key to dietary iron absorption and tissue iron detoxification in Drosophila melanogaster. FASEB J, 2013,27(1):288-98

      (2) Xiao G, Liu ZH, Zhao M, et al. Transferrin 1 Functions in Iron Trafficking and Genetically Interacts with Ferritin in Drosophila melanogaster. Cell Rep, 2019,26(3):748-58 e5

      (3) Mukherjee C, Kling T, Russo B, et al. Oligodendrocytes Provide Antioxidant Defense Function for Neurons by Secreting Ferritin Heavy Chain. Cell Metab, 2020,32(2):259-72 e10

      (4) Knoblich JA, Jan LY, Jan YN. Asymmetric Segregation of Numb and Prospero during Cell-Division. Nature, 1995,377(6550):624-7

      (5) Zacharioudaki E, Magadi SS, Delidakis C. bHLH-O proteins are crucial for neuroblast self-renewal and mediate Notch-induced overproliferation. Development, 2012,139(7):1258-69

      (6) Bello B, Reichert H, Hirth F. The brain tumor gene negatively regulates neural progenitor cell proliferation in the larval central brain of Drosophila. Development, 2006,133(14):2639-48

      (7) Choksi SP, Southall TD, Bossing T, et al. Prospero acts as a binary switch between self-renewal and differentiation in Drosophila neural stem cells. Dev Cell, 2006,11(6):775-89

      (8) Spana EP, Doe CQ. The Prospero Transcription Factor Is Asymmetrically Localized to the Cell Cortex during Neuroblast Mitosis in Drosophila. Development, 1995,121(10):3187-95

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for recognizing the importance of our work on transcription-independent early recovery of proteasome activity. We also thank them for their thoughtful criticisms and suggested improvements, which we addressed in the revised version as described below.

      The reviewers and editors asked for data to support the model that early recovery of proteasome activity is due to accelerated proteasome assembly. This model is backed by published data showing that proteasome assembly intermediates increase dramatically in cells treated with proteasome inhibitors (Fig. 6 in Ref. 46 of the revised manuscript). We expanded the discussion of this paper in a paragraph that describes our model. Another key experiment to confirm this model would be to determine what fraction of nascent polypeptides is degraded within minutes after synthesis; this is not trivial, and Ibtisam ran out of time to conduct these experiments because she had to graduate in spring before the expiration of her visa. This type of experiment usually uses metabolic labeling with a heavy or radioactive amino acid, which always requires prior depletion of the corresponding unlabeled amino acid. However, the fundamental flaw of this approach, which is not recognized by the scientific community, is that depletion of an amino acid stresses cells and reduces the rate of protein synthesis, especially if this amino acid is methionine. Thus, this model is not easy to test and should be considered speculative. We therefore moved the description of this model, together with Fig. 4, into a separate "Ideas and Speculations" section and removed this model's description from the abstract.

      Reviewer 1 raised the possibility that a background band detected on the western blot of DDI2 KO cells could be the highly homologous protease DDI1. This is highly unlikely because, according to Protein Atlas, DDI1 is selectively expressed in the testis and is not expressed in the cell lines we used. Reviewer 1 also suggested that we should base our conclusion on Nrf1 KD, which we de facto did because we confirmed that DDI2 KD blocks Nrf1 activation (Fig. 1d).

      In response to Reviewer 1's critiques regarding the presentation of proteasome subunit stability data in Fig. 4 (Ref. 45 of the revised manuscript), we removed PSMB8 and replaced chaperones with the subunits of the 26S base. We changed color palettes, symbols, and axis scales to improve clarity.

      We acknowledged in the discussion that our work did not exclude a role for DDI2 in proteasome recovery after repeated pulse treatments, as suggested by Reviewer 1.

      We agree with Reviewer 2 that using “proteasome levels” is inaccurate when describing our activity measurement data. However, in the manuscript, we use "levels" only when discussing data in the literature. We believe measuring activity and not the total levels is more important because not all proteasomes are active, e.g., latent 20S proteasome core particles.

      Reviewer 3 expressed concern that our conclusions were based on data in HAP1 cells, which are haploid and appear not to be very sensitive to proteasome inhibitors. This is why we used DDI2 KD in MDA-MB-231 and SUM149 cells, which are highly sensitive to proteasome inhibitors (Weyburne et al., Ref. 11). In our experience, the full extent of proteasome inhibitor cytotoxicity is not revealed until 48hr after treatment, and viability determined at 12hr and 24hr, as in Fig. 1c, should not be used to determine sensitivity (it was used for activity assay normalization). We added a new supplementary figure showing that HAP1 cells are as sensitive to proteasome inhibitors as MDA-MB-231 cells when cell viability is assayed 48hr after treatment (new Fig. S2). Another panel of this new figure demonstrates that baseline proteasome activity is very similar in HAP1, MDA-MB-231 and SUM149 cells. We also added data demonstrating that inactivation of DDI2 by mutation does not change the recovery of proteasome activity in HCT-116 cells (new Fig. 1g). Recovery in MDA-MB-231, SUM149, and HCT-116 cells was measured at 18hr, which is still within the 12–24hr window in which other investigators observed partially DDI2-dependent recovery.

      We have conducted an experiment in which we followed activity recovery for up to 72hr. We found that activity plateaued at 24hr and opted against repeating the experiment because there were no changes. We feel that the manuscript should not include data from a single biological replicate. The fact that the recovery is incomplete and that cells seem to survive with lower levels of proteasome activity is interesting; however, investigating the molecular basis for this phenomenon is beyond the scope of the current project.

      We were not disputing the conclusions of previous studies that DDI2/Nrf1 is responsible for enhanced expression of proteasomal mRNA in cells continuously treated with proteasome inhibitors. In fact, we confirmed that pulse treatment causes a similar increase (Fig. 2b). As for papers that measured activity recovery after pulse treatment, we objectively discuss our results in the context of these papers. In response to the reviewers' recommendations and minor points:

      • We reviewed the revised version carefully to eliminate spelling and grammatical errors and typos.

      • We no longer refer to DDI2 as a novel protease, as suggested by Reviewer 1.

      • We agree with Reviewer 2 that our CHX results do not necessarily mean that recovery involves translation of proteasomal mRNAs, and we now conclude that proteasome recovery requires protein synthesis.

      • We revised Fig. 1c, 3a and 4a to improve clarity.

      • We have stated in the caption that data in Fig. 4a comes from Table S4 in Ref. 45.

      • We accepted an excellent suggestion of Reviewer 3 to change "recovery" to "early recovery" in the title.

      • Regarding Reviewer 3's request to assay activity recovery at additional time points before 12h, this was done in the cycloheximide experiment in Fig. 3A.

      • Even if we assume that the differences in activity recovery observed in MDA-MB-231 cells (Fig. 1f) are statistically significant, which may implicate DDI2 in activity recovery, the percentage is still small, suggesting that most activity recovery is DDI2-independent.

      • We toned down the statement "the present findings suggest that DDI2 desensitizes cells to PI by a different mechanism," replacing "suggest" with "raise a possibility".

      • We indicated that only Bortezomib is approved for mantle cell lymphoma.

      • We changed the description of clinical dosing as suggested by Reviewer 3. We added a reference on PK of subcutaneous bortezomib (Ref. 9), even though the review we quoted (Ref. 7) discussed subcutaneous dosing.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of the points that were made. However, despite some things that may well be beyond the scope, I would like to insist on a few small points:

      Point 1: If the authors have conducted a gross analysis of cardiac morphology by histology already, they should include this data in the manuscript and comment with 1-2 sentences that "cardiac healing"..."is unlikely influenced by developmental defects".

      We agree with the reviewer that this analysis is important. Therefore, we are currently conducting an in-depth analysis of the cardiac phenotype of different mouse strains lacking distinct subpopulations of cardiac macrophages during development and under non-stimulated (baseline) conditions, including functional, metabolic and even electrophysiological aspects. These analyses will also include FIRE mice. While a gross analysis of this mouse strain did not show pathologic features, we prefer to complete the very detailed tissue characterization before publishing any data from a first basic analysis.

      Point 7: There is still no legend in Figure 6: what is red? What is blue?

      We added the respective legend in the figure.

      Point 8: Please add the information on the background of mice used for the different FIRE mice into the methods part of the paper

      We added the information in the Methods section (lines 344-347).

      Reviewer #2 (Recommendations For The Authors):

      The authors have responded to all questions. I have no further comments and congratulate the authors on their work.

      We thank the reviewer for their important questions and the constructive feedback.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study highlights a valuable finding that chromosomal instability can change immune responses, in particular macrophage behaviour. The convincing results show that CD47 targeting combined with anti-Tyrp1 IgG can overcome changes in the immune landscape of tumors and prolong survival of tumor-bearing mice. These findings reveal an exciting new dimension of how chromosomal instability can influence immune responses against tumors.

      We thank the Editors for their enthusiasm and appreciation for this work. We also want to highlight our thanks for their careful reading, support, and patience while handling this manuscript. While this work provides useful insight into potential therapeutic implications of chromosomal instability in the macrophage immunotherapy field, we also hope it elucidates some novel basic science to further explore how chromosomal instability has such interesting effects on the immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hayes et al. explored the potential of combining chromosomal instability with macrophage phagocytosis to enhance tumor clearance of B16-F10 melanoma. However, the manuscript suffers from substandard experimental design, some contradictory conclusions, and a lack of viable therapeutic effects.

      The authors suggest that early-stage chromosomal instability (CIN) is a vulnerability for tumorigenesis, CD47-SIRPa interactions prevent effective phagocytosis, and opsonization combined with inhibition of the CD47-SIRPa axis can amplify tumor clearance. While these interactions are important, the experimental methodology used to address them is lacking.

      Reviewer #1 (Recommendations For The Authors):

      First, early stages of the tumor are essentially being defined as before implantation. In all cases, the tumor cells were pre-treated with MPS1i or had a genetic knockout of CD47. This makes it difficult to see how this would translate clinically.

      We greatly appreciate the Reviewer’s interest in the topic and its potential, but our manuscript makes no claims of immediate clinical translation. Chromosomal instability (CIN) studies have to date not yet discovered or described whether and how CIN can affect macrophage function. To our knowledge, this is the first study to begin such characterizations with various MPS1i drugs to induce CIN. Many variations of the approach can be envisioned for future studies.

      Our Results include some key studies of cancer cells with wild-type levels of CD47, including in vivo tumor elimination (Fig.3E). Nonetheless, we do conduct some of our studies in a CD47 knockout context to remove this “brake” that generally impedes phagocytosis, with the goal of better understanding how CIN affects phagocytosis. As cited to some extent in our Introduction, there are many clinical-trial efforts to disrupt this macrophage checkpoint, as well as others focused on macrophage immunotherapy. Whether CIN can be induced by clinically translatable drugs and specifically in cancer cells is beyond the scope of our studies.

      I would like to see the amount of CIN that occurs in WT B16F10 over the course of tumorigenesis (ie longer than 5 days). This is because I would assume that CIN would eventually occur in the WT B16F10 regardless of whether MPS1i is being given. And if that's the case, then the initiation of CIN at day 10 after implantation (for example) would still be considered "early stage" CIN. If the therapy is then initiated at this point, does the effect remain? Or put differently, how would the authors propose to induce the appropriate level of CIN in an established tumor? Why is pretreatment necessary?

      Untreated B16F10 cells fail to produce micronuclei over 12 days, in contrast to MPS1i-treated cells, as shown in a newly added panel in Fig. S1:

      Author response image 1.

      This helps support our decision to pre-treat cells with MPS1i to stimulate genomic instability and is described in the first section of Results:

      “…we saw >10-fold increases of micronuclei over the cell line’s low basal level (~1% of cells), and two other MPS1 inhibitors, AZ3146 and BAY12-17389, confirm such effects (Fig. S1A). Micronuclei-positive cells can persist up to 12 days after treatment (Fig. S1B), while control cells maintain the low basal levels. The results suggest that pre-treatment with MPS1i can simulate CIN in an experimental context even for 1-2 weeks, which may not typically occur at the same frequency during early tumor growth.”

      It is known that PD-1 expression inhibits tumor-associated macrophage phagocytosis (Nature, 2017). Does MSP1i (sic) treatment affect the population of PD-1+ tumor macrophages in vivo?

      We thank the Reviewer for bringing up an interesting point.

      Using the same tumor RNA-seq data that was used for Fig.1E, a heatmap of expression of PD-1 (gene Pdcd1) shows no consistent trend with MPS1i:

      Author response image 2.

      We also examined whether the secretome from CIN-afflicted cancer cells affects PD-1 expression in cultured macrophages, but in our single-cell RNA-sequencing experiment we did not register any reads for Pdcd1 in any of the macrophage clusters from Fig. 1H.

      Author response image 3.

      The Discussion section now includes a statement on this topic:

      “…B16F10 tumors are poorly immunogenic, do not respond to either anti-CD47 or anti-PD-1/PDL1 monotherapies, and show modest and variable cure rates (~20-40%; Dooling et al., 2023; Hayes et al., 2023) even when macrophages have been made maximally phagocytic according to notions above. We should note here that our whole-tumor RNA-seq data (Fig.1E) shows expression of PD-1 (gene Pdcd1) follows no consistent trend upon MPS1i treatment, and that Pdcd1 was not detected in our scRNA-seq data for macrophage cultures (Fig.1G) – motivating further study.”

      The authors must explain how the proposed therapy works since MPS1i increases tumor (cell) size, making it difficult for macrophages to phagocytose the tumor cells. It also reduces or suppresses Tyrp1 expression on the cancer cells, making it harder to opsonize. Since these were two main points for the rationale of this study, the authors need to reconcile them.

      We appreciate this comment and have re-organized this Results section to try to minimize confusion:

      CIN-afflicted, CD47-knockout tumoroids are eliminated by macrophages

      To assess functional effects of macrophage polarization, we focused on a 3D “immuno-tumoroid” model in which macrophage activity can work (or not) over many days against a solid proliferating mass of cancer cells in non-adherent round-bottom wells (Fig. 2A) (Dooling et al., 2023). We used CD47 knockout (KO) B16F10 cells, in which KO removes the inhibitory effect of CD47 on phagocytosis, noting that KO does not perturb surface levels of Tyrp1, which is targetable for opsonization with anti-Tyrp1 (Fig. S2A). BMDMs were added to pre-assembled tumoroids at a 3:1 ratio, and we first assessed surface protein expression of macrophage polarization markers. Consistent with our whole-tumor bulk RNA-sequencing and also single-cell RNA-sequencing of BMDM monocultures (Fig. 1E, 1I-J), BMDMs from immunotumoroids of MPS1i-treated B16F10 showed increased surface expression of M1-like markers MHCII and CD86 while showing decreased expression of M2-like markers CD163 and CD206 (Fig. 2B-C). Although these macrophages seemed poised for anticancer activity, the cancer cells showed decreased binding of anti-Tyrp1 (Fig. S2B) and ~20% larger size in flow cytometry (Fig. S2C). The latter likely reflects cytokinesis defects and polyploidy as acute effects of CIN induction (Chunduri & Storchová, 2019; Mallin et al., 2022). Such cancer cell changes might explain why standard 2D phagocytosis assays show that BMDMs attached to rigid plastic engulf relatively few anti-Tyrp1-opsonized cancer cells pretreated with MPS1i versus DMSO (Fig. S2D). In such cultures, BMDMs use their cytoskeleton to attach and spread, competing with engulfment of large and poorly opsonized targets. Because tumors in vivo are not as rigid as plastic, our 3D immunotumoroids eliminate attachment to plastic, and large numbers of macrophages can cluster and cooperate in engulfing cancer cells in a cohesive mass (Dooling et al., 2023).
We indeed find CIN-afflicted tumoroids are eliminated by BMDMs regardless of anti-Tyrp1 opsonization (Fig. 2D-E), whereas anti-Tyrp1 is required for clearance of DMSO control tumoroids (Fig. 2D, S3B). Imaging also suggests that cancer CIN stimulates macrophages to cluster (compare Day-4 in Fig. 2D), which favors cooperative phagocytosis of tumoroids (Dooling et al., 2023), and occurs despite the lack of cancer cell opsonization and their larger cell size. The 3D immunotumoroid results with induced CIN are thus consistent with a more pro-phagocytic M1-type polarization (Fig.1J and 2B,C).

      The authors used varying numbers of tumor cells for the in vivo portions of the study; the first half of the manuscript uses 500,000 cells, while the latter half uses 200,000 cells. Why?

      The reasons for the difference in numbers are now clarified in the Methods:

      For assessing immune infiltrates in early stages of tumor engraftment, when tumors are still small, we used a relatively high number of tumor cells (500,000 cells in Fig. 1D and Fig. 2F-G) to achieve sufficient cell numbers after dissociating the tumors, particularly for the slow-growing MPS1i-treated tumors. More specifically, many cells were lost during dissection, collagenase treatment, and passage through a filter to remove clumps, yet we needed 100,000 viable cells or more for bulk RNA-seq suspensions and for flow cytometry measurements. For all other studies, 200,000 cancer cells were injected,

      The authors need to report the tumor volumes and the total number of cells isolated from the day five tumors to avoid grossly inflating the effect (i.e. Fig 2G and 4G).

      We have added relevant numbers in the Methods:

      For day 5 post-challenge measurements, 100,000 to 200,000 live cells were collected. For in vivo tumor infiltrate studies in re-challenged mice, 10 million live cells were collected.

      Also, regarding tumor sizes and cell numbers, we have previously published relevant measurements in assessments of tumor growth. Please see:

      Brandon H Hayes, Hui Zhu, Jason C Andrechak, Lawrence J Dooling, Dennis E Discher, Titrating CD47 by mismatch CRISPR-interference reveals incomplete repression can eliminate IgG-opsonized tumors but limits induction of antitumor IgG, PNAS Nexus, Volume 2, Issue 8, August 2023, pgad243, https://doi.org/10.1093/pnasnexus/pgad243

      Dooling, L.J., Andrechak, J.C., Hayes, B.H. et al. Cooperative phagocytosis of solid tumours by macrophages triggers durable anti-tumour responses. Nat. Biomed. Eng 7, 1081–1096 (2023). https://doi.org/10.1038/s41551-023-01031-3

      In the present study, similar tumor growth curves are provided for transparency, but the Kaplan-Meier curves are the key pieces of data in Figs. 3-4. Lastly, regarding reporting of the total number of cells harvested, we based our experiments on previously accepted measurements that also reported numbers out of total harvested cells. See:

      Cerezo-Wallis, D., Contreras-Alcalde, M., … Soengas, M.S., 2020. Midkine rewires the melanoma microenvironment toward a tolerogenic and immune-resistant state. Nat Med 26, 1865–1877. https://doi.org/10.1038/s41591-020-1073-3

      The figure titles need to be revised. For example, the title of Figure 1 claims that "MPS1i-induced chromosomal instability causes proliferation deficits in B16F10 tumors." However, the evidence provided is weak. The authors only present GSEA analysis of proliferation and no functional evidence of impairment. The authors need to characterize this proliferation deficit using in vitro studies and functional studies of macrophage polarization. I would suggest proliferation assays (crystal violet, MTT, Incucyte, etc) to measure the B16 growth over time with MPS1i treatment.

      We thank the Reviewer for pointing this out. In Fig.1 we have minimized information regarding proliferation because it is later quantified in Figs.2D,E, S3, and 3D-i:

      Fig.1F legend: Top downregulated hallmark gene sets in tumors comprised of MPS1i-treated B16F10 cells, showing downregulated DNA repair, cell cycle, and growth-related pathways, consistent with observations of slowed growth in culture and in vivo – as subsequently quantified.

      Then the authors could collect the tumor supernatant to culture with macrophages and determine polarization in vitro. I would also like to see functional studies of macrophage polarization (suppression assays, cytokine production, etc). Currently, the authors provide no functional studies.

      Fig.2B,C provides functional surface marker measurements of in vitro polarization toward anti-cancer M1 macrophages by MPS1i-pretreated tumor cells, consistent with gene expression in Fig.1G-J. Function is further shown as anti-cancer activity in Fig.2D,E, as now stated explicitly in the text:

      “…In our 3D tumoroid in vitro assays, we found that macrophages can suppress the growth of chromosomally unstable tumoroids and clear them, surprisingly both with and without anti-Tyrp1 (Fig. 2D-E), regardless of the MPS1i concentration used for treatment. Such a result is consistent with M1-type polarization (Fig.1J and 2B,C), which tends to be more pro-phagocytic.”

      The authors claim that macrophages are the key effector cells, but they need to provide evidence for this claim.

      Other immune cells clearly contribute to the presented results because the IgG must eventually come from B cells. The text has been edited to indicate 'macrophages are key initiating-effector cells', and some evidence for this is the maximal survival of (WT B16 + Rev tumors) in Fig.3E upon treatment with Marrow Macrophages plus Macrophage-relevant SIRPa blockade and Macrophage-relevant IgG (via FcR). T cells do not have SIRPa or FcR.

      They can deplete macrophages and T and B cells to determine whether the effect remains or is ablated. This is the only definitive way to make this claim.

      To determine whether T and B cells might also be key initiating-effector cells, new experiments were done with mice depleted of T and B cells (per Fig.S9, below). We compared the growth of MPS1i vs DMSO treatments in these mice to results in mice with T and B cells (which should replicate our previous results in Fig.3D-i). We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells compared to mice with T and B cells. We have added to the text our conclusion that: T and B cells are not key initiating-effector cells. Whereas B cells are effector cells at least in terms of eventually making anti-tumor IgG, our results show that macrophages are key initiating-effector cells because macrophages certainly affect the growth of (WT B16 + Rev tumors) when more are added (Fig.3E).

      Author response image 4.

      Growth of CIN-afflicted wild-type (WT) tumors in T- and B-cell deficient mice and T- and B-cell replete mice. Similar growth delays for MPS1i-pretreated B16F10 cells in T- and B-cell deficient NSG mice and immunocompetent C57BL/6 mice. Both types of mice have functional macrophages. Parallel studies in vivo were done with WT B16F10 ctrl cells cultured 24 h in 2.5 μM MPS1i (reversine) or DMSO, then washed 3x in growth media for 5 min each and allowed to recover in growth media for 48 h. 200,000 cells in 100 μL PBS were injected subcutaneously into right flanks, and the standard size limit was used to determine survival curves. The C57BL/6 experiments were done here (by co-author L.J.D.) independently of the similar results (by B.H.H.) shown in Fig.3D-i, which provides evidence of reproducibility.

      The Results section final paragraph describes all of this:

      Macrophages seem to be the key initiating-effector cells, based in part on the following findings. First, macrophages with both SIRPα blockade and FcR-engaging, tumor-targeting IgG maximize survival of mice with WT B16 + Rev tumors (Fig. 3E) – noting that macrophages but not T cells express SIRPα and FcR’s. Despite the clear benefits of adding macrophages, to further assess whether T and B cells are key initiating-effector cells, new experiments were done with mice depleted of T and B cells. We compared the growth delay of MPS1i versus DMSO treatments in these mice to the delay in fully immunocompetent mice with T and B cells – with all studies done at the same time. We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells when compared to immunocompetent C57 mice (Fig.S9). We conclude therefore that T and B cells are not key initiating-effector cells. At later times, B cells are likely effector cells at least in terms of making anti-tumor IgG, and T cells in tumor re-challenges are also increased in number (Fig. 4G-ii). We further note that in our earlier collaborative study (Harding et al., 2017) WT B16 cells were pre-treated by genome-damaging irradiation before engraftment in C57 mice, and these cells grew minimally – similar to MPS1i treatment – while untreated WT B16 cells grew normally at a contralateral site in the same mouse. Such results indicate that T and B cells in C57BL/6 mice are not sufficiently stimulated by genome-damaged B16 cells to generically impact the growth of undamaged B16 cells.

      Reviewer #2 (Public Review):

      Harnessing macrophages to attack cancer is an immunotherapy strategy that has been steadily gaining interest. Whether macrophages alone can be powerful enough to permanently eliminate a tumor is a high-priority question. In addition, the factors making different tumors more vulnerable to macrophage attack have not been completely defined. In this paper, the authors find that chromosomal instability (CIN) in cancer cells improves the effect of macrophage-targeted immunotherapies. They demonstrate through several methods that CIN tumors secrete factors that polarize macrophages to a more tumoricidal fate. The most compelling experiment is transferring conditioned media from MPS1-inhibited and control cancer cells, then using RNAseq to demonstrate that the MPS1-inhibited conditioned media causes a shift towards a more tumoricidal macrophage phenotype. In mice with MPS1-inhibited (CIN) B16 melanoma tumors, a combination of CD47 knockdown and anti-Tyrp1 IgG is sufficient for long-term survival in nearly all mice. This combination is a striking improvement over conditions without CIN.

      Like any interesting paper, this study leaves several unanswered questions. First, how do CIN tumors repolarize macrophages? The authors demonstrate that conditioned media is sufficient for this repolarization, implicating secreted factors, but the specific mechanism is unclear. In addition, the connection between the broad, vaccination-like IgG response and CIN is not completely delineated. The authors demonstrate that mice that successfully clear CIN tumors have a broad anti-tumor IgG response. This broad IgG response has previously been demonstrated for tumors that do not have CIN. It is not clear if CIN specifically enhances the anti-tumor IgG response or if the broad IgG response is similar to that of other tumors. Finally, CIN is always induced with MPS1 inhibition. To specifically attribute this phenotype to CIN, it would be most compelling to demonstrate that tumors with CIN unrelated to MPS1 inhibition are also able to repolarize macrophages.

      Overall, this is a thought-provoking study that will be of broad interest to many different fields including cancer biology, immunology and cell biology.

      We thank the Reviewer for their enthusiastic and positive comments toward the manuscript.

      Our main purpose with this study has been discovery-oriented and mechanistic, with implications for improving macrophage immunotherapies. More experimentation is needed to further understand how this positive immune response emerges. However, we could address whether or not CIN enhances the anti-tumor IgG response by quantitative comparison with our two other recent studies, and we conclude that it does not, per new edits in the Abstract and the Results. See attached PPT for full details and comparison.

      Abstract:

      “CIN does not greatly affect the level of the induced response but does significantly increase survival.”

      “…these results demonstrate induction of a generally potent anti-cancer antibody response to CIN-afflicted B16F10 in a CD47 KO context. Importantly, comparing these sera results for CIN-afflicted tumors to our recent studies of the same tumor model without CIN (Dooling et al., 2022; Hayes et al., 2022), we find similar levels of IgG induction (e.g. ~100-fold above naive on average for IgG2a/c), similar increases in phagocytosis by sera opsonization (e.g. equivalent to anti-Tyrp1), and similar levels of suppressed tumoroid growth – including the variability.

      However, median survival increased (21 days) compared to their naïve counterparts (14 days), supporting the initial hypothesis of prolonged survival and consistent not only with past results indicating major benefits of a prime-&-boost approach with anti-Tyrp1 (Dooling et al., 2022) but also with the noted similarities in induced IgG levels.”

      Future studies could certainly focus on identifying which secreted factors might be inducing the M1-like polarization (using ELISA assays for cytokine detection, for example). This could be important because a main finding here is that we achieve a nearly 100% success rate in clearing tumors when we combine CD47 ablation and IgG opsonization with cancer cell CIN. Previous studies were only able to achieve about 40% cures in mice with CD47 disruption and IgG opsonization alone, suggesting that CIN in this experimental context does improve the macrophage response.

      Lastly, we agree with the Reviewer that future studies should also address how CIN in general (not MPS1i-induced) affects tumor growth. The final paragraph of our Discussion at least cites support for consistent effects of M1-like polarization:

      “The effects of CIN and aneuploidy in macrophages certainly require further investigation. We did publish recently that M1-like polarization of BMDMs with IFNg priming is sufficient to suppress growth of B16 tumoroids with anti-Tyrp1 opsonization more rapidly than unpolarized/unprimed macrophages and much more rapidly than M2-like polarization of BMDMs with IL4 (Extended Data Fig. 5a in Dooling et al., 2023); hence, anti-cancer polarization contributes in this assay.

      While the secretome from MPS1i-treated cancer cells has been found to trigger…”

      Nonetheless, we can only speculate that there is a threshold of CIN reached by a certain timepoint in tumor engraftment and growth. Natural CIN might not be enough, so we pursued a pharmacological approach consistent with ongoing pre-clinical studies (https://doi.org/10.1158/1535-7163.MCT-15-0500). Future studies should consider trying knockdown models to gradually accrue CIN in tumors or using more relevant pharmacological drugs that are known to induce CIN not associated with the spindle. We believe, however, that these are larger questions on their own and are beyond the scope of the foundational discoveries in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None

      We again thank the Reviewer for their support and enthusiasm for the manuscript. We made some additional changes and added more data to address questions posed by the other Reviewer, which we hope you will find strengthen the manuscript further.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting manuscript that extends prior work from this group identifying that a chemovar of Cannabis induces apoptosis of T-ALL cells by preventing NOTCH1 cleavage. Here the authors isolate specific components of the chemovar responsible for this effect to CBD and CBDV. They identify the mechanism of action of these agents as occurring via the integrated stress response. Overall the work is well performed but there are two lingering questions that would be helpful to address as follows:

      • Exactly how CBD and CBDV result in the upregulation of the TRPV1/integrated stress response is unclear. What is the most proximal target of these agents that results in these changes?

      The interaction of CBD and CBDV with TRPV1 has been thoroughly investigated by previous studies in the field. A few prominent examples are:

      (1) De Petrocellis, Luciano, Alessia Ligresti, Aniello Schiano Moriello, Marco Allarà, Tiziana Bisogno, Stefania Petrosino, Colin G. Stott, and Vincenzo Di Marzo. "Effects of cannabinoids and cannabinoid‐enriched Cannabis extracts on TRP channels and endocannabinoid metabolic enzymes." British journal of pharmacology 163, no. 7 (2011): 1479-1494.

      (2) Muller, Chanté, Paula Morales, and Patricia H. Reggio. "Cannabinoid ligands targeting TRP channels." Frontiers in molecular neuroscience 11 (2019): 487.

      (3) Iannotti, Fabio Arturo, Charlotte L. Hill, Antonio Leo, Ahlam Alhusaini, Camille Soubrane, Enrico Mazzarella, Emilio Russo, Benjamin J. Whalley, Vincenzo Di Marzo, and Gary J. Stephens. "Nonpsychotropic plant cannabinoids, cannabidivarin (CBDV) and cannabidiol (CBD), activate and desensitize transient receptor potential vanilloid 1 (TRPV1) channels in vitro: potential for the treatment of neuronal hyperexcitability." ACS chemical neuroscience 5, no. 11 (2014): 1131-1141.

      (4) Costa, Barbara, Gabriella Giagnoni, Chiara Franke, Anna Elisa Trovato, and Mariapia Colleoni. "Vanilloid TRPV1 receptor mediates the antihyperalgesic effect of the nonpsychoactive cannabinoid, cannabidiol, in a rat model of acute inflammation." British journal of pharmacology 143, no. 2 (2004): 247-250.

      (5) de Almeida, Douglas L., and Lakshmi A. Devi. "Diversity of molecular targets and signaling pathways for CBD." Pharmacology research & perspectives 8, no. 6 (2020): e00682.

      (6) Anand, Uma, Ben Jones, Yuri Korchev, Stephen R. Bloom, Barbara Pacchetti, Praveen Anand, and Mikael Hans Sodergren. "CBD effects on TRPV1 signaling pathways in cultured DRG neurons." Journal of Pain Research (2020): 2269-2278.

      Similarly, other works have demonstrated the link between TRPV1 and the integrated stress response pathway (see below). These studies suggested that increased reactive oxygen species (ROS) production, the cyclooxygenase-2 (COX-2) enzyme, and other stressors lead to modulation of intracellular calcium levels by TRPV1.

      (1) Ho, Karen W., Nicholas J. Ward, and David J. Calkins. "TRPV1: a stress response protein in the central nervous system." American journal of neurodegenerative disease 1, no. 1 (2012): 1.

      (2) de la Harpe, Amy, Natasha Beukes, and Carminita L. Frost. "CBD activation of TRPV1 induces oxidative signaling and subsequent ER stress in breast cancer cell lines." Biotechnology and Applied Biochemistry 69, no. 2 (2022): 420-430.

      (3) Soliman, Eman, and Rukiyah Van Dross. "Anandamide‐induced endoplasmic reticulum stress and apoptosis are mediated by oxidative stress in nonmelanoma skin cancer: Receptor‐independent endocannabinoid signaling." Molecular Carcinogenesis 55, no. 11 (2016): 1807-1821.

      • Related to the above, all experiments to confirm the mechanism of action of CBD/CBDV rely on chemical agents, whose precise targets are not fully clear in some cases. Thus, some use of genetic means (such as by knockout of TRPV1, ATF4) to confirm the dependency of these pathways on drug response and NOTCH cleavage would be very helpful.

      Genetic knockdown and pharmacological inhibition are two different approaches to studying the function of a specific gene or protein, and each has its advantages and limitations. We initially attempted to knock down CHAC1 but achieved only ~50% silencing (incomplete knockdown). Following treatment of MOLT-4 cells with the whole extract, we observed only a partial downregulation in the mRNA expression of the Notch intracellular domain (NICD) (left panel), which could account for the incomplete rescue from extract-induced death (right panel). We therefore turned to confirming the signaling pathway by inhibiting different targets with chemical agents.

      Author response image 1.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated with either an empty vector or shRNA against Chac1 (369 and 739 denote two different target regions of Chac1) for 48 hrs, and CHAC1 gene expression was then assessed by qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then exposed to vehicle control or whole extract (3 µg/mL) for an additional 24 hrs, and cell viability was assessed with XTT.

      Reviewer #2 (Public Review):

      Summary:

      The Meiri group previously showed that Notch1-activated human T-ALL cell lines are sensitive to a cannabis extract in vitro and in vivo (Ref. 32). In that article, the authors showed that Extract #12 reduced NICD expression and viability, which was partially rescued by restoring NICD expression. Here, the authors have identified three compounds of Extract #12 (CBD, 331-18A, and CBDV) that are responsible for the majority of anti-leukemic activity and NICD reduction. Using a pharmacological approach, the authors determined that Extract #12 exerted its anti-leukemic and NICD-reducing effects through the CB2 and TRPV1 receptors. To determine the mechanism, the authors performed RNA-seq and observed that Extract #12 induces ER calcium depletion and stress-associated signals (ATF4, CHOP, and CHAC1). Since CHAC1 was previously shown to be a Notch inhibitor in neural cells, the authors assume that the cannabis compounds repress Notch S1 cleavage through CHAC1 induction. The induction of stress-associated signals, Notch repression, and anti-leukemic effects were reversed by the integrated stress response (ISR) inhibitor ISRIB. Interestingly, combining the 3 cannabinoids gave synergistic anti-leukemic effects in vitro and had growth-inhibitory effects in vivo.

      Strengths:

      (1) The authors show novel mechanistic insights that cannabinoids induce ER calcium release and that the subsequent integrated stress response represses activated NOTCH1 expression and kills T-ALL cells.

      (2) This report adds to the evidence that phytocannabinoids can show a so-called "entourage effect" in which minor cannabinoids enhance the effect of the major cannabinoid CBD.

      (3) This report dissects the main cannabinoids in the previously described Extract #12 that contribute to T-ALL killing.

      (4) The manuscript is clear and generally well-written.

      (5) The data are generally high quality and with adequate statistical analyses.

      (6) The data generally support the authors' conclusions. The exceptions are the experiments related to Notch.

      (7) The authors' discovery of the role of the integrated stress response might explain previous observations that SERCA inhibitors block Notch S1 cleavage and activation in T-ALL (Roti Cancer Cell 2013). The previous explanation by Roti et al was that calcium depletion causes Notch misfolding, which leads to impaired trafficking and cleavage. Perhaps this explanation is not entirely sufficient.

      Weaknesses:

      (1) Given the authors' previous Cancer Communications paper on the anti-leukemic effects and mechanism of Extract #12, the significance of the current manuscript is reduced.

      Our original manuscript consisted of extensive multidisciplinary research, and we were asked to divide the work into one paper that focuses on the cannabis plant and another that addresses identifying the specific molecules and their underlying mechanism.

      We understand that our publication of the initial observations with the whole extract dampened the overall novelty presented here, but the previous publication lacked the detailed and strong mechanistic work presented here that explains how the cannabis extract exerted its antitumoral effects.

      In addition, the finding that 3 phytocannabinoids are needed, together with the synergy analysis, supplies essential support for the ‘entourage effect’. This is a phenomenon in which minor proportions of cannabinoids and other plant components significantly modulate the effects of the main active components of cannabis and thereby produce more potent or more selective effects than one major cannabinoid alone. It has been well demonstrated for endocannabinoids but in only very few studies for phytocannabinoids.

      (2) It would be important to connect the authors' findings and a wealth of literature on the role of ER calcium/stress on Notch cleavage, folding, trafficking, and activation.

      We mentioned three previous papers (ref. 34-36) that guided us in our investigation. Following this reviewer’s comment, we added to the discussion a few lines connecting our findings to previous works on ER stress and Notch activation with the appropriate references.

      (3) There is an overreliance on data from a single cell line: MOLT4. MOLT4 is a good initial choice as it is Notch-mutated, Notch-dependent, and representative of the most common T-ALL subtype, TAL1. However, there is no confirmatory data in other TAL1-positive T-ALLs or interrogation of other T-ALL subtypes.

      As mentioned by the reviewer, this study followed a previous publication in which 7 different cell lines were assessed (MOLT‐4, CCRF‐CEM, Jurkat, Loucy, HPB-ALL, DND-41 and T-ALL1). MOLT-4 cells were used to investigate the mechanism, and both MOLT-4 and CCRF-CEM cells were used to investigate the effect of the cannabinoid combination or the whole extract in vivo.

      (4) Fig. 6H. The effects of the cannabinoid combination might be statistically significant but seem biologically weak.

      Survival rates are presented in Fig. 6H for the combination of the cannabinoids and in Supplementary Fig. S6C for the whole extract. While this mouse model provides valuable insights, the biological significance and the translation of findings to human patients require cautious interpretation.

      (5) Fig. 3. Based on these data, the authors conclude that the cannabinoid combination induces CHAC1, which represses Notch S1 cleavage in T-ALL cells. The concern is that Notch signaling is highly context-dependent. CHAC1 might inhibit Notch in neural cells (Refs. 34-35), but it might not do this in a different context like T-ALL. It would be important to show evidence that CHAC1 represses S1 cleavage in the T-ALL context. More importantly, Fig. 3H clearly shows the cannabinoid combination inducing ATF4 and CHOP protein expression, but the effects on CHAC1 protein do not seem to be satisfactory as a mechanism for Notch inhibition. Perhaps something else is blocking Notch expression?

      We understand the reviewer’s concern. Previous works have shown upregulation of CHAC1 in the context of Notch signaling in leukemia as well, including, recently, in T-ALL:

      (1) Meng, X., Matlawska-Wasowska, K., Girodon, F., Mazel, T., Willman, C.L., Atlas, S., Chen, I.M., Harvey, R.C., Hunger, S.P., Ness, S.A. and Winter, S.S., 2011. GSI-I (Z-LLNle-CHO) inhibits γ-secretase and the proteosome to trigger cell death in precursor-B acute lymphoblastic leukemia. Leukemia, 25(7), pp.1135-1146.

      (2) Chang, Yoon Soo, Joell J. Gills, Shigeru Kawabata, Masahiro Onozawa, Giusy Della Gatta, Adolfo A. Ferrando, Peter D. Aplan, and Phillip A. Dennis. "Inhibition of the NOTCH and mTOR pathways by nelfinavir as a novel treatment for T cell acute lymphoblastic leukemia." International Journal of Oncology 63, no. 5 (2023): 1-12.

      As for the second part of the reviewer’s comment, we tested both the mRNA transcript and protein expression of CHAC1. The increase is clearly shown at 60 min for the mRNA (Fig. 3D and Fig. 4F) and for the protein (Supplementary Fig. S4G-I).

      To show direct involvement of CHAC1, we used knockdown. Although the knockdown was incomplete (~50% reduction), it clearly shows the involvement of CHAC1 in the mechanism leading to the reduced viability of these cancer cells.

      Author response image 2.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated with either an empty vector or shRNA against Chac1 (369 and 739 denote two different target regions of Chac1) for 48 hrs, and CHAC1 gene expression was then assessed by qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then exposed to vehicle control or whole extract (3 µg/mL) for an additional 24 hrs, and cell viability was assessed with XTT.

      (6) Fig. 4B-C/S5D-E. These Western blots of NICD expression are consistent with the cannabinoid combination blocking Furin-mediated NOTCH1 cleavage, which is reversed by ISR inhibition. However, there are many mechanisms that regulate NICD expression. To support their conclusion that the effects are specifically Furin-mediated, the authors should probe full-length (uncleaved) NOTCH1 in their Western blots.

      We have probed for the full-length Notch1 in our previously published paper (Cancer Communications, Supplementary Fig. S1G-I). As we have shown here the three cannabinoids together mimic the effect of the whole extract, we did not repeat the experiments with full-length Notch1.

      (7) Fig. S4A-B. While these pharmacologic data are suggestive that Extract #12 reduces NICD expression through the CB2 receptor and TRPV1 channel, the doses used are very high (50 µM). To exclude off-target effects, these data should be paired with genetic data to support the authors' conclusions.

      We performed a dose-response experiment before choosing the doses used for the inhibitors of CB2 and TRPV1 (see below). The dose of 50 µM was selected as it did not affect the viability of the cells.

      Author response image 3.

      Dose-response of CB2 and TRPV1 inhibitors in MOLT-4 cells. MOLT-4 cells were treated with increasing concentrations (µM) of (A) CB2 inhibitor AM630 or (B) TRPV1 inhibitor AMG9810; and 24 hrs later the viability of the cells was assessed with XTT.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 6H, it is unclear why the authors are using CCRF-CEM cells, which are known to be resistant to Notch inhibitors, rather than popular cell lines that are Notch-dependent (e.g. CUTLL1, DND-41, HPB-ALL). Since cannabinoids seem to kill at least in part through Notch inhibition, the effects would be predicted to be greater in Notch-dependent T-ALL cell lines than Notch-independent cell lines like CCRF-CEM. To show stronger in vivo preclinical efficacy, another suggestion is to combine cannabinoids with tolerable dosing of gamma-secretase inhibitors as published by the Michelle Kelliher group.

      We have shown in our previous publication that both MOLT-4 and CCRF-CEM cells are dependent on Notch for their propagation, whereas other T-ALL cell lines such as Loucy and Jurkat are not. Therefore, we treat CCRF-CEM as Notch-dependent. We discuss the possibility of using the cannabinoid combination with other treatments, specifically chemotherapy, to enhance effectiveness.

      (2) To increase significance, this reviewer suggests strengthening the mechanism. However, this reviewer understands the challenge of identifying the correct mechanism. Thus, an alternative would be to increase clinical relevance. Some specific suggestions are described below.

      (a) With regard to increasing mechanistic insights, the authors should be aware of some papers that might be helpful. Roti et al (Cancer Cell 2013) showed that SERCA inhibitors like thapsigargin reduce ER calcium levels and block Notch signaling by inhibiting NOTCH1 trafficking and inhibiting Furin-mediated (S1) cleavage of Notch1. Multiple EGF repeats and all three Lin12/Notch repeats in the extracellular domains of Notch receptors require calcium for proper folding (Aster Biochemistry 1999; Gordon Nat. Struct. Mol. Biol. 2007; Hambleton Structure 2004; Rand Protein Sci 1997). Thus, Roti et al concluded that ER calcium depletion blocks NOTCH1 S1 cleavage. This effect seems to be conserved in Drosophila, as Periz and Fortini (EMBO J, 1999) showed impaired Notch cleavage in Ca2+-ATPase-mutated Drosophila cells. Besser et al should consider these papers when exploring the mechanism by which ER calcium release induced by the cannabinoid combination blocks activated NOTCH1 expression. Similarities and differences should be discussed.

      As mentioned above and stated also by the reviewer, many papers have shown the cleavage and/or activation of Notch following ER stress.

      (b) With regard to increasing clinical relevance, the authors should consider testing the effects of the cannabinoid combination on primary samples, PDX models, and/or genetically engineered mouse models. Pan-Notch inhibitors like gamma-secretase inhibitors (GSIs) have been disappointing in clinical trials because of excessive on-target toxicity, in particular in the intestine. The authors should consider exploring whether the cannabinoids might be superior to GSIs with regard to intestinal toxicity and why that might be (e.g. receptor expression).

      We thank the reviewer and agree that clinical relevance is of utmost importance. As obtaining primary tumor cells from patients is challenging, we assessed the whole cannabis extract in a PDX model. This extract is already being used by patients. We added this result as Supplementary Fig. S7 and address it in the Results and in the Materials and Methods section.

      (3) Since the authors have performed gene expression profiling, another test to confirm that Extract #12 acts through the Notch pathway is to perform enrichment analysis for known Notch target genes in T-ALL (e.g. Wang PNAS 2013).

      We performed the analysis and this is how we pinpointed the involvement of ATF4, CHOP and CHAC1 of the integrated stress response pathway.

      Minor concern:

      Supplemental Table S4. According to the text (page 10, line 160) and table title, these data are RNA-seq. However, according to the GSE154287 annotation, these data are Affymetrix arrays. There are no gene names in the GSE table. Are the IDs probesets rather than genes?

      Indeed, the gene analysis data are Affymetrix arrays and the title was corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The delineation of MBOAT function is important with theoretical and practical implications in MAFLD, alcohol-induced hepatic steatosis, and lysosomal diseases. The strength of evidence is convincing using methodology in line with current state-of-the-art, with good support for the claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors provide mechanistic insights into how the loss of function of MBOAT7 promotes alcohol-associated liver disease. They showed that hepatocyte-specific genetic deletion of Mboat7 enhances ethanol-induced hepatic steatosis and increased ALT levels in a murine model of ethanol-induced liver disease. Through lipidomic profiling, they showed that mice with Mboat7 deletion demonstrated augmented ethanol-induced endosomal and lysosomal lipids, together with impaired transcription factor EB (TFEB)-mediated lysosomal biogenesis and accumulation of autophagosomes.

      Strengths:

      Alcohol-induced liver disease (ALD) and metabolic-associated steatotic liver disease (MASLD) are major global health problems, and a polymorphism near the gene encoding MBOAT7 has been associated with these conditions. This paper is timely, as it is important to gain insight into how loss of MBOAT7 function contributes to liver disease; this may eventually lead to therapeutic strategies. The conclusions of the paper are mostly well supported by data.

      We sincerely thank Reviewer #1 for constructive feedback on this work.

      Weaknesses:

      (1) In regards to circulating levels of MBOAT7 products, a comparison of heavy drinkers with ALD versus heavy drinkers without ALD would be more clinically relevant.

      We agree this would be an important comparison to make in future studies, but given the difficulty of accessing well-matched samples such as these, we see this as beyond the scope of the current work.

      (2) A few typos need to be addressed. For Figure 1 - figure supplement 1, should the second column heading be "Heavy drinkers" instead of "Healthy drinkers"? Also, in the same figure, it is unclear what the "healthy" subcategory under MELD means.

      The typographical error was addressed in the main text and in all associated tables and figures.

      (3) Some of the data in the tables need to be addressed/discussed. For instance, the white blood cell count (WBC) in Figure 1 - figure supplement 1 for "healthy controls" is 34, compared to 13.51 for drinkers. A WBC of 34 is not at all healthy and should be explained. The vast difference between BMI and also between racial distribution within the two cohorts should also be explained. Is it possible that some of these differences contributed to the different levels of circulating MBOAT7 products that were measured?

      Sincere thanks for catching this error. In follow-up, we found that some of our patient recruitment sites were using different units to report WBC counts (percent vs 1000/ml), and at this time we cannot retrospectively correct that difference. Because the WBC values for the cohort are therefore incomplete, we elected to exclude that information to avoid confusing readers. A revised table reflecting these changes is provided in the revision. When each site is examined separately, WBC values were in the normal range, so we do not think this is a major limitation of our studies. Regarding BMI and race: the difference in race is not actually significant, but close. For BMI, there are 2 very low BMIs among the heavy drinkers, which bring that average down. We agree with Reviewer #1 that race and/or BMI could impact MBOAT7, but larger cohorts are needed to detect such potential differences.

      (4) The representation of statistical differences between the bars in the results figures using letters is a bit confusing. For instance, in Figure 2C, does that mean all the bars labelled A are significantly different from B? The solid black bar seems to be very similar to the open red bar; please double check.

      We apologize for this confusing presentation. Using the letter system, groups not sharing a common superscript differ statistically. Given this concern, we have reviewed all statistical comparisons and found several mistakes in Figure 2C, Figure 3F and G, Figure 3-Supplementary Figure 1F, and Figure 3-Supplementary Figure 10H. The graphs themselves were not altered, but the denotation of statistical significance was updated with the correct letter superscripts.

      Reviewer #2 (Public Review):

      Summary:

      The work by Varadharajan et al. explored a previously known genetic variant and its pathophysiology in the development of alcohol-associated liver injury. It provides a plausible mechanism for how varying levels of MBOAT7 could impact the lipid metabolomics of the cell, leading to a deleterious phenotype in MBOAT7 knockout. The authors further characterized the impact of the lipidomic changes and raised lysosomal biogenesis and autophagic flux as mechanisms of how MBOAT7 deletion causes the progression of ALD.

      Strengths:

      Connecting the GWAS data on MBOAT7 variants with plausible pathophysiology greatly enhances the translational relevance of these findings. The global lipidomic profiling of ALD mice is also very informative and may lead to other discoveries related to lipid handling pathways.

      We sincerely thank Reviewer #2 for constructive feedback on this work.

      Weaknesses:

      The rationale of why MBOAT7 metabolites are lower in heavy drinkers than in normal individuals is not well explained. MBOAT7 loss of function drives ALD, but unclear if MBOAT7 deletion also drives preference for alcohol or if alcohol inhibits MBOAT7 function. Presuming most individuals studied here were WT and expressed an appropriate level of MBOAT7?

      Although we were unable to genotype for the rs641738 SNP in the human subjects studied here, the original study by Buch et al. published in Nature Genetics performed cis expression quantitative trait locus (cis-eQTL) analyses to demonstrate that the minor disease-associated allele was associated with reduced MBOAT7 expression in subjects with alcohol-related cirrhosis. It is important to note that we did not see any evidence that alcohol preference was altered in either myeloid- or hepatocyte-specific Mboat7-knockout mice, given that ethanol intake was similar in all genotypes. We agree, however, that the possibility that MBOAT7 loss of function promotes alcohol preference should be investigated in additional studies.

      Also, the discussion of mechanisms of MBOAT7-induced dysregulation of lysosomal biogenesis/autophagy, while very interesting, seems incomplete. It is not clear how MBOAT7, an enzyme involved in membrane phospholipid remodeling, increases mTOR activity, which leads to decreased TFEB target gene transcription.

      Although we agree with Reviewer #2 that our mechanistic understanding of how MBOAT7 loss of function impacts mTOR activity and TFEB-driven lysosomal biogenesis is still incomplete, we do feel that the results published here will inform downstream investigation linking phosphatidylinositol remodeling to mTOR and TFEB. The MBOAT7 gene encodes an acyltransferase enzyme that specifically esterifies arachidonyl-CoA to lysophosphatidylinositol (LPI) to generate the predominant molecular species of phosphatidylinositol (PI) in cell membranes (38:4). It is well established that PI-related lipids can regulate membrane dynamics and signal transduction pathways. For instance, PI phosphates (PIPs) are dynamically shaped by PI kinases and phosphatases and play crucial roles in regulating a wide variety of cellular processes via specific interactions with PIP-binding proteins. Among PI phosphates, PI 3-phosphate (PI3P) regulates vesicular trafficking pathways, including endocytosis, endosome-to-Golgi retrograde transport, autophagy and mTOR signaling. Although additional work is needed to understand the molecular details of how MBOAT7-driven LPI acylation impacts mTOR and TFEB, it is not particularly surprising that PI lipid remodeling could broadly impact cell signaling.

      Furthermore, given the significant disturbances of global lipidomic profiling in MBOAT7 knockout, many pathways are potentially affected by this deletion. Further in vivo modeling that specifically addresses these pathways (TFEB targeting, mTOR inhibitor) would help strengthen the conclusions of this paper.

      We agree that further in vivo studies are needed that are beyond the scope of the current work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) p values are rather hard to read. For example, Figure 2C: "Hepatocyte-specific deletion of Mboat7 resulted in enhanced ethanol-induced increases in liver weight." However, there does not appear to be a significant difference between the two EtOH groups in Figure 2C. The same comment applies to Figure 2E; it is not clear whether the pair-fed groups differed significantly.

      (2) Figure 2 Supp fig 1, what is the top band on the MBOAT7 WB?

      We have addressed these statistical comparison comments as described above. Although we cannot be sure, it is likely that the top band on the MBOAT7 Western blot is a non-specific band that shows up with the antibody combination used given there is equal intensity in the Mboat7flox/flox and the MSKO mice (Mboat7flox/flox+LysM-Cre).

  3. Feb 2024
    1. Author Response

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form: the main conclusions related to senescence lack robust support from the data, and there are technical concerns about the senescence models used in this study.

      Authors' response: We are grateful to the editors and the reviewers for their time and for comments that sharpen the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence supporting its claims, which may have been missed during the review process. We now provide better context for these points.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modifies Cys184 of HRas, which is required for its activation, as indicated by the FRET analysis with the RAF RBD. They also show that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through its electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutant). They also show that expressing WT HRas followed by 15d-PGJ2 treatment leads to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and additional experiments would help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers of inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). Staining for β-galactosidase and other markers of senescence in the tissue sections would help to delineate these.

      As the reviewer points out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence: the mRNA levels of the cell cycle- and senescence-associated genes p16 and p21 were increased (Fig. 1C). We also observed increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B). We will characterize other markers of senescence, including senescence-associated β-galactosidase, in the revised manuscript.

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We observed cell death only at higher concentrations of 15d-PGJ2 (5 µM and 10 µM) (Fig. S2A), and not significantly at the 4 µM concentration used in Figure 2. This is why 4 µM was used, and we should have clarified this. We will include viability data for the low concentration of 15d-PGJ2 (4 µM) in the revised manuscript.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment lasting only 1 hour, during which no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM), and we did not observe any cell death over this period.

      Even at such a high concentration of 15dPGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This increase is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than to modification by 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibition of C181 palmitoylation, either by mutation (C181S) or by treatment with the palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as occurs when cysteine palmitoylation is inhibited by the C181S mutation.

      To test whether the inhibition of myoblast differentiation depends on HRas, the authors overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since the HRas constructs are expressed from the strong CMV promoter and in the absence of other regulatory elements, the overexpressed constructs show a dominant effect over endogenous HRas, revealing a cysteine-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding construct and treated with vehicle (DMSO). mRNA levels in vehicle-treated cells were not shown because they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were likewise normalized to those in vehicle-treated (DMSO) cells. Since the aim was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.
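      The per-construct normalization described above can be sketched as follows. This is a minimal illustration, assuming one measured value (for example, a relative mRNA level or an MHC band intensity) per construct and condition; the construct labels and numbers are hypothetical and are not data from the manuscript. Dividing each value by the vehicle value of the same construct sets every vehicle entry to 1, which is why vehicle-treated bars need not be plotted.

```python
from typing import Dict

# Hypothetical layout: construct -> condition -> measured expression level.
Levels = Dict[str, Dict[str, float]]


def normalize_to_vehicle(levels: Levels, vehicle: str = "DMSO") -> Levels:
    """Divide every condition by the vehicle value of the same construct.

    After this step each construct's vehicle entry equals 1.0, so treated
    values read directly as fold change relative to the matched vehicle.
    """
    normalized: Levels = {}
    for construct, by_condition in levels.items():
        baseline = by_condition[vehicle]
        if baseline <= 0:
            raise ValueError(f"non-positive vehicle value for {construct}")
        normalized[construct] = {
            condition: value / baseline for condition, value in by_condition.items()
        }
    return normalized


# Illustrative numbers only (not from the study).
example = {
    "HRas WT": {"DMSO": 2.0, "15d-PGJ2": 0.8},
    "HRas C184S": {"DMSO": 1.5, "15d-PGJ2": 1.2},
}
for construct, values in normalize_to_vehicle(example).items():
    print(construct, {k: round(v, 2) for k, v in values.items()})
```

      Comparing the normalized treated values across constructs (here, WT vs C184S) is then a comparison of fold changes, each measured against its own vehicle baseline.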

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2. We show that the inhibition of myoblast differentiation by 15d-PGJ2 depends on modification of HRas at C184: preventing the modification of HRas at C184 (Fig. 3A), and hence its activation by 15d-PGJ2 (Fig. 3B), rescues the inhibition of C2C12 differentiation (Fig. 4D and E). This dissociates the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported.

      The literature does not fully support this statement, and it is important to clarify this misimpression for the field as a whole.

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscles has not been shown before, even though both HRas and 15d-PGJ2 are shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was reported first by Luis Oliva et al (Luis Oliva et al., 2003). However, this study does not comment on the functional implications of activation of HRas signaling.

      Recently, our lab contributed to a study where the functional implication of activation of HRas signaling due to covalent modification by 15d-PGJ2 was shown in the maintenance of senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit the differentiation of myoblasts by Hunter et al. (Hunter et al., 2001). Although this study hypothesized that the inhibition of myoblast differentiation occurs via 15d-PGJ2-mediated activation of PPARγ signaling, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 is a good model for the DNA damage-based senescence used in this manuscript. It is not a model for replicative senescence, since it is immortalized. In this study we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We also observe a similar phenotype in MCF7 breast cancer cells and IMR90 lung fibroblasts after treatment with Doxo (these data will be added to Supplementary Figure 1). In addition, several reports in the literature have shown induction of senescence in C2C12 cells: Moiseeva et al. (2023) showed induction of senescence in C2C12 cells after etoposide-mediated DNA damage, and Moustogiannis et al. (2021) showed induction of replicative senescence in C2C12 cells.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells. Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells into the conditioned medium. These data show that senescent C2C12 cells are the source of PGD2 and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study; subsequent work will determine the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using published myotube analyzer software (Noë et al., 2022). This information will be added to the Materials and Methods section of the revised manuscript.
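      For readers unfamiliar with the metric, the sketch below uses one common definition of the fusion index: the percentage of all nuclei in a field that lie within myotubes containing at least two nuclei. This definition is an assumption for illustration; the exact definition and thresholds applied by the myotube analyzer software (Noë et al., 2022) should be taken from that publication, and the function name and counts below are hypothetical.

```python
from typing import Sequence


def fusion_index(nuclei_per_myotube: Sequence[int], total_nuclei: int,
                 min_nuclei: int = 2) -> float:
    """Percentage of all nuclei located in myotubes with >= min_nuclei nuclei.

    nuclei_per_myotube: nucleus count for each candidate myotube in a field.
    total_nuclei: all nuclei counted in the same field (e.g. from a nuclear stain).
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    # Only multinucleated objects count as fused myotubes.
    fused = sum(n for n in nuclei_per_myotube if n >= min_nuclei)
    if fused > total_nuclei:
        raise ValueError("fused nuclei cannot exceed total nuclei")
    return 100.0 * fused / total_nuclei


# Hypothetical field: four candidate myotubes, one of them mononucleated.
print(fusion_index([5, 3, 1, 8], total_nuclei=50))
```

      Here the mononucleated object is excluded, so 16 of 50 nuclei count as fused. Raising `min_nuclei` gives a stricter criterion for what counts as a myotube.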

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript offers a commendable exploration into the relationship between plasma omega-6/omega-3 fatty acid ratios and mortality outcomes.

      Strengths:

      The chosen study design and analytical techniques align well with the research objectives, and the results resonate with existing literature.

      Weaknesses:

      Lack of information on the selection criteria for participants; the analysis of individual PUFAs is not appropriate; the definition of comorbidities is vague; the rationale for conducting the mediation analysis of blood biomarkers is not given.

      Thank you for your insightful feedback and for acknowledging the strengths of our manuscript, particularly regarding the alignment of our study design and analytical methods with our research objectives. Your recognition of how our results resonate with existing literature is greatly appreciated.

      Addressing the concerns you've raised:

      Selection Criteria for Participants: In the “Methods-Study population” section, we have outlined the exclusion criteria for participant selection. This information provides comprehensive insight into our methodology for selecting the study cohort.

      Analysis of Individual PUFAs: We acknowledge your concern regarding the analysis of individual PUFAs due to their inter-correlations in plasma levels. However, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are actually low. Because DHA is one of the omega-3 PUFAs, we did not include DHA and total omega-3 PUFAs in the same model; the same consideration applies to LA and total omega-6. We believe that exploring the effects of individual fatty acids adds valuable depth to our research. DHA and LA were included in the same model because of their low correlation, with careful adjustment for confounding factors, to provide a nuanced understanding of their individual associations with mortality.

      Definition of Comorbidities: The definition of comorbidities, including hypertension, diabetes, and longstanding illness, is elaborated under the Methods section. These conditions were identified through self-reported data collected via the Assessment Centre Environment (ACE) touchscreen questionnaire, allowing us to capture a broad range of chronic conditions as reported by participants.

      Rationale for Mediation Analysis: Initially, our approach to mediation analysis included various blood biomarkers available in the UK Biobank database to explore the potential underlying pathways. However, upon considering your feedback regarding the overlap of fatty acids with lipid classes or lipid particles in plasma, we have decided to remove these elements from our mediation analysis.

      Reviewer #2 (Public Review):

      Summary:

      This study utilized a large sample from the UK Biobank which enhanced statistical robustness, employed a prospective design to establish clear temporal relationships, used objective biomarkers for assessing plasma omega-6/omega-3 ratio, and investigated various mortality causes including CVD and cancer for a holistic health understanding.

      Strengths:

      The authors used a large sample size, employed a prospective design, and investigated various mortality outcomes.

      Weaknesses:

      Analyzing n-3 and n-6 PUFAs separately might be less instructive. It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The manuscript might propose potential research trajectories.

      We are grateful for your thoughtful analysis of our study's strengths and for your constructive feedback on areas for improvement.

      Response to Weaknesses:

      Analyzing n-3 and n-6 PUFAs Separately: We recognize the challenge in analyzing n-3 and n-6 PUFAs separately due to their correlations. However, the correlation between n-3% and n-6% in UK Biobank was actually relatively low (r = -0.12). We include them in one model to test if both are associated with the outcomes after controlling for the effects of the other. Indeed, both were negatively associated with the mortality outcomes in our analysis. We believe our supplemental analysis of n-3 and n-6 PUFAs provides useful information to the readers, in addition to our findings based on the n-6/n-3 ratio.

      Mediation Analysis of TG, HDL, LDL, and Apolipoproteins: We appreciate your insight on the methodological considerations of treating these biomarkers as mediators. After careful review, and in line with suggestions from another reviewer, we have removed these elements from our mediation analysis. This revision improves the scientific rigor of our work, ensuring that our conclusions are drawn from the most robust and methodologically sound of our analyses.

      Causal Conclusions from Correlations: We fully agree with the need for caution in interpreting correlations in observational studies. To this end, we have avoided implying causality in our manuscript. Terms suggesting causality, like "protective effects," have been replaced with "inverse associations" to more accurately represent our findings. This adjustment enhances the clarity and accuracy of our conclusions.

      Proposing Future Research Trajectories: Recognizing the importance of advancing causal and mechanistic understanding in this field, we have called for future studies to further examine causality and characterize molecular mechanisms of the observed associations in our study.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find out whether the levels of omega-6 and omega-3 fatty acids in the blood are linked to the likelihood of dying from any cause, of dying from cancer and of dying from cardiovascular disease. They use a large dataset called UK Biobank, in which fatty acid levels were measured in blood at the start of the study and participants were followed for an average of 12.7 years. They find that both omega-6 AND omega-3 fatty acids were linked with a lower likelihood of dying from any cause, from cancer and from cardiovascular disease. The effects of omega-3s were stronger. They then calculated the ratio of omega-6 to omega-3 fatty acids and found that as that ratio increased, the risk of dying also increased. This supports the idea that omega-3s have stronger effects than omega-6s.

      Strengths:

      This is a large study (over 85,000 participants) with a good follow up period (average 12.7 years). Using blood levels of fatty acids is superior to using estimated dietary intakes. The authors take account of many variables that could interfere with the findings (confounding variables) - they do this using statistical methods.

      Weaknesses:

      There are several omega-6 and omega-3 fatty acids; it is not clear which ones were actually measured in this study.

      Thank you for recognizing the strengths of our study, including the large sample size, the duration of follow-up, and our methodological approach to using blood levels of fatty acids and addressing potential confounders. Regarding the weakness you've highlighted, we understand the importance of specifying which omega-6 and omega-3 fatty acids were analyzed in our study. We have revised the Method section to provide detailed information about how the exposures were measured.

      Recommendations for the author:

      Reviewer #1 (Recommendations for the Authors):

      To elevate the manuscript's scholarly rigor, I propose the following refinements:

      (1) The manuscript lacks information on the selection criteria for participants and the representativeness of the UK Biobank cohort. It is important to provide details on how participants were selected and whether it is representative of the general population, which is crucial for assessing the generalizability of the findings.

      We appreciate the opportunity to clarify the participant selection criteria and the representativeness of the UK Biobank cohort within our manuscript. In the “Methods-Study population” section, we delineated the exclusion criteria: "Participants with cancer (n=37,736) or CVD (n=100,972), those who withdrew from the study (n=879), and those with incomplete data on the plasma omega-6/omega-3 ratio (n=277,372) were excluded from this study, leaving 85,425 participants, of whom 6,461 died during follow-up, including 2,794 from cancer and 1,668 from CVD." To further address representativeness, we performed a sensitivity analysis comparing the baseline characteristics of participants included in our study with those omitted due to lack of exposure information. This analysis, presented in Additional file 2: Table S13, indicates comparable baseline characteristics across both groups, bolstering confidence that our study sample is representative of the broader UK Biobank cohort.

      Regarding the UK Biobank's representativeness of the general population, we acknowledge that the cohort does not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibit better health and higher socioeconomic status than the average UK resident, potentially influencing disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts, as highlighted in the studies by Batty et al. and Fry et al.

      We recognize the importance of this aspect and will incorporate a discussion on the implications of these factors for the generalizability of our findings in the “Discussion-Limitations” section of our manuscript. We are grateful for this insightful comment and believe that this addition will enhance the manuscript's contribution to the field.

      Here is what we added in the “Discussion-Limitations” section of our manuscript: “Third, we acknowledged that the cohort did not mirror the broader UK demographic in terms of socioeconomic and health profiles. Participants in the UK Biobank generally exhibited better health and higher socioeconomic status than the average UK resident, potentially influencing the disease prevalence and incidence rates. Nonetheless, the UK Biobank's extensive sample size and comprehensive exposure data enable the generation of valid estimates for exposure-disease associations. These estimates have been corroborated by findings from more demographically representative cohorts47,48.”

      References:

      Batty GD, Gale CR, Kivimäki M, et al. Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis. BMJ. 2020;368:m131.

      Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186(9):1026–34.

      (2) The study sample included different ancestries which may introduce confounding from genetic background. As over 90% of the participants were of European ancestry, I recommend excluding individuals of non-European ancestry in the main analysis.

      Thank you for raising the concern regarding the inclusion of different ancestries in our study sample and the potential confounding. In our research, we have adhered to the widely accepted practice of including all participants in the study to ensure a comprehensive analysis. Recognizing the predominance of European ancestry within our cohort, which exceeds 90%, we have proactively incorporated ethnicity as a covariate in our statistical models to mitigate confounding influences.

      We also considered the feasibility of conducting a stratified analysis for non-European participants. However, the small sample sizes of non-European subgroups do not provide sufficient statistical power to yield reliable or meaningful separate analyses. Consequently, to maintain the integrity and robustness of our findings, we opted to include all participants in the main analysis, adjusting for ethnicity to account for potential confounders.

      (3) I noted that a large proportion of participants were excluded due to the lack of data on plasma PUFAs. Were the characteristics of these participants similar to the current analysis sample?

      Thank you for raising this very important point. According to UK Biobank, “The EDTA plasma samples were picked randomly and are therefore representative of the 502,543 participants in the full cohort” (as detailed in Julkunen et al.). Moreover, as noted in our reply to comment #1 above, we performed a sensitivity analysis comparing the baseline characteristics of participants included in our study with those omitted due to lack of exposure information.

      The results of this analysis are detailed in Additional file 2: Table S13. They demonstrate that the baseline characteristics—such as age, gender, ethnicity, socioeconomic status, and lifestyle habits—are indeed similar between the two groups. This similarity supports the representativeness of our analysis sample and suggests that the exclusion of participants without plasma PUFA data does not introduce a bias that would undermine the validity of our study's findings.

      References:

      Julkunen H, Cichońska A, Tiainen M, et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun. 2023 Feb 3;14(1):604. doi: 10.1038/s41467-023-36231-7.

      (4) The methods section should include a detailed description of the measurement of plasma omega-6/omega-3 fatty acid ratio. It is important to provide information on the analytical techniques used and any quality control measures implemented to ensure the accuracy and reliability of the measurements. Importantly, were repeated measurements done?

      Thank you for raising this important point. The details of the metabolomic profiling have been described in previous UK Biobank publications. In this revision, we added a brief description of the measurement process and provided references to previous publications.

      Here is what we added in the “Methods-Ascertainment of exposure” section of our manuscript: “Metabolomic profiling of plasma samples was performed with high-throughput nuclear magnetic resonance (NMR) spectroscopy. At the time of this analysis (15 Mar 2023), UK Biobank had released the Phase 1 metabolomic dataset, which covered a random selection of 118,461 plasma samples from the baseline recruitment. These samples were collected between 2007 and 2010 and had been stored in −80 °C freezers, while the NMR measurements took place between 2019 and 2020. Detailed descriptions of plasma sample preparation, NMR spectroscopy setup, quality control protocols, correction for sample dilution, verification with duplicate samples and internal controls, and comparisons with independent measurements from clinical chemistry assays can be found in previous publications20-22.”

      (5) The analysis of individual PUFAs is not appropriate because plasma levels of these PUFAs, including n-3 PUFAs and n-6 PUFAs, EPA, DHA and AA, are usually correlated. It is hard to differentiate these correlated FAs in a Cox model. By contrast, the n-6/n-3 ratio is more comprehensive, and the current analysis demonstrated that this ratio is a good marker of mortality. Therefore, the analyses of individual PUFAs can be removed, and the manuscript can focus only on the n-6/n-3 ratio.

      We agree with the Reviewer regarding the importance of focusing on the n-6/n-3 ratio. Indeed, the ratio is the focus of this manuscript. We also acknowledge the Reviewer's concern about including correlated covariates in one statistical model. In that specific analysis, the correlations between omega-3% and omega-6% (r = -0.12) and between DHA% and LA% (r = 0.03) are relatively low. We also checked the models for multicollinearity and found that the variance inflation factors (VIFs) were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs below 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Because DHA is one of the omega-3 PUFAs, we did not include DHA and omega-3 in the same model; for the same reason, we did not include LA and omega-6 in the same model. The ratio has two components, and each component is the sum of multiple individual PUFAs. It is therefore natural to ask which component is more important (omega-6 or omega-3?) and which specific fatty acid drives the omega-3 PUFA findings (e.g., ALA, or the marine omega-3s EPA and DHA?). We frequently received such questions when we presented this research previously, and we performed the analyses of omega-3, omega-6, DHA, and LA to address them. While we understand the difficulty of separating the effects of individual fatty acids in a Cox model, we believe there is intrinsic value in exploring these relationships. We therefore investigated the associations of individual PUFAs with mortality by including DHA and LA in the same model, given their low correlation, with adjustment for confounding factors (as detailed in Additional file 2: Table S9).

      Our findings indicate significant inverse associations of both DHA and LA with all-cause, cancer, and cardiovascular disease (CVD) mortality. We agree with the Reviewer that the focus of our manuscript should be the ratio. We hope the Reviewer will also agree that keeping the results for individual PUFAs provides additional useful information to readers.
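
      The VIF of a covariate is 1/(1 - R²), where R² comes from regressing that covariate on all other covariates in the model. The sketch below is a generic NumPy illustration of this definition. It is not taken from the study's analysis code, and the function name is hypothetical. With only two predictors, the VIF reduces to 1/(1 - r²), so r = -0.12 alone would give a VIF of about 1.01. Adding covariates can only raise R², which is consistent with the slightly larger VIFs (up to 1.12) reported for the fully adjusted models.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an ordinary least-squares
    regression of column j on all other columns plus an intercept.
    (Hypothetical helper for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other covariates
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Usage with simulated (not study) data: two weakly, negatively correlated predictors.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = -0.12 * a + rng.normal(size=500)
print(variance_inflation_factors(np.column_stack([a, b])))
```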

      (6) The definition of comorbidities (including hypertension, diabetes, and longstanding illness) is vague. Please clarify what diseases longstanding illness includes.

      We appreciate the request for clarification regarding the definition of comorbidities in our study, including the categorization of longstanding illness. The information regarding longstanding illnesses was obtained via the Assessment Centre Environment (ACE) touchscreen questionnaire. Participants were asked, "Do you have any long-standing illness, disability, or infirmity?" with the response options being “Yes,” “No,” “Do not know,” and “Prefer not to answer.” For the purposes of our analysis, participants who selected “Yes” were categorized as having a longstanding illness, while the remaining options were grouped as not having a longstanding illness.

      This method of classification aligns with our detailed explanation in the “Methods-Ascertainment of covariates” section of the manuscript, where we state that “Comorbidities, including hypertension, diabetes, and longstanding illness, were self-reported at baseline. Longstanding illness refers to any long-standing illness, disability, or infirmity, without other specific information.” It is important to note that this approach is consistent with established precedents in the field. Specifically, the paper by Li et al. in the BMJ utilized a similar definition for comorbidities, reinforcing the validity of our methodology.

      References:

      Li ZH, Zhong WF, Liu S, et al. Associations of habitual fish oil supplementation with cardiovascular outcomes and all cause mortality: evidence from a large population based cohort study. BMJ. 2020 Mar 4;368:m456.

      (7) The rationale of conducting the mediation analysis of blood biomarkers is not given. Since fatty acids can be formed as TG or bound with apolipoproteins in plasma, there is a large overlap of FAs with these biomarkers and thus it is not appropriate to analyze TG, HDL, LDL, and apolipoproteins as mediators.

      We are grateful for the insightful feedback regarding the mediation analysis of blood biomarkers. Our mediation analysis aimed to explore the possible biomarkers and biological processes that explain the effects of PUFAs on mortality. Upon reflection, we recognize the complexities introduced by the inherent overlap of fatty acids with different lipid particles and lipid classes in plasma. Considering the potential confounding this overlap presents, and in agreement with your recommendation, we have decided to remove the mediation analyses involving cholesterol, TG, HDL-C, LDL-C, Lp(a), ApoA, and ApoB from our study. We appreciate your guidance on this matter and have updated our manuscript accordingly to reflect these changes.

      Reviewer #2 (Recommendations for the Authors):

      (1) Analyzing n-3 and n-6 PUFAs separately might be less instructive given the inherent correlations among plasma levels of n-3 PUFAs and n-6 PUFAs. Also, some important specific PUFAs, such as ALA, AA, EPA, etc. were not available in the UK Biobank data though the authors tried to analyze LA and DHA. The n-6/n-3 ratio, as evidenced by the current analysis, offers a more holistic perspective and might be a superior mortality marker. Thus, I recommend shifting the focus solely to this ratio.

      Thank you for the thoughtful comment. Reviewer #1 raised a similar point (comment #5 above). We are glad that both reviewers recognized the importance of the omega-6/omega-3 ratio and agreed with us that the ratio should be the focus of the paper. Please also see our more detailed response above. Briefly, our manuscript centers on the ratio, while the supplemental analyses of omega-3%, omega-6%, DHA%, and LA% provide additional useful information. We included omega-3% and omega-6% in the same model because their correlation was relatively low (r = -0.12). We also checked the model for multicollinearity and found that the variance inflation factors (VIFs) for n-3 PUFAs and n-6 PUFAs were within acceptable ranges. In the fully adjusted model that included omega-3% and omega-6%, all variables had VIFs below 1.13, with omega-3% at a VIF of 1.06 and omega-6% at a VIF of 1.12. Similarly, in the model including DHA% and LA%, all variables had VIFs below 1.13, with DHA% at a VIF of 1.07 and LA% at a VIF of 1.10. Therefore, we decided to keep the content on omega-3 and omega-6 PUFAs. We hope that the Reviewer will agree that this content simply provides additional information for readers.

      (2) It might not be methodologically sound to treat TG, HDL, LDL, and apolipoproteins as mediators. Since the model included comorbidities as covariates, hypercholesteremia and hypertriglyceridemia seemed to have been adjusted in the analysis. Thus, further adjusting these blood biomarkers for mediation analysis which overlapped with comorbidities is redundant.

      We appreciate your critical evaluation of our methodological approach. Your point is well-taken, especially in light of the fact that comorbidities such as hypercholesterolemia and hypertriglyceridemia have been accounted for as covariates in our model. This overlap, as you correctly identified, could indeed render the mediation analysis redundant. In concordance with your recommendation, and incorporating the comments of another reviewer, we have now omitted the mediation analysis involving these blood biomarkers from our study. We believe this adjustment strengthens the methodological soundness of our research and are thankful for your contribution to this refinement. We have updated our manuscript to reflect these changes and ensure our analysis remains robust and free from redundancy.

      (3) It's imperative to exercise caution when drawing causal conclusions from the observed correlations. The inherent constraints of observational studies, coupled with potential residual confounding or reverse causality, should be acknowledged.

      We concur with the caution against implying causality from correlations observed in our study. As such, we have carefully refrained from claiming any causal relationships within our paper. We acknowledge that the term "protective effects" could suggest a causal inference, and we have revised our language to describe these observations as "inverse associations" to more accurately reflect the nature of our findings.

      We have also addressed the inherent limitations of observational research in the Discussion section under 'limitations' of our manuscript. There, we recognize that while we have accounted for many confounders, the possibility of residual confounding cannot be entirely excluded. We also agree that reverse causality is a concern in observational studies. To mitigate this, we performed a sensitivity analysis excluding participants who died within the first year of follow-up. The results from this analysis, which are provided in Additional file 2: Table S12, show consistency with our main findings, suggesting that the observed associations are less likely to be predominantly driven by reverse causation. We are grateful for your insights, which have guided us in strengthening our manuscript and ensuring that our conclusions are presented with the appropriate scientific rigor.
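
      The exclusion rule used in the reverse-causality sensitivity analysis amounts to a simple filter on follow-up data. The sketch below makes several assumptions. The record fields (`follow_up_years`, `died`) and the function name are hypothetical. Participants censored within the first year are kept, because the text describes excluding only those who died. A death at exactly one year is treated as outside the window.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str                 # hypothetical participant identifier
    follow_up_years: float   # time from baseline to death or censoring
    died: bool               # True if follow-up ended in death


def exclude_early_deaths(cohort, window_years=1.0):
    """Drop participants who died within `window_years` of baseline."""
    return [p for p in cohort
            if not (p.died and p.follow_up_years < window_years)]


# Usage with made-up records: only "a" (death at 0.5 years) is excluded.
cohort = [
    Participant("a", 0.5, True),
    Participant("b", 0.5, False),
    Participant("c", 3.0, True),
    Participant("d", 10.0, False),
]
print([p.pid for p in exclude_early_deaths(cohort)])
```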

      (4) To guide subsequent scholarly endeavors, the manuscript might propose potential research trajectories, such as spearheading randomized controlled trials to delve deeper into the causal nexus between plasma omega-6/omega-3 ratios and mortality outcomes or probing the mechanistic underpinnings of the observed correlations.

      We agree that conducting randomized controlled trials could illuminate the potential causal relationships between plasma PUFA biomarkers and mortality outcomes. While the primary focus of our manuscript is to report on associations, we acknowledge the importance of causal analysis in advancing the field. In our secondary analysis, we touched upon mediation effects of blood biomarkers, which could serve as a preliminary step towards establishing causality. Although our current work did not delve deeply into causal mechanisms, the results we have presented may indeed stimulate further exploration. By reporting our mediation analysis results, we aim to provide a foundation that other researchers might build upon. We hope that our work will act as a catalyst for more in-depth studies, such as RCTs or mechanistic investigations, to pursue the questions we have begun to explore.

      Following this recommendation, we have revised our Conclusion paragraph and added: “Our findings support the active management of a high circulating level of omega-3 fatty acids and a low omega-6/omega-3 ratio to prevent premature death. Future research is warranted to further test the causality, such as Mendelian randomization and randomized controlled trials. Mechanistic research, including comprehensive mediation analysis, in-depth experimental characterization in animal models or cell lines, and intervention studies, is also needed to unravel the molecular and physiological underpinnings.”

      Reviewer #3 (Recommendations for the Authors):

      (1) Line 32. Delete "a balanced" because a balanced o6:o3 cannot be defined.

      Thank you for pointing out the issue with the term "a balanced". Most authors agree with your observation that defining what constitutes a 'balanced' ratio can be ambiguous and potentially misleading. One author, JTB, disagrees that “balance” as a concept is unacceptably ambiguous or misleading. In response, we have removed the phrase from our manuscript.

      (2) In the abstract you should present the findings for omega-6 and omega-3 PUFAs first and then the findings for the ratio.

      We appreciate your suggestion to present the findings for omega-6 and omega-3 PUFAs before those for the ratio in the abstract. As laid out in the Background section, the ratio was our primary exposure of interest, so we organized the manuscript around it. We are glad that both Reviewer #1 and Reviewer #2 expressed a particular interest in the ratio findings and urged us to keep the ratio as the focus. We believe that this emphasis reflects the novel aspects of our research and aligns with the thematic structure of our manuscript.

      (3) Line 80. controversial should read uncertain.

      Thank you for the suggestion. We have changed “controversial” to “uncertain”.

      (4) It is unclear which fatty acids are included in total PUFAs, omega-6 PUFAs and omega-3 PUFAs. It is vital that this is specified.

      Thank you very much for your suggestion. We agree that it is important to clarify the specific fatty acids included in the analysis. In the revised manuscript, we emphasized that we analyzed “total omega-6 PUFAs” and “total omega-3 PUFAs”, while “LA is one type of omega-6 PUFAs” and “DHA is one type of omega-3 PUFAs”. We also revised the “Ascertainment of exposure” subsection of the Methods to provide more information about how the exposures were measured. Here is what we added to that subsection: “Five PUFAs-related biomarkers were directly measured in absolute concentration units (mmol/L), including total PUFAs, total omega-3 PUFAs, total omega-6 PUFAs, docosahexaenoic acid (DHA), and linoleic acid (LA). Of note, DHA is one type of omega-3 PUFAs, and LA is one type of omega-6 PUFAs. Our primary exposure of interest, the omega-6/omega-3 ratio, was calculated based on their absolute concentrations. We also performed supplemental analysis for four exposures, the percentages of omega-3 PUFAs, omega-6 PUFAs, DHA, and LA in total fatty acids (omega-3%, omega-6%, DHA%, and LA%), which were calculated by dividing their absolute concentrations by that of total fatty acids.”
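
      The derived exposures described above are simple arithmetic on the absolute concentrations. The sketch below assumes that percentages are expressed per 100 (that is, the quotient is multiplied by 100). The class and field names are hypothetical, and the concentration values are illustrative, not UK Biobank data.

```python
from dataclasses import dataclass


@dataclass
class PufaPanel:
    """Absolute plasma concentrations in mmol/L (hypothetical field names)."""
    total_fa: float   # total fatty acids
    omega3: float     # total omega-3 PUFAs
    omega6: float     # total omega-6 PUFAs
    dha: float        # docosahexaenoic acid (an omega-3 PUFA)
    la: float         # linoleic acid (an omega-6 PUFA)


def derived_exposures(p: PufaPanel) -> dict:
    """Omega-6/omega-3 ratio plus percentages of total fatty acids."""
    if p.omega3 <= 0 or p.total_fa <= 0:
        raise ValueError("omega-3 and total fatty acid concentrations must be positive")

    def pct(x: float) -> float:
        # Share of total fatty acids, assumed to be expressed as a percentage.
        return 100.0 * x / p.total_fa

    return {
        "omega6_omega3_ratio": p.omega6 / p.omega3,
        "omega3_pct": pct(p.omega3),
        "omega6_pct": pct(p.omega6),
        "dha_pct": pct(p.dha),
        "la_pct": pct(p.la),
    }


# Usage with illustrative values only.
print(derived_exposures(PufaPanel(total_fa=12.0, omega3=0.5, omega6=4.0, dha=0.25, la=3.0)))
```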

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The signaling pathway upstream of Maf1 remains unknown. In eukaryotes, Maf1 is a negative regulator of RNA pol III and is regulated by external signals via the TORC pathway. Since TORC components are absent in the apicomplexan lineage, one central question that remains open is how Maf1 is regulated in P. falciparum. Magnesium is probably not the sole stimulus involved, as suggested by the observation that Ile deprivation also down-regulates RNA pol III activity.

      We agree that there is still much to uncover about the PfMaf1 signaling pathway. While we do not yet know each component, we have been able to link external factors (not limited to magnesium) to the increased nuclear occupancy of PfMaf1. Other proteins that potentially interact with and regulate PfMaf1, although not yet confirmed, have been identified in plasma sample as candidates for future experiments to validate their involvement in RNA Pol III inhibition.

      The study does not address why MgCl2 levels vary depending on the clinical state. It is unclear whether plasma magnesium is increased during asymptomatic malaria or decreased during symptomatic infection, as the study does not include control groups with non-infected individuals. Along the same line, MgCl2 supplementation in parasite cultures was done at 3mM, which is higher than the highest concentrations observed in clinical samples.

      This reviewer raised a valid point. The plasma magnesium levels for the wet symptomatic samples (averaging 0.79 mM) were within the normal range of a healthy individual (0.75-0.95 mM), while the levels for the dry asymptomatic samples were above the normal range (averaging 1.13 mM). Ideally, we would have liked to have control uninfected plasma samples from individuals from The Gambia. Unfortunately, field studies and human volunteer studies do not always have all the ideal controls that in vitro studies have. We recognize that 3 mM is higher than the normal range for magnesium levels, which is why we included a revised Supplementary Figure 3A. This figure shows that magnesium concentrations as low as 1 mM (similar to the levels found in dry asymptomatic samples) reduced the expression of RNA Pol III-transcribed genes.

      Although the study provides biochemical evidence of Maf1 accumulation in the parasite nuclear fraction upon magnesium addition, this is not fully supported by the immunofluorescence experiments.

      We agree that the resolution of the IFA images does not allow us to confirm the WB data. We believe that the value of the IFA Supplementary Figure is to show that PfMaf1 clusters together in foci, which has not been previously reported.

      Reviewer #2 (Public Review):

      Weaknesses:

      However, most analyses are rather preliminary as only very few (3-5) candidate genes are analyzed by qPCR instead of carrying out comprehensive analyses with a large qPCR panel or RNA-seq experiments with GO term analyses. Data presentation lacks clarity, the number of biological replicates is rather low and the statistical analyses need to be largely revised. Although the in vivo data from wet (mildly symptomatic) and dry (asymptomatic) season parasites with different expression levels of Pol III-regulated genes, var genes, and MgCl2 are interesting, the link between the in vitro data and the in vivo virulence of P. falciparum, which is made in many sections of the manuscript, should be toned down. Especially since (i) the only endothelial receptor studied is CD36, which is associated with parasite binding during mild malaria, and (ii) several studies provide contradictory data on MgCl2 levels during malaria and in different disease states, which is not further discussed, but the authors mainly focused on this external stimulus in their experiments.

      We agree that, ideally, we would have liked to do full RNA-seq on The Gambia samples. However, that was out of the scope of this project. The RNA samples were limited which is why we did not use more primers. We believe that an appropriate number of replicates was done for the experiments. The wet symptomatic samples from this study were from mildly symptomatic individuals, as stated in the manuscript. Therefore, CD36 was a relevant receptor to use for our studies.

      We agree that the published studies about magnesium levels in infected individuals are not always consistent. These studies do not consider the time of year, that is, whether the infection occurred during the dry or wet season. They were also done in different regions of the world using different technologies. For this reason, we only highlight the difference observed in our field study data from The Gambia.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) The signals upstream of Maf1 remain rather a black box. 4 are tested - heat shock and low-glucose, which seem to suppress ALL transcription; low-Isoleucine and high magnesium, which suppress Pol3. Therefore the authors use Mg supplementation throughout as a 'starvation type' stimulus. They do not discuss why they didn't use amino acid limitation, which could be more easily rationalised physiologically. It may be for experimental simplicity (no need for dropout media) but this should be discussed, and ideally, sample experiments with low-IsoLeu should be done too, to see if the responses (e.g. cytoadhesion) are all the same.

      We agree that isoleucine deprivation would have been another experimental assay for our study, but it would not have been as novel as magnesium. While understanding the exact mechanism or involvement of magnesium as a stress condition was beyond the scope of this manuscript, we believe that our data will be valuable in demonstrating that external stimuli act on P. falciparum virulence gene expression via RNA Pol III inhibition. Since we also had plasma level data for magnesium, and not for isoleucine, we believed it was the better external factor to use for our in vitro studies.

      (2) The proteomics, conducted to seek partners of Maf1, is probably the weakest part. From Figure S3: the proteins highlighted in the text are clearly highly selected (as ones that might be relevant, e.g. phosphatases), but many others are more enriched. It would be good to see the whole list, and which GO terms actually came top in enrichment.

      We apologize if the reviewer did not see the attached supplementary Co-IP MS data. The file includes all proteins found in each sample as well as GO term analysis. For the purpose of this work, we highlight proteins potentially involved in the canonical role of Maf1 that have been shown in model organisms to reversibly inhibit RNA Pol III (phosphatases, RNA Pol III subunits).

      (3) Figure 3 shows the Maf1-low line has very poor growth after only 5 days but it is stated that no dead parasites are seen even after 8 cycles and the merozoites number is down only ~18 to 15... is this too small to account for such poor growth (~5-fold reduced in a single cycle, day 3-5)? It would additionally be interesting to see a cell-cycle length assessment and invasion assay, to see if Maf1-low parasites have further defects in growth.

      We agree with the reviewer that the observed reduction in merozoite numbers may not be the only cause of the reduced growth rate. Other factors in the PfMaf1 knock-down line may contribute to the observed poor growth.

    1. Author Response

      Our answer to reviewer #1 comments:

      We attempted to perform structural characterization of the ASK1 complex with TRX1, but were unable to prepare a sufficiently stable ASK1:TRX1 complex for cryo-EM analysis, probably due to their relatively weak interactions. Therefore, we subsequently decided to use HDX-MS to characterize the structural changes of ASK1 induced by interactions with TRX1.

      Detailed information about cryo-EM data processing including 2D classification averages, local resolution of the EM map and FSC figure are shown in Supporting Information, Supplementary Table S1 and Figures S1-S3.

      We fully agree with the reviewer that the presence of hydrogen bonding cannot be reliably described at this resolution. However, if there is a sufficient electron density in a given region and a corresponding hydrogen bond donor-acceptor pair in the model, this suggests the possible presence of such an interaction.

      Our answer to reviewer #2 comments:

      We are fully aware that the use of a C-terminally truncated construct limits this study due to the presumed role of the C-terminus in ASK1 dimerization. A C-terminally truncated construct consisting of TBD, CRR, and KD (residues 88-973) was used due to the low expression yield and solubility of full-length human ASK1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We found the reviewers’ comments highly valuable and addressed them through the following additional experiments and changes to the text and figures:

      (1) We measured the effect of ROCK MASOs on ROCK expression by immunostaining and observed a reduction in ROCK signal, supporting the downregulation of ROCK protein level by ROCK MASOs (new Fig. S3).

      (2) We measured the effect of a lower concentration of the ROCK inhibitor Y27632 (10µM) and observed the same phenotypes of skeletal loss, skeletal reduction and ectopic branching at this concentration (Fig. 2, S4). Importantly, these phenotypes were not observed when PKA and PKC were directly inhibited in whole sea urchin embryos (1) and in skeletogenic cell cultures (2), further supporting the specificity of the ROCK inhibitor.

      (3) We added a time course of Pl-ROCK expression and immunostaining of ROCK in the fertilized egg, which show that this gene is maternal and that the protein is present in the egg (Fig. S2A-C).

      (4) We imaged F-actin in embryos injected with ROCK MASOs and found that it is still detected around the spicules and their tips, similarly to ROCK-inhibited embryos (new Fig. S3).

      (5) We revised the paper text and figures to provide a better description of our results, distinguish clearly between our data and our interpretations and emphasize the novelty of our findings.

      This paper demonstrates that ROCK, F-actin polymerization and actomyosin contractility play critical roles in biomineral growth and in shaping biomineral morphology in the sea urchin embryo, and that ROCK activity affects skeletogenic gene expression. Our findings, together with previous reports of the role of actomyosin in eukaryotic biomineralization, suggest that this molecular machinery is part of the common molecular toolkit used in biomineralization. The identification of a common molecular mechanism within the diverse gene regulatory networks, organic scaffolds and minerals that eukaryotes use to build their biominerals will be of high interest to the fields of biomineralization and evolutionary biology. Furthermore, our paper portrays the interplay between the cellular and the genetic machinery that drives morphogenesis. We believe it would be of great interest to the broad readership of eLife and particularly to the fields of biomineralization, cell, developmental and evolutionary biology.

      Thank you very much for the helpful review of our paper.

      Reviewer #1 (Public Review):

      We thank the reviewer for the appreciation of our work and for the helpful comments that guided us to strengthen the experimental evidence for our conclusions and increase the paper’s clarity. Below are our responses to the specific comments:

      Major comments

      One MASO led to reduced skeleton formation while the other one additionally induced ectopic branching. How was the optimum concentration for the MASOs determined? Did the authors perform a dose-response curve? What is the reason for this difference? Which of the two MASOs can be validated by reduced ROCK protein abundance? Since the ROCK antibody works, I would like to see a control experiment on Rock protein abundance in control and ROCK MO injected larvae which is the gold-standard for validating the knock-down.

      We tested several MASO concentrations to identify a concentration where the control embryos injected with Random MASO were overall healthy and ROCK MASO’s showed clear phenotypes.

      To test the effect of ROCK MASOs on ROCK protein levels, we did immunostaining experiments that are now presented in new Fig. S3. We could not perform Western blots on injected embryos, since Western blotting with the ROCK antibody requires thousands of embryos, which is not feasible for injected embryos. Therefore, we used immunostaining to compare ROCK abundance in embryos injected with the two translation-blocking ROCK MASOs against uninjected and Random MASO-injected embryos. We observed a reduction of ROCK signal, supporting the downregulation of ROCK protein level in these genetic perturbations (new Fig. S3).

      L212 "Together, these measurements show that ROCK is not required for the uptake of calcium into cells." But what about trafficking and exocytosis? As mentioned earlier, I think this is a really important point that needs to be confirmed to understand the function of ROCK in controlling calcification. In their previous study (reference 45) the authors demonstrated that they have superior techniques in measuring vesicle dynamics in vivo. Here an acute treatment with the ROCK inhibitor would be sufficient to test if calcein-positive vesicle motion, including the observed reduction in velocity close to the tissue skeleton interface, is affected by the inhibitor.

      We thank the reviewer for the appreciation of our previous work, where we studied calcium vesicle dynamics in whole embryos (Winter et al., PLoS Comput Biol 2021). We agree with the reviewer that the best way to directly test the effect of ROCK on mineral deposition and vesicle kinetics is to observe it in live skeletogenic cells. However, in Winter et al. 2021, we found that the skeleton (spicules) does not grow when the embryos are immobilized, in either control or treated embryos. Recording live time-lapses of whole embryos requires immobilizing them. Because skeletogenesis is arrested in immobilized embryos, we cannot use them to determine the role of ROCK, or of any other perturbation, in vesicle trafficking and exocytosis. We believe that this can be done in skeletogenic cell cultures, and we are currently developing a vesicle-tracking assay for them, but this is beyond the scope of the current work.

      Is there a colocalization of ROCK and f-actin in the tips of the spicules? This would support the mechano-sensing-hypothesis by ROCK.

      Our studies show that F-actin is localized around the spicule cavity and in the cortex of the cells (Figs. 5 and 6), while ROCK is enriched in the skeletogenic cell bodies, with some localization near the skeletogenic cell membranes (Fig. 1). To address the reviewer's question directly, we immunostained ROCK and F-actin in the same embryos and found that their subcellular localizations do not overlap strongly (Fig. S3 Q-T). However, ROCK does not bind F-actin directly: ROCK activates another kinase, LimK, which phosphorylates Cofilin, which in turn interacts with F-actin. Therefore, the lack of colocalization of ROCK with F-actin neither supports nor contradicts a possible role of ROCK in mechano-sensing.

      L 283. "F-actin is enriched at the tips of the spicules independently of ROCK activity" The results of this paragraph clearly demonstrate that ROCK inhibition has no effect on the localization of f-actin at the tips of the growing spicules. In addition, the new cell culture experiments underline this observation. Still, the central question that remains is, what is the interaction between ROCK, f-actin, and the mineralization process, that leads to the observed deformations? What does the f-actin signal look like in a branched phenotype or in larvae that failed to develop a skeleton (inhibition from Y20)?

      As we report in Fig. 6, and now in new Fig. S3, under late ROCK inhibition or in ROCK morphants, we still detect F-actin around the spicule and enriched at the tips. When ROCK is inhibited and the embryo fails to develop a skeleton, we observe F-actin accumulation in the skeletogenic cells, but the F-actin is not organized (Fig. 5). Because no spicule forms in this condition, it is hard to conclude whether the effect on F-actin organization is direct or secondary to the absence of the spicule. We now state this explicitly in the Results (lines 324-326) and in the Discussion (lines 405-408).

      Immunohistochemical analyses on f-actin localization and abundance should be additionally performed with ROCK knock-down phenotypes to confirm the pharmacological inhibition.

      We did this in our new Figure S3 and showed that ROCK morphants show the same F-actin localization at the spicule tips as control and ROCK-inhibited embryos.

      L 365 "...supporting its role in mineral deposition..." "...Overall, our studies indicate that ROCK activity....is essential for the formation of the spicule cavity......which could be essential for mineral deposition..." I think the authors need to do a better job of clearly separating the potential processes impacted by ROCK perturbation. Is it stabilization and mechano-sensing at the spicule tip, or the intracellular trafficking and deposition of the ACC? If the dataset does not allow for a definite conclusion, I suggest clearly separating the different possibilities, combined with a thorough discussion of findings from other mineralizing systems where the interaction between ROCK and F-actin has been described.

      We thank the reviewer for this important comment. We believe that ROCK and the actomyosin network are involved both in mechano-sensing of the rigid biomineral and in the transport and exocytosis of mineral-bearing vesicles. In the current version we explicitly explain these two hypotheses in the discussion section. The possible role in exocytosis and the experiments that are required to assess this role are described in lines 427-439, and the possible mechano-sensing role and effect on gene expression are described in lines 440-453.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      L185 "These SR-µCT measurements show that the rate of mineral deposition is significantly reduced under ROCK inhibition." To correctly support this statement, I would suggest calculating the actual growth rates (µm³ time⁻¹). For example, an increase in volume from 6,850 µm³ at 48 hpf to 14,673 µm³ at 72 hpf would result in a growth rate of 7,823 µm³ 24 h⁻¹.

      We thank the reviewer for this suggestion. We calculated the rate of spicule growth as the reviewer suggested and we added this information in lines 218-221.
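      The growth-rate arithmetic suggested by the reviewer can be sketched as follows. This is a minimal illustration, assuming a simple mean rate (volume difference divided by elapsed time) between two SR-µCT time points; the function name `growth_rate` is hypothetical, and the volumes are the reviewer's illustrative example, not data from the manuscript.

```python
def growth_rate(v_start_um3: float, v_end_um3: float, hours: float) -> float:
    """Mean volumetric growth rate (µm³ per hour) between two measurements.

    Hypothetical helper: assumes growth is summarized by a single average
    rate over the interval, not a time-resolved rate.
    """
    if hours <= 0:
        raise ValueError("time interval must be positive")
    return (v_end_um3 - v_start_um3) / hours


# Reviewer's illustrative example: 6,850 µm³ at 48 hpf and 14,673 µm³ at 72 hpf.
per_hour = growth_rate(6850.0, 14673.0, hours=72.0 - 48.0)
per_day = per_hour * 24.0  # express the same mean rate per 24 h
print(f"{per_hour:.1f} µm³/h, {per_day:.0f} µm³ per 24 h")
```

      Reporting the rate per 24 h simply rescales the hourly value; comparing control and ROCK-inhibited spicules would use the same calculation on each group's measured volumes.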

      L343: "This implies that....within the skeletogenic lineage." This concluding sentence is very speculative and therefore misplaced in the results section.

      We moved this sentence from the results section to the discussion, lines 443-445.

      L382: "The participation of F-actin and ROCK in polarized tip-growth and vesicle exocytosis has been observed in both, animals and plants." L407-409: "...F-actin could be regulating the localized exocytosis of mineral-bearing vesicles...." I think this is exactly the core question that remains unresolved in this study. To reduce speculation, I strongly recommend addressing the effect of ROCK inhibition on vesicle trafficking and exocytosis (monitoring of calcein-positive vesicles in PMCs).

      We agree with the reviewer that this is a critical question that we would like to address but, as we explained above, it is beyond the scope of this study.

      Figure 5: The values below the scale bars in the newly added panels U and V are extremely small. Also, the legend for this figure sounds incorrect. It should read: "...and skeletogenic cell cultures that were treated with 30 µM ROCK inhibitor that was added at 48 hpf and recorded at 72 hpf."

      We increased the font near the scale bars and corrected the figure caption. Thanks for this and your other helpful comments!

      Reviewer #2 (Public Review):

      We thank the reviewer for raising the important issue of inhibitor concentration, which led us to perform additional experiments at a lower concentration that were valuable and strengthened the manuscript. We also thank the reviewer for asking us to be clearer in our interpretation of the results. Below are our responses to the specific comments:

      My concerns are the interpretation of the experiments. The main overriding concern is a possible over-interpretation of the role of ROCK. The literature shows that ROCK participates in many biological processes, with a major contribution to the actin cytoskeleton. And when a function is attributed to ROCK, it is usually based on the identification of a protein that is phosphorylated by this kinase. Here that is not the case. The observation here is, in most cases, stunted growth of the spicule skeleton with some mis-patterning, or an absence of skeleton if the inhibitor is added prior to the initiation of skeletal growth. They state in the abstract that ROCK inhibition impairs the organization of F-actin around the spicules. The evidence for that as a direct role is absent.

      We agree with the reviewer that, since the spicule doesn’t form under continuous ROCK inhibition, it is unclear whether the absence of F-actin around the spicule in this condition is a direct outcome of the lack of ROCK activation of F-actin polymerization, or an indirect outcome of the lack of a spicule to coat. We therefore deleted this line from the abstract and explicitly stated that we cannot conclude whether the impaired F-actin organization is directly due to the effect of ROCK on actin polymerization, in the results, lines 324-326, and in the discussion, lines 405-408.

      They use morpholino data and ROCK inhibitor data to draw their conclusion. My main concern is the concentration of the inhibitor used since, at the high concentrations used, the inhibitor chosen is known to inhibit other kinases as well as ROCK (PKA and PKC). They indicate that this inhibition is specific to the skeletogenic cells based on the isolation of skeletogenic cells in culture and spicule production under either control conditions or ROCK inhibition, and they observe the same: stunting and branching, or absence of skeletons if treated before skeletogenesis commences. Again, however, the high concentrations are known to inhibit the other kinases.

      In the previous version of the paper we used 30-80 µM Y-27632 to block ROCK activity. These concentrations are commonly used in mammalian systems and in Drosophila to block ROCK activity (3-8). The reviewer is correct that, at high concentrations, this inhibitor can block PKA and PKC. However, the affinity of the inhibitor for these kinases is roughly 100-fold lower than its affinity for ROCK, as indicated by the biochemical Ki values reported in the manufacturer's datasheet: 0.14-0.22 μM for ROCK1, 0.3 μM for ROCK2, 25 μM for PKA and 26 μM for PKC.
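      The fold difference implied by these Ki values can be checked with simple arithmetic. The sketch below assumes selectivity is expressed as Ki(off-target)/Ki(ROCK), uses the datasheet values quoted above, and takes the least and most favorable ends of the ROCK1 range; the helper `fold_selectivity` is hypothetical. As the response notes, such ratios describe in vitro affinity only.

```python
# Ki values (µM) as quoted above; single values are stored as (v, v) ranges.
KI_UM = {
    "ROCK1": (0.14, 0.22),
    "ROCK2": (0.3, 0.3),
    "PKA": (25.0, 25.0),
    "PKC": (26.0, 26.0),
}


def fold_selectivity(off_target: str, target: str) -> tuple[float, float]:
    """Return (min, max) of Ki(off_target) / Ki(target) over the reported ranges.

    A larger ratio means the inhibitor binds the target more tightly than
    the off-target kinase (lower Ki = higher affinity).
    """
    off_lo, off_hi = KI_UM[off_target]
    tgt_lo, tgt_hi = KI_UM[target]
    return off_lo / tgt_hi, off_hi / tgt_lo


for off in ("PKA", "PKC"):
    for tgt in ("ROCK1", "ROCK2"):
        lo, hi = fold_selectivity(off, tgt)
        print(f"{off} vs {tgt}: {lo:.0f}- to {hi:.0f}-fold")
```

      With these values, the ratios span roughly 83-fold (PKA vs. ROCK2) to 186-fold (PKC vs. ROCK1).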

      Importantly, these Ki values are based on biochemical assays in which the activity of the inhibitor is tested in vitro with the purified protein. Therefore, these concentrations are not relevant to cell or embryo cultures, where the inhibitor has to penetrate the cells and affect ROCK activity in vivo. Y-27632 activity was studied both in vitro and in vivo by Narumiya, Ishizaki and Uehata, Methods in Enzymology 2000 (9). This paper reports in vitro values similar to those in the manufacturer's datasheet, but shows that concentrations of 10 µM or higher are effective in cell cultures. We therefore tested the effect of 10 µM Y-27632 added at 0 hpf (continuous inhibition) and at 25 hpf (late inhibition) and added this information to Figs. 2 and S3. Continuous inhibition at this concentration resulted in three major phenotypes: skeletal loss, spicule initiations, and small spicules with ectopic branching. This result supports our conclusion that ROCK activity is necessary for spicule formation, elongation and the prevention of branching. Late inhibition at this concentration resulted in the majority of the embryos developing branched spicules, which is very similar to the effect of MyoII inhibition with Blebbistatin. This result again supports the inference that ROCK activity is required for normal skeletal growth and the prevention of ectopic branching. Importantly, there are two papers in which PKA and PKC were directly inhibited, in whole sea urchin embryos (1) and in skeletogenic cell cultures (2). In both assays, PKC inhibition resulted in a mild reduction of spicule length, while PKA inhibition did not affect skeletal formation. Neither skeletal loss nor ectopic branching was ever observed under PKC or PKA inhibition, supporting the specific inhibition of ROCK by Y-27632.
      Furthermore, both genetic and pharmacological perturbations of ROCK resulted in a significant reduction of skeletal growth and in the enhancement of ectopic branching. Therefore, we believe we provide convincing evidence for the role of ROCK in spicule formation, growth and the prevention of branching. We revised Figs. 2 and S3 to include the 10 µM Y-27632 data, and revised the text describing the inhibition to include the explanations and references we provided here.

      They use blebbistatin and latrunculin and show that these known inhibitors of the actin cytoskeleton lead to abnormal spiculogenesis. This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton, given the specificity concerns.

      As stated above, we believe that in the current version we overcame the specificity concerns and provided solid evidence that ROCK activity is necessary for spicule formation, growth and the prevention of branching. Furthermore, the skeletogenic phenotypes of late 10 µM Y-27632 treatment are highly similar to those of MyoII inhibition (Blebbistatin), while the phenotypes at higher concentrations resemble those of the inhibition of actin polymerization by Latrunculin. We agree with the reviewer that “This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton”, and we revised the discussion paragraph to differentiate between our solid findings and our speculations (lines 421-426): “These correlative similarities between ROCK and the actomyosin perturbations lead us to the following speculations: the low dosage of late ROCK inhibition is perturbing mostly ROCK activation of MyoII contractility while the higher dosage affects factors that control actin polymerization (Fig. 8F). Further studies in higher temporal and spatial resolution of MyoIIP activity and F-actin structures in control and under ROCK inhibition will enable us to test this.”

      Reviewer #2 (Recommendations For The Authors):

      The following areas require attention:

      (1) You begin and end the abstract with statements on evolution in which the actomyosin cytoskeleton is associated with skeletogenesis despite different GRNs, different contributing proteins, etc. You then move to ROCK and claim to reveal that ROCK is a central player in the process. As above, in the judgement of this reviewer, you fail to establish a direct link between ROCK and the role of the actomyosin cytoskeleton in skeletogenesis. Sure, the ROCK inhibitors suggest that ROCK plays some kind of role in the process, but you also indicate that ROCK could act on many processes, none of which you directly associate with the necessary activity of ROCK.

      We agree that our paper shows correlative similarities between the phenotypes of ROCK perturbations and those of direct perturbations of the actomyosin network, but does not establish a causal relationship. We made this point clear throughout the current version of the manuscript.

      (2) In the abstract you report that ROCK inhibition impairs the actin cytoskeleton around the skeleton. In examining your images in Fig. 5 that is not the case. Based on Phalloidin staining, actin surrounds both the control and the ROCK-inhibited skeleton. The distribution of actin is the same in both cases. Myosin is also stained in this figure and it too shows similar staining both in experimental and control. So, to this reviewer, there is insufficient evidence to suggest that the actin cytoskeleton is impaired, and there is no evidence directly relating ROCK with that cytoskeleton. I'm not questioning the observation that inhibition of ROCK causes stunting and mispatterning of the skeleton. That you show and quantify well. The issue is the precise target of ROCK. Your data does not establish the specific cause. It could be the actin cytoskeleton but your experiments do not directly address that.

      Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, while under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (3) In parts of the manuscript you use the term filopodia and in other parts I think you use pseudopodia to refer to the same structure. Since Ettensohn has provided the most evidence on the organization of the skeletogenic syncytia, I suggest you use the same term he used for those cellular extensions.

      Filopodia and pseudopodia are two distinct structures generated by the skeletogenic cells. Filopodia are the common cellular extensions described in many cell types, while the term “pseudopodia cable” describes the specific structure that forms between the skeletogenic cells, in which the spicule cavity forms, in agreement with Prof. Ettensohn's terminology.

      (4) In trying to find relationships you cite a number of previous papers at the end of the introduction. I went back to those papers and they describe (from your work) calcium exocytosis, plus filopodia formation, plus planar cell polarity, plus CDC42, any one of which could involve an actin cytoskeleton. You even cite a paper saying that perturbations of ROCK prevent spicule formation. I went back to that paper and that isn't the case. You then summarize the Introduction by relating ROCK and the actin cytoskeleton, thereby raising reader expectation that the two will be connected. As above, in reality, your evidence here does not connect the two.

      We thank the reviewer for giving us credit for all these works, but only the paper on vesicle kinetics is from our lab (Winter et al., 2021). As for Croce et al., 2006, which the reviewer refers to: in Fig. 9A, 75 µM Y-27632 is used to inhibit ROCK in the same sea urchin species that we use, and the phenotype is identical to what we observe: the skeletogenic cells are present, but the spicule is not formed. As mentioned above, in the current version we clearly distinguish between our solid findings and our interpretations.

      (5) You emphasize in Fig. 1 the inhibition of ROCK in the presence of VEGFR inhibition. However, at no place in the manuscript do you say anything about how VEGFR is inhibited, when it is inhibited, or how you know it is inhibited. That oversight must be corrected. You mention axitinib but don't say anything about what it does. Some readers may know its activity but many will not.

      We now indicate that we use Axitinib to block VEGFR in the results section (line 104) and in the methods section (lines 470-471).

      (6) Fig. 2. The use of Y27632 as a selective inhibitor of ROCK. According to data sheets from the manufacturer, at the levels used in your experiments (120 µM, 80 µM and 30 µM), the inhibitor also inhibits the activity of PKA and PKC (both inhibited at around 25 µM). This is concerning because of the literature indicating that activation of the VEGFR operates through PKA. Inhibition of PKA, then, would inhibit the activity of VEGF signaling. Thus, the inhibitory effects of Y27632 may actually not be attributable specifically to ROCK. Furthermore, the heading of this section states that ROCK activity controls initiation, growth, and morphology of the spicule. Yet, even at high levels of inhibitor, spicule production is initiated. Yes, the growth and the morphology are compromised, but the initiation doesn't seem to be.

      The spicule fails to form under continuous ROCK inhibition at all concentrations (Fig. 2). Also, as we explained in detail above, these Ki values are based on biochemical experiments with purified proteins and are not relevant to in vivo use of the inhibitor. Yet, these Ki values demonstrate that the affinity of the inhibitor for ROCK is roughly 100-fold higher than its affinity for PKA and PKC. Regarding the reviewer's specific suggestion here: direct inhibition of PKA does not produce skeletogenic phenotypes, neither in whole embryos (1) nor in skeletogenic cell cultures (2). Since we see the same skeletogenic phenotypes at a low Y-27632 concentration, and the genetic and pharmacological perturbations of ROCK are consistent, we believe that these phenotypes can be attributed directly to ROCK.

      (7) The synchrotron study is very nice with two points that should be addressed. Again, a high concentration of Y27632 was used giving a caveat on ROCK specificity. And second, the blue and green calcein pulses are very nice but the recent paper by the Bradham group should be cited.

      We added a reference to Bradham recent paper on two calcein pulses (10).

      (8) Fig. 5 is where an attempt is made to associate ROCK inhibition to alterations in actomyosin. Again, a high concentration of the inhibitor is used casting doubt on whether it specifically inhibits ROCK. However, even if the inhibition is specific to ROCK the images do not provide convincing evidence that ROCK activity normally is directed toward actomyosin. This is crucial to the manuscript.

      As stated above, we addressed the specificity in this version and we modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, while under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (9) Again in Fig. 6 the inhibitor is used with the same concern about whether the effects noted are due to ROCK.

      Fig. 6 is now Fig. 7 (the effect of ROCK on gene expression) and, as explained above, we addressed the specificity in this version.

      (10) Lines 350-358. This interpretation falls apart without showing that the inhibitor is specific for ROCK as indicated above. Also, Fig. 5 is unconvincing in showing a difference in actin or myosin distribution in control vs ROCK inhibited embryos. Yes, the spicules are stunted, but whether actin or myosin have anything to do with that as a result of lack of ROCK activity is not demonstrated.

      As stated above, we addressed the specificity in the revised version and we modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference between F-actin in control and under ROCK inhibition. In control, F-actin is enriched around the spicule, while under ROCK inhibition the spicule doesn’t form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (11) Throughout the manuscript, spelling, grammar, and sentence structure will require extensive editing. The mistakes are numerous.

      We did our best to correct the spelling and grammar. If we still missed some mistakes, we would be happy to further correct them.

      References

      (1) Mitsunaga K, Shinohara S, Yasumasu I. Probable Contribution of Protein Phosphorylation by Protein Kinase C to Spicule Formation in Sea Urchin Embryos: (sea urchin/protein kinase C/spicule formation/H-7/HA1004). Dev Growth Differ. 1990;32(3):335-42.

      (2) Mitsunaga K, Shinohara S, Yasumasu I. Does Protein Phosphorylation by Protein Kinase C Support Pseudopodial Cable Growth in Cultured Micromere-Derived Cells of the Sea Urchin, Hemicentrotus pulcherrimus?: (sea urchin/protein kinase C/spicule formation/phorbol ester/H-7). Dev Growth Differ. 1990;32(6):647-55.

      (3) Su Y, Huang H, Luo T, Zheng Y, Fan J, Ren H, et al. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov. 2022;8(1):35.

      (4) Kagawa H, Javali A, Khoei HH, Sommer TM, Sestini G, Novatchkova M, et al. Human blastoids model blastocyst development and implantation. Nature. 2022;601(7894):600-5.

      (5) Canellas-Socias A, Cortina C, Hernando-Momblona X, Palomo-Ponce S, Mulholland EJ, Turon G, et al. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature. 2022;611(7936):603-13.

      (6) Becker KN, Pettee KM, Sugrue A, Reinard KA, Schroeder JL, Eisenmann KM. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells. 2022;11(9).

      (7) Segal D, Zaritsky A, Schejter ED, Shilo BZ. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol. 2018;217(5):1815-26.

      (8) Fischer RS, Gardel M, Ma X, Adelstein RS, Waterman CM. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol. 2009;19(3):260-5.

      (9) Narumiya S, Ishizaki T, Uehata M. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol. 2000;325:273-84.

      (10) Descoteaux AE, Zuch DT, Bradham CA. Polychrome labeling reveals skeletal triradiate and elongation dynamics and abnormalities in patterning cue-perturbed embryos. Dev Biol. 2023;498:1-13.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The OSCA/TMEM63 channels have recently been identified as mechanosensitive channels. In a previous study, the authors found that OSCA subtypes (1, 2, and 3) respond differently to stretch and poke stimuli. For example, OSCA1.2 is activated by both poke and stretch, while OSCA3.1 responds strongly to stretch but poorly to poke stimuli. In this study, the authors use cryo-EM, mutagenesis, and electrophysiology to dissect the mechanistic determinants that underlie the channels' ability to respond to poke and stretch stimuli.

      The starting hypothesis of the study is that the mechanical activation of OSCA channels relies on the interactions between the protein and the lipid bilayer and that the differential responses to poke and stretch might stem from variations in the lipid-interacting regions of OSCA proteins. The authors specifically identify the amphipathic helix (AH), the fenestration, and the Beam Like Domain (BLD) as elements that might play a role in mechanosensing.

      The strength of this paper lies in the technically sound data - the structural work and electrophysiology are both very well done. For example, the authors produce a high-resolution OSCA3.1 structure which will be a useful tool for many future studies. Also, the study identifies several interesting mutants that seemingly uncouple the OSCA1.2 poke and stretch responses. These might be valuable in future studies of OSCA mechanosensation.

      However, the experimental approach employed by the authors to dissect the molecular mechanisms of poke and stretch falls short of enabling meaningful mechanistic conclusions. For example, we are left with several unanswered questions surrounding the role of AH and the fenestration lipids in mechanosensation: Is the AH really important for the poke response if mutating residues conserved between OSCA1.2 and OSCA3.1 disrupts the OSCA1.2 ability to respond to poke but mutating the OSCA1.2 AH to resemble that of OSCA3.1 results in no change to its "pokability"? Similar questions arise in response to the study of the fenestration-lining residues.

      We thank the reviewer for their feedback. We believe that the different OSCA1.2 mutants on their own suggest an involvement of the AH and fenestration-lining residues in its mechanosensitive response. We attribute the inability to restore the poke response of OSCA3.1 with similar mutations to its inherently high threshold for this particular stimulus and perhaps to other structural differences (or a combination of them) that we did not probe in this study. We agree that more work is required in the field to address these remaining questions and further dissect the difference between poke and stretch responses.

      Reviewer #2 (Public Review):

      Summary:

      Jojoa-Cruz et al. determined a high-resolution cryo-EM structure of the Arabidopsis thaliana (At) OSCA3.1 channel. Based on a structural comparison between OSCA3.1 and OSCA1.2 and the difference between these two paralogs in their mechanosensitivity to poking and membrane stretch, the authors performed structure-guided mutagenesis and tested the roles of three structural domains, including an amphipathic helix, a beam-like domain, and a lipid fenestration site at the pore domain, in mechanosensation of OSCA channels.

      Strengths:

      The authors successfully determined a structure of the AtOSCA3.1 channel reconstituted in lipid nanodiscs by cryo-EM to a high resolution of 2.6 Å. The high-resolution EM map enabled the authors to observe putative lipid EM densities at various sites where lipid molecules are associated with the channel. Overall, the structural data provides the information for comparison with other OSCA paralogs.

      In addition, the authors identified OSCA1.2 mutants that exhibit differential responses to mechanical stimulation by poking and membrane stretch (i.e., impaired response to poke assay but intact response to membrane stretch). This interesting behavior will be useful for further study on differentiating the mechanisms of OSCA activation by distinct mechanical stimuli.

      Major weakness:

      The major weaknesses of this study are the mutagenesis design and the functional characterization of the three structural domains - an amphipathic helix (AH), a beam-like domain (BLD), and the fenestration site at the pore, in OSCA mechanosensation.

      (1) First of all, it is unclear to the reviewer whether the authors set out to test these structural domains as a direct sensor(s) of mechanical stimuli or as a coupling domain(s) for downstream channel opening and closing (gating). The data interpretations are vague in this regard, as the authors tend to interpret the effects of mutations on the channel 'sensitivity' to different mechanical stimuli (poking or membrane stretch). The authors ought to dissect the molecular bases of sensing mechanical force and opening/closing (gating) the channel pore domain for the structural elements that they want to study.

      We agree with the reviewer that our data are unable to distinguish the transduction of a mechanical stimulus from channel gating. We set out to determine whether these features were involved in the mechanosensitive response. However, as the reviewer points out, evaluating whether they work as direct sensors or coupling domains would require a more involved experimental design that lies beyond the scope of this work. Thus, we do not claim in our study that these features act as direct sensors of mechanical stimuli or as coupling domains, only that they are involved.

      Furthermore, the authors relied on the functional discrepancies between OSCA1.2 (sensitive to both membrane poking and stretch) and OSCA3.1 (little or weak sensitivity to poking but sensitive to membrane stretch). But the experimental data presented in the study do not clearly address the mechanisms of channel activation by poking vs. stretch, or why the channels behave differently.

      We had hoped that, by swapping regions between the OSCA1.2 and OSCA3.1 channels, we would abolish poke-induced responses in OSCA1.2 and confer poke-induced sensitivity to OSCA3.1. We agree with the reviewer that we were not able to pinpoint the reason, or multiple reasons (it could be the compounded effect of several differences), for the higher threshold of OSCA3.1, and thus we could not confer an OSCA1.2-like phenotype on it. Yet, we shed some light on some of the structural differences that appear to contribute to OSCA3.1 behavior, as mutagenesis of OSCA1.2 to resemble this channel led to an OSCA3.1-like phenotype.

      (2) The reviewer questions if the "apparent threshold" of poke-induced membrane displacement and the threshold of membrane stretch are good measures of the change in the channel sensitivity to the different mechanical stimuli.

      The best way to determine an accurate measure of sensitivity to mechanical stimuli is stretch applied to a patch of membrane. There are more complicating factors that influence the determination of "apparent threshold" in the whole cell poking assay, including visualizing when the probe first hits the cell (very difficult to see). With that said, the stretch assay has its own issues such as the creep of the membrane into the pipette glass which we try to minimize with positive pressure between tests.

      (3) Overall, the mutagenesis design in the various structural domains lacks logical coherence and the interpretation of the functional data is not sufficient to support the authors' hypothesis. Essentially the authors mutated several residues on the hotspot domains, observed some effects on the channel response to poking and membrane stretch, then interpreted the mutated residues/regions are critical for OSCA mechanosensation. Examples are as follows.

      In the section "Mutation of key residues in the amphipathic helix", the authors mutated W75 and L80, which are located at the N- and C-terminal ends of the AH in OSCA1.2, and mutated the Pro in the OSCA1.2 AH to the Arg found at the equivalent position in the OSCA3.1 AH. W75 and L80 are conserved between OSCA1.2 and OSCA3.1. Mutations of W75 and/or L80 impaired OSCA1.2 activation by poking, but not by membrane stretch. In comparison, wild-type OSCA3.1, which contains W and L at the equivalent positions of its AH, exhibits little or weak response to poking. The loss of response to poking in the OSCA1.2 W/L mutants does not indicate their roles in poking-induced activation.

      Besides, the P2R mutation on OSCA1.2 AH showed no effect on the channel activation by poking, suggesting Arg in OSCA3.1 AH is not responsible for its weak response to poking. Together the mutagenesis of W75, L80, and P2R on OSCA1.2 AH does not support the hypothesis of the role of AH involved in OSCA mechanosensation.

      Mutagenesis of residues W75 and L80 in the amphipathic helix of OSCA1.2 suggests a role of the helix in the poke response of OSCA1.2, regardless of OSCA3.1 having the same residues. Furthermore, the lack of alteration in the response of mutant P77R suggests that specific residues of the helix are involved in this response, and that this is not a case where any mutation in the helix leads to a loss of function.

      OSCA3.1 WT exhibits a high-threshold response (near membrane rupture) in the poke assay without any mutations, and this could be due to other features, for example, the residues lining the membrane fenestration, as well as features not identified/probed in this study. We agree with the reviewer that the differences in the AH do not explain the different response to poke in OSCA1.2 and OSCA3.1, and we have added this statement explicitly in the discussion for clarification (line #251-252).

      In the section "Replacing the OSCA3.1 BLD in OSCA1.2", the authors replaced the BLD in OSCA 1.2 with that from OSCA3.1, and only observed slightly stronger displacement by poking stimuli. The authors still suggest that BLD "appears to play a role" in the channel sensitivity to poke despite the evidence not being strong.

      We agree with the reviewer that the experiments carried out show little difference between the response of OSCA1.2 WT and OSCA1.2 with the OSCA3.1 BLD, and we have stated so (line #259: “Substituting the BLD of OSCA1.2 for that of OSCA3.1 had little effect on poke- or stretch-activated responses. Although these results suggest that the BLD may not be involved in modulating the MA response of OSCA1.2…”). However, the section of the discussion that the reviewer points out also considers evidence provided by recent reports from Zheng, et al. (Neuron, 2023) and Jojoa-Cruz, et al. (Structure, 2024), and we suggest a hypothesis to reconcile our findings with this new evidence.

      OSCA1.2 has four Lys residues in TM4 and TM6b at the pore fenestration site, which were shown to interact with the lipid phosphate head group, whereas two of the equivalent residues in OSCA3.1 are Ile. In the section "Substitution of potential lipid-interacting lysine residues", the authors made K435I/K536I double mutant for OSCA1.2 to mimic OSCA3.1 and observed poor response to poking but an intact response to stretch. Did the authors mutate the Ile residues in OSCA3.1 to Lys, and did the mutation confer channel sensitivity to poking stimuli resembling OSCA1.2? The reviewer thinks it is necessary to perform such an experiment, to thoroughly suggest the importance of the four Lys residues in lipid interaction for channel mechanoactivation.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are no longer able to perform such experiments.

      Reviewer #3 (Public Review):

      Summary:

      Jojoa-Cruz et al. provide a new structure of At-OSCA3.1. The structure of OSCA3.1 is similar to previous OSCA cryo-EM structures of both OSCA3.1 and other homologues, validating the new structure. Using the novel structure of OSCA3.1 as a guide, they created several point mutations to investigate two different mechanosensitive modalities: poking and stretching. To investigate the ability of OSCA channels to gate in response to poking, they created point mutations in OSCA1.2 to reduce sensitivity to poking based on the differences between the OSCA1.2 and OSCA3.1 structures. Their results suggest that two separate regions are responsible for gating in response to poking and stretching.

      Strengths:

      Through a detailed structure-based analysis, the authors identified structural differences between OSCA3.1 and OSCA1.2. These subtle structural changes identify regions in the amphipathic helix and near the pore that are essential for the gating of OSCA1.2 in response to poking and stretching. The use of point mutations to understand how these regions are involved in mechanosensation clearly shows the role of these residues in mechanosensation.

      Weaknesses:

      In general, the point mutations selected all produce significant alterations in the inherent mechanosensitive regions. This often suggests that any mutation would disrupt the function of the region; additional mutations that leave function similar to the WT channel would support the claims in the manuscript. Mutations in the amphipathic helix at W75 and L80 show reduced gating in response to poking stimuli. The gating observed occurs at poking depths similar to those causing cellular rupture, which suggests that these mutations could produce a complete loss of function. For example, an L80I or L80Q mutation would show whether the addition of the negative charge, and not just a change in the steric space of the residue in an essential region, is responsible for this disruption.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several questions regarding some of the aspects of your study:

      Mutation of the hydrophobic W75 and L80 in OSCA1.2 to charged residues significantly decreases the poke response in OSCA1.2 without affecting the stretch response. However, W75 and L80 are also present in OSCA3.1, which does not respond efficiently to poke. You conclude that these two residues are important for the poke response, but do not delve into why, if these residues are important, OSCA3.1 is not poke-sensitive.

      In addition, mutation of the OSCA1.2 AH to resemble that of OSCA3.1 does not produce channels that are less poke-sensitive. Given the data presented, if the AH were a universal "poke sensor", one would also expect WT OSCA3.1 to exhibit a robust poke response, like OSCA1.2. Here I think it would be important to explain in more detail how these data might fit together.

      We thank the reviewer for bringing up this issue. We decided to test the importance of the AH due to the presence of similar structures in other mechanosensitive channels. Our data showed that single and double mutants of the AH of OSCA1.2 affected its poke response but not its stretch response. This supports the involvement of the AH in the poke response. Yet, we agree that the differences in the AH between OSCA1.2 and OSCA3.1 (P77R mutation) do not explain the higher threshold of OSCA3.1, and we have explicitly added this in line #255. The particular OSCA3.1 phenotype may be due to other differences in the structure, for example, differences in the membrane fenestration area, or, more likely in our view, a combined effect of several differences.

      I also have some questions about the protein-lipid interactions in the fenestration. A lipid has been observed in this location in both OSCA1.2 and OSCA3.1 structures. Mutation of the two OSCA1.2 lysines to isoleucines results in channels that are resistant to poke, which leads to the conclusion that the interactions between the fenestration lysines and lipids are important for the poke response.

      Here, there are several questions that arise but are not answered:

      It is not shown what happens when OSCA3.1 isoleucines are mutated to lysines - do these mutants result in poke-able channels? Is the OSCA3.1 mechanosensing altered?

      We performed a preliminary test on the OSCA3.1 I423K/I525K double mutant (n = 3). However, we did not see an increase in poke sensitivity. We attributed this to other, unexplored differences in OSCA3.1 that affect channel mechanosensitivity.

      It is implied that the poke response is predicated on the lysine-lipid interaction. However, lipid densities are present in both OSCA1.2 and OSCA3.1 structures, indicating that both fenestrations interact with lipids. How can we be certain that the mutation of lysine to isoleucine does not disrupt a protein-protein interaction rather than a protein-lipid one? For example, the K435I mutation might disrupt interactions with D523 or the backbone of G527.

      The reviewer brings up a good point. We believe the phenotype seen is due to a change in the strength of the interaction between lipids and the protein; however, disrupted interactions with other residues are a valid alternative explanation. We agree that the suggested experiments would further clarify the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Similarly, the effects of single lysine-to-isoleucine (K435I or K536I) mutations are not explored.

      The observed effect might be caused by only one of these substitutions.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      I also wanted to take this opportunity to ask a couple of philosophical (?) questions about using a mammalian system to study ion channels that have evolved to function in plants. Your study highlights the intimate relationship between the lipid bilayer and protein function/mechanosensitivity. Plant cells contain high levels of sterols and cerebrosides that would significantly affect both cell stiffness and the specific interactions that can be formed between the protein and the lipid bilayer. I wonder if the properties of the lipid bilayer might shift the thresholds for poke and/or stretch stimuli and if structural elements that do not appear to have a major role in mechanosensation in a mammalian cell (e.g., BLD) might be very influential in a lipid environment that more closely resembles that of a plant?

      Conversely, is it possible that OSCA channels are not poke-sensitive in plant cells? These questions are beyond the scope of your study, but they might be a nice addition to your discussion.

      The reviewer poses a great question. Electrophysiological approaches for studying plant mechanosensitive channels suffer from the limitation of not being able to fully reconstitute the environment of a plant cell. To patch the cell, the cell wall must be removed, which eliminates the tension this structure generates on the membrane. In that sense, performing these assays in plant cells or another system would not give us a fully accurate picture of the physiological thresholds of these channels. Given this limitation, we performed our study in mammalian cells, in which we have expertise. Like the reviewer, we are also intrigued by the effect of different membrane compositions on the behavior of OSCA channels and how these channels behave under physiological conditions, but we agree with the reviewer that these questions are out of the scope of our work. To address this point, in line #294 we have added: “It is also important to note that the membrane of a plant cell contains a different lipid composition than that of HEK293 cells used in our assays, and thus these lipids, or the plant cell wall, may alter how these channels respond to physiological stimuli.”

      Line 313: "For structural studies, human codon-optimized OSCA3.1..." Could you please clarify what this means?

      We have changed the phrase to “For structural studies, the OSCA3.1 (UniProt ID: Q9C8G5) coding sequence was synthesized using optimized codons for expression in human cells and subsequently cloned into the pcDNA3.1 vector” in line #327 to clarify this sentence.

      As a final comment, in the methods you use references to previously published work. I would strongly encourage you to replace these with experimental details.

      We understand the reviewer’s argument. However, this article falls under eLife’s Research Advances and will be linked to the original published work from which we reference the methods. As suggested in the guidelines for this type of article, we only described the methods that differ from the original paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) In line 85, provide C-alpha r.m.s.d. values for the structural alignment among OSCA3.1, OSCA1.1, and OSCA1.2 protomers.

      As requested, we have added the C-alpha RMSD in line #86.

      (2) In line 90, should the figure reference to Fig. 1d be Fig. 1e?

      We thank the reviewer for catching this error. We have corrected it in the manuscript.

      (3) In lines 89-94, what putative lipid is resolved in the OSCA3.1 pore? Can the authors assign the lipid identity? Is it the same as or different from the lipids resolved in OSCA1.2, OSCA1.1, and TMEM63?

      In the model, we have built the lipid as palmitic acid to represent a lipid tail, but the resolution in this area makes it difficult to ascertain the identity of the lipid; hence, we cannot compare it with the lipids in other orthologs.

      (4) In lines 115-121, the authors describe the presence of AHs and their functional roles in MscL and TMEM16. It would be more informative if the authors added figures showing the structure of MscL and highlighting the analogous AH. In addition, the current Supplementary Fig. 6 is not informative and should be improved. It is not clear to the reviewer why that stretch of helix in TMEM16 is equivalent or analogous to the AH in OSCAs; either a sequence alignment or a detailed structural alignment would help address this point. Also, lines 120-121 state that this helix in TMEM16 "does not present amphipathic properties"; please show the sequence or amphipathicity of the helix.

      We thank the reviewer for the feedback on this figure. Supplementary Fig. 6 has been thoroughly modified to address the reviewer’s concerns. We now include a panel showing the structure of MscL and its amphipathic helix. We have modified the alignment of OSCA3.1 to a TMEM16 homolog to make the homologous positioning of the helices in question clearer, and we zoom in to show their sequences.

      (5) In discussion, lines 249-257, the authors referred to a recent study that suggested three evolutionarily coupled residue pairs located on BLD and TM6b. The authors speculate that the reason they did not observe a significant effect of channel response to poke/stretch stimuli in the BLD swapping between OSCA1.2 and 3.1 is due to the 2 of 3 salt bridges remaining for the residue pairs. To test the importance of these residue pairs and their coupling for channel gating, instead of swapping the entire BLD, can the authors systematically mutate the residue pairs, disrupt the salt-bridge interactions, and analyze the effect on channel response to mechanical force?

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (6) The reviewer suggests the authors tone down the elaboration of polymodal activation of OSCA by membrane poking and stretch.

      We believe the idea of polymodal activation is sufficiently toned down, as we only postulate it as a possibility and then give an alternative explanation based on methodological limitations: “Nonetheless, the discrepancy could be due to inherent methodological differences between these two assays, as whole-cell recordings during poking involve channels in inaccessible membranes (at the cell-substrate interface) and channel interactions with extracellular and intracellular components, while the stretch assay is limited to recording channels inside the patch.”

      (7) In lines 81-83, the authors described the BLD as showing increased flexibility, and the EM map at this region is less well resolved for registry assignment. In the method for cryo-EM image processing and Supplementary Fig. 1, the authors only carried out 3D refinement and classification at the full channel level. Have the authors attempted to do focus refinement or classification at the BLD domain in order to improve the local resolution or to sort out conformational heterogeneity? The reviewer suggests doing so because the BLD domain is a hot spot that the authors have proposed to play an important role in OSCA mechanosensation. Conformational changes identified in this region might provide insights into its role in the channel function.

      We thank the reviewer for this suggestion. We have performed focused classification on the BLD with and without surrounding regions and, in our hands, it did not improve the resolution or provide further insights.

      Reviewer #3 (Recommendations For The Authors):

      Here are a few specific minor corrections that should be addressed:

      (1) In lines 117-135, in the discussion of Figure 2, the data shows an apparent increase in the poking threshold to gate W75K/L80E. The substantial increase in the depth required to gate the channel suggests that these channels are less sensitive to poking. Would it be possible to compare the depth at which these two patches show activity and the depth at which the other 22 cells ruptured? Line 161 mentions that the rupture threshold of HEK cells is close to the gating of OSCA3.1 at 13.8 µm.

      The distance just before the cell ruptured in the 22 cells with no response was 12.5 ± 2.5 µm. The distance at which the cells ruptured was 0.5 µm greater (13 ± 2.5 µm, n = 22). We have added this last value in line #137.

      (2) Would it be possible, in Figure 2 panels b and c, Figure 3, and Figure 4, to label the WT as WT OSCA1.2?

      We thank the reviewer for pointing this out. We agree this modification will improve the clarity of the figures and have changed the figures to follow the reviewer’s suggestion.

      (3) Can you provide a western blot of the mutations described in Figure 2? This would provide insight into the amount of protein at the cell surface available to respond to poking; the stretch data show that these channels are in the membrane but do not show whether they are present in similar quantities.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (4) The functional differences between the two channels are projected to be tied to several distinct point mutations, however, the data could be strengthened by additional point mutations at all sites to show that the phenotypes are due to the mutations specifically not just any mutation in the region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we discovered several erroneous duplicate values in our source data sets for figures S1, 2, 4, and 8, due to mistakes in the MATLAB analysis. We have re-analyzed the data and corrected these errors; since only a limited number of values in each data set changed, the results were unaffected. The changes are reflected in the updated figures and source data.

      Overall, the reviewers gave a positive assessment of our work, but had reservations about:

      (1) Specifics of the iGluSnFR data and analysis

      (2) Overstatement/oversimplification of the importance of syt7 and Doc2

      (3) The strength and interpretation of the EM data

      (4) The relevance and parametrization of the modeling data

      (1) We have clarified aspects of the iGluSnFR data and analysis in the point-by-point response, as well as in the manuscript.

      (2) We have toned down our statements about the role of syt7 and Doc2 throughout, and emphasized that the DKO data are conclusive and reveal that there must be additional Ca2+ sensors for AR. We have also added to the discussion, noting syt3 as a strong candidate to perform a function analogous to syt7 (to regulate docking), along with another protein (or proteins) performing a role similar to Doc2 (directly in fusion) that has not been identified as a candidate in the field yet.

      (3) We feel the EM data are consistent with the model as much as they could be, and while a sequence of events can only be inferred from time-resolved EM, we believe our work falls in the scope of reasonable interpretation. However, upon reexamining the terminology of ‘feeding’ and related discussion, we realized this could be misleading, so these sections have been revised.

      (4) We have improved the description and interpretation of the model in the manuscript and provide a detailed rationale of our approach in the point-by-point-response.

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) It is surprising the optical GluSnFR approach reports so much asynchronous release in control hippocampal neurons after single stimuli (36% of release). This seems much higher than what is observed at most synapses, where asynchronous release is usually less than 5% of the initial response to the first evoked stimuli. Any thoughts on why the GluSnFR approach reports such a high level of asynchronous release? Could the optical approach be slower in activation kinetics in some cases, which artificially elevates the asynchronous aspect of fusion? This seems to be the case, given electrophysiology recordings in Figure 3 show the asynchronous release component as ~10% in controls at the 1st stimuli (panel C).

      The reported proportion of asynchronous release from cultured hippocampal neurons varies, contingent upon a range of factors (calcium concentration, how asynchronous release is quantified, etc). However, we would argue that there is considerable evidence for a higher percentage of asynchronous release (more than the <5% indicated by the referee) at synapses in the hippocampus. In our previous work on Doc2 using electrophysiology in cultured hippocampal neurons (Yao et al., 2011, Cell), it was noted that there is an approximate 25% incidence of asynchronous release after a single action potential. Furthermore, Hagler and Goda also reported a 26% ratio of asynchronous neurotransmitter release, also from cultured hippocampal neurons (Hagler and Goda, 2001, J Neurophysiol.).

      We also point out that another study using iGluSnFR to measure synchronous/asynchronous release ratios, with more sophisticated stimulation, imaging, and analysis procedures than ours, found an average ratio of synchronous to asynchronous release that is in line with our values, with considerable variability among individual boutons (Mendonça et al., 2022; 25% asynchronous release after a single action potential). We feel that iGluSnFR is actually the superior approach (barring specialized e-phys preparations that can measure quantal events at individual small synapses; please see Miki et al., 2018), as it directly measures the timing of individual release events at individual boutons. By comparison, in most electrophysiology experiments there is a large peak of synchronous release from many synapses. iGluSnFR also bypasses postsynaptic considerations such as receptor kinetics and desensitization, or asynchronous release being poorly aligned to AMPA receptors, per a recent study of ours (Li et al., 2021) and a study showing that 25% of asynchronous release occurs outside the active zone (Malagon et al., 2023). All these factors could obscure asynchronous release or otherwise make it difficult to measure by electrophysiology. To our knowledge, the approach in Miki et al., 2018 best bypasses these limitations, though the data in that study are from exceptionally fast and synchronous cerebellar synapses, and so cannot be directly compared to our findings. Thus, it is possible that iGluSnFR reports more asynchronous release than electrophysiological recordings, but this may actually reflect real biology.

      This being said, after considering the reviewer’s points we realized that our analysis method likely underestimates the total amount of synchronous release when using the high-affinity sensor (Figure 1). We quantify release by ‘events’ (that is, peaks), which does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks after a single action potential are multiquantal, while for asynchronous release there are still multiquantal events but they are in the minority (Vevea et al., 2021; Mendonça et al., 2022). So, in our data sets, the total amount of synchronous release is underestimated more than asynchronous release. Thus, 37% asynchronous release is probably an overestimate, which explains the 12-percentage-point difference compared to Mendonça et al., 2022, who used sophisticated quantal analysis (though that study was also performed at room temperature, which could contribute to the difference). We have now pointed this out in the text:

      “This ratio of synchronous to asynchronous release is likely an underestimate, since our analysis only counts the number of peaks (‘events’) and does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks are multiquantal after a single action potential, while for AR there are still multiquantal events but they are in the minority (Vevea et al., 2021). So, in our measurements, the total amount of synchronous release is underestimated; sophisticated quantal analysis using the A184V iGluSnFR recently found the percentage of total release that is AR to be ~25%, with otherwise similar results to ours (Mendonça et al., 2022). Nonetheless, this approach faithfully distinguishes synchronous from asynchronous release…”
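      The counting bias described above can be illustrated with a small arithmetic sketch. This is not our analysis pipeline: the `Peak` structure, the `ar_fraction` function, the 10 ms synchronous window, and the peak values are all hypothetical, chosen only to show how counting peaks, rather than quanta, inflates the apparent fraction of AR when synchronous peaks are mostly multiquantal.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    time_ms: float  # hypothetical latency from the stimulus to the peak frame
    quanta: int     # hypothetical number of vesicles contributing to the peak


def ar_fraction(peaks, sync_window_ms=10.0, weight_by_quanta=False):
    """Fraction of release classified as asynchronous (AR).

    Peaks at or before sync_window_ms count as synchronous. By default each
    peak counts once (event counting); with weight_by_quanta=True each peak
    is weighted by its number of quanta.
    """
    sync = asyn = 0
    for p in peaks:
        w = p.quanta if weight_by_quanta else 1
        if p.time_ms <= sync_window_ms:
            sync += w
        else:
            asyn += w
    total = sync + asyn
    return asyn / total if total else 0.0


# Hypothetical data: four multiquantal synchronous peaks, three mostly
# uniquantal asynchronous peaks.
peaks = [Peak(2, 3), Peak(3, 3), Peak(4, 3), Peak(5, 3),
         Peak(20, 1), Peak(40, 1), Peak(60, 2)]

print(ar_fraction(peaks))                         # 3 of 7 events
print(ar_fraction(peaks, weight_by_quanta=True))  # 4 of 16 quanta
```

      With these made-up values, event counting gives an AR fraction of about 43%, while weighting by quanta gives 25%, the same direction of bias discussed above.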

      However, while this method underestimates total synchronous release, it does not misclassify synchronous events as asynchronous because of kinetics. Even the slower iGluSnFR variant does not have a rise time that would misrepresent a synchronous event as asynchronous (Marvin et al., 2018). Mendonça et al. (2022) note that averaged iGluSnFR traces for the A184V variant are biphasic, with the transition from the fast to the slow component occurring around 10 ms. These authors also determined that the temporal resolution of glutamate imaging is limited by the frame rate, not the biosensor, and, based on simulations, found that detection times in their data were biased to be about 1 ms earlier than the actual timing of release events.

      The reviewer’s final point about Figure 3 is a misunderstanding, as these are data from iGluSnFR, not electrophysiology. The asynchronous proportion in these experiments is ~10% because, as noted in the manuscript, we used a faster, lower-affinity variant of iGluSnFR in train stimulation experiments (Figure 2). In contrast to the high-affinity sensor, as explained above, in our analysis this variant would be expected to underestimate the amount of asynchronous release because it fails to detect many uniquantal release events (presumably those further from the focal plane, with too little fluorescence to reach our detection threshold) as evidenced by the fact that the apparent mini rate is much lower as measured by this sensor compared to higher-affinity variants. Since synchronous peaks are mostly multiquantal after a single action potential, while asynchronous peaks are mostly uniquantal, a fraction of release going undetected results in mostly smaller synchronous peaks, which are counted the same in our analysis while many asynchronous peaks are missed entirely. We have added a bit more clarification in the text to avoid confusion on this point:

      “This sensor underestimates the fraction of AR (~10% of total release for a single action potential) as compared to the A184V variant used above that overestimates the fraction of AR (~35% of total release for a single action potential). This is because it is less sensitive and misses many uniquantal events; as discussed above, our analysis quantifies release by number of peaks, and most synchronous peaks are multiquantal after a single action potential, while most AR peaks are uniquantal (Vevea et al., 2021). Still, the S72A variant reported the same phenotypes as the A184V variant after the first action potential (Fig. 3B, C).”

      As discussed above, we think the synchronous-to-asynchronous ratio is actually harder to determine with electrophysiology, and the preparations are different (acute slice vs dissociated culture); still, our electrophysiological measurements are in line with the iGluSnFR data: 29% for Figure 2 and 26% from the first action potential of Figure 4. These values also agree with the findings from Yao et al. (2011) and Hagler and Goda (2001), discussed above.

      Finally, the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR greatly misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between our imaging and electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Furthermore, the high-affinity and low-affinity iGluSnFR variants, which as discussed above in our analysis overestimate and underestimate the fraction of release that is asynchronous, respectively, both reported the same phenotypes.

      (2) In the acute hippocampal physiology traces, it looks like the effect on cumulative release in Doc2A mutants only appears around ~40 msec after stimulation. This is a relatively late phase of asynchronous release. Any reason this effect does not show up sooner, where most asynchronous fusion events occur, or is this due to some technical aspects of the physiology clamp that masks earlier components?

      The reviewer is correct, although the curves actually diverge at around 30 ms (see image below). This can be attributed to the fact that the EPSCs in our recordings are broad, probably because of the large number of different synaptic inputs captured by our stimulation and recording paradigm (note that the currents are also quite large), resulting in a broad spread in the timing of release. That is to say, synchronous release is likely still occurring fairly late in the trace, obscuring any changes in asynchronous release earlier than 30 ms. This is not specific to Doc2, as the EGTA charge transfer curve also diverges from the control curve at the same time. This EGTA control gives us confidence that our broad EPSCs still faithfully report synchronous and asynchronous release, even if the exact timing is spread out to some extent.

      Author response image 1.

      (3) How do the authors treat multi-vesicular release in their synchronous/asynchronous quantification? It was not clear from the methods section. Many of the optical traces show dual peaks - are those that occur in the 10 ms bin assigned to synchronous and those outside to asynchronous? Are the authors measuring the area of the response or just the peak amplitude for the measurements? The methods seem to indicate peak amplitude, but asynchronous is better quantified with area measurements for electrophysiology.

      This is an excellent point by the reviewer, and in the Methods we now explicitly state how we treat multivesicular release/multiple peaks in our analysis. Release timing is assigned based on peak timing, including when there are multiple peaks at the same bouton.

      “Timing of release was determined based on the frame in which the signal peaked, including for dual peaks in the case of synchronous and asynchronous release at the same bouton.”

      Regarding the comparison to area measurements for electrophysiology, we agree with the reviewer, which is why we used such an approach for our electrophysiological data. However, a key advantage of iGluSnFR is the ability to resolve individual quantal events (or, as is often the case for synchronous release, simultaneous multiquantal events), so temporal binning of the peaks is the appropriate analysis approach regarding these data. This is comparable to the analysis used for electrophysiology recordings of responses from single small synapses, which also detects individual quantal events, where release timing is calculated as the latency between the stimulus and the beginning of each EPSC (Miki et al., 2018).

      This leaves the general concern that multiple vesicle fusions at the same bouton that occur milliseconds apart could blur together and make it more difficult to accurately determine release timing, particularly with the slower sensor used in the single-stimulus experiments in Figure 1. We believe this is not a major concern, since we also performed experiments with the much faster sensor, S72A, which can resolve peaks from 100 Hz stimulation (Marvin et al., 2018). Furthermore, while the peak-calling method we used is crude by comparison, the synchronous/asynchronous ratio we report is similar to that of Mendonça et al. (2022), who used a higher frame rate and deconvolution to produce more easily distinguishable quanta when synchronous and asynchronous release occur at the same bouton after the same action potential.

      (4) It would be relevant to show that calcium binding mutations in Syt7 do not support SV docking/capture in the current assays, given some evidence for Syt7 calcium-independent activities has been reported in the field.

      To our knowledge, when the correct mutations are used to block calcium binding, none of the reported syt7 knockout phenotypes (including those reported by our laboratory in Liu et al., 2014) have ever been rescued. However, this does not formally rule out a calcium-independent role in transient docking. For the EM data, we originally considered including rescue experiments with normal and non-calcium-binding mutants of both syt7 and Doc2 in our study. However, our EM approach is extremely expensive and labor-intensive, and such experiments would as much as triple the amount of EM work in the study. We plan on doing such experiments, and there is a great deal of additional structure-function work to be done on both these proteins. We feel that reassessing the calcium-binding mutants with iGluSnFR and zap-and-freeze falls within the scope of this future work. For now, we note this as a limitation of the current study.

      (5) The authors are not consistent in how they describe the role of the two proteins in asynchronous release, with the reader often drawing the impression that these two proteins solely mediate this aspect of SV fusion. As the authors note, some synapses do not require Syt7 or Doc2 for SV release, indicating different asynchronous sensors or molecular components at distinct brain synapses. Indeed, asynchronous release is only reduced, not eliminated, in the double mutants the authors report, so other components are at play even in these hippocampal synapses. The authors should be more consistent in noting this in their text, as the wording can be confusing as noted below:

      "Together, these data further indicated that AR after single action potentials is driven by Doc2α, but not syt7, in excitatory mouse hippocampal synapses."

      "after a single action potential, Doc2α accounts for 54-67% of AR at hippocampal excitatory synapses, whereas deleting syt7 has no effect."

      "This, along with our finding that syt7/Doc2a DKOs still had remaining AR, raises the possibility that there are other unidentified calcium sensors for AR."

      We have made adjustments throughout to not overstate the role of syt7 and Doc2, including at the locations the reviewer points out. This is an important point from the reviewer, and not just to avoid misleading readers. It is itself interesting; in the original manuscript we should have emphasized, far more than we did, that the DKO experiments strongly point to as-yet-unidentified proteins being involved in asynchronous release. This has been rectified in the revised text: we now emphasize that another calcium sensor for asynchronous release is likely present at all relevant points in the manuscript.

      (6) Given the authors' data, I don't think it's fair to say "raises the possibility" of other AR sensors, as almost 50% of AR remained in the Doc2A mutant in some of the experimental approaches. Clearly, other AR calcium sensors or molecular components are required, so better to just state that in the 1st paragraph of the discussion with something like: "Given syt7/Doc2a DKOs still had remaining AR, further work should explore the diversity of synaptic Ca2+ sensors and how they contribute to heterogeneity in synaptic transmission throughout the brain."

      We agree; this was poor phrasing on our part. We meant to imply that there may be proteins that have not even been considered, because it is also technically possible that the remaining asynchronous release is supported by the known machinery (i.e., syt1). We have changed “raises the possibility” to “indicates”.

      Minor points:

      (1) Remove "on" from the abstract sentence "Consequently, both synchronous and asynchronous release depress from the second pulse on during repetitive activity".

      We have changed “on” to “onward” to reduce ambiguity.

      (2) Shouldn't syt7 be Syt7 and syt1 be Syt1 when referring to the proteins?

      To our knowledge there is not a hard-and-fast convention for non-acronym mouse protein abbreviations. The technically correct full name is lowercase, so we find it reasonable to use lowercase for the abbreviation.

      (3) Both calcium and Ca2+ are used in the manuscript - better to stick to one term throughout.

      We thank the referee for catching this error; we now use only “Ca2+” throughout our study.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the GluSnFR experiments appear to be well done, what is striking is the relatively small and "jagged" fluorescent responses. Are the authors concerned that they are missing many fast (with peaks occurring within 10 ms) synchronous events and incorrectly identifying them asynchronous? If this is not a concern, why not?

      With respect to the small raw responses, this is the nature of measuring individual quanta from individual boutons while imaging at 100 Hz, even with the excellent signal-to-noise ratio of the iGluSnFR variants we used.

      As far as kinetics, as noted in the response to Reviewer 1 point #1, even the slower iGluSnFR variant has a rise time fast enough that it cannot misrepresent a synchronous event as asynchronous (Marvin et al., 2018). This threshold for iGluSnFR has been used by others: see Mendonça et al., 2022, who note that averaged iGluSnFR traces are biphasic, with the transition from fast to slow component occurring around 10 ms. The ‘jaggedness’ is in large part due to the frame rate (100 Hz); Mendonça et al., 2022 used 250 Hz and deconvolution to produce smoother, cleaner traces, but still achieved similar results to us.

      Finally, we reiterate what we wrote in response to Reviewer 1 point #1: “the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between those data and our electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Also, the phenotypes reported by the faster and slower iGluSnFR variants were identical.”

      (2) On page 6, I'm not sure I would agree that short-term plasticity is "so catastrophically disrupted". It is probably enough to say that plasticity is disrupted in the ko.

      We argue that syt7 knockout causes the most severe phenotype specific to short-term plasticity so far described (that is, without affecting initial release probability), but we have changed “catastrophically” to “strongly”.

      (3) Differences in the post-stim number of "docked" vesicles between conditions are, in absolute numbers, very small. For example, it seems that the number of docked vesicles goes from ~ 2.2 prior to stimulation, to ~ 1.5 in the first 5 ms window following stimulation. While this number may be statistically significant, I worry about bias and sampling errors. It is comforting that images are randomized prior to analysis. Nevertheless, the differences are very small and this should be explicitly acknowledged.

      This ~40% decrease in the number of docked vesicles in dissociated cultured hippocampal neurons has been consistent throughout all our studies using flash-and-freeze and zap-and-freeze electron microscopy (Watanabe et al., 2013; Kusick et al., 2020; Li et al., 2021), as well as those of other labs (Chang et al., 2018). Statistically, a 40% difference is far above the detection limit for samples with 200-300 synapses quantified per condition and an average of ~2 docked vesicles per image. The low absolute number of docked vesicles per synaptic profile (since the 40 nm section only captures a portion of the active zone, which contains an average of 12 docked vesicles in total; Kusick et al., 2020) is not relevant, except that it reduces the statistical power to detect differences, but this is compensated for by the huge number of images we capture and annotate per sample. We are able to detect differences in fusion and endocytic pits (albeit with much less precision and sensitivity), such as the Doc2 phenotype in this study, even though these events are an order of magnitude rarer than docked vesicles. Biologically, in our view, a 40% reduction in all docked vesicles across all synapses after only a single action potential is substantial, considering that the majority of synapses do not have even one vesicle fusion. We have even been puzzled why there is such a large decrease, but as stated above this result has been consistent over a decade of using this approach. For comparison with the magnitude of baseline docking changes in mutants, this 40% is similar to the effect of deleting synaptotagmin 1 (Imig et al., 2014; Chang et al., 2018; note that in Imig et al., considered a gold standard in the field, the average number of docked vesicles per tomogram is ~10, but there are fewer than 25 tomograms per sample, so the actual amount of sampling in our data set is slightly greater).
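
      The statistical part of this argument can be checked with a quick simulation. The Python sketch below is illustrative only: it assumes that docked-vesicle counts per synaptic profile are Poisson-distributed (an assumption, not something established in the manuscript), uses the reviewer's approximate means of ~2.2 before and ~1.5 after stimulation, and applies a simple two-sample z-test on the group means.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson variate via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def simulated_power(mean_before, mean_after, n_per_group, n_sims=300, z_crit=1.96, seed=1):
    """Fraction of simulated experiments in which a two-sided z-test on the
    group means (alpha ~ 0.05) detects a difference."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        # Hypothetical per-profile docked-vesicle counts for each condition
        before = [poisson_draw(mean_before, rng) for _ in range(n_per_group)]
        after = [poisson_draw(mean_after, rng) for _ in range(n_per_group)]
        m1, v1 = mean_and_variance(before)
        m2, v2 = mean_and_variance(after)
        se = math.sqrt(v1 / n_per_group + v2 / n_per_group)
        if se > 0 and abs(m1 - m2) / se > z_crit:
            detected += 1
    return detected / n_sims
```

      Under these assumptions, with 250 profiles per group, a shift from ~2.2 to ~1.5 docked vesicles is detected in essentially every simulated experiment, whereas equal means give a false-positive rate close to the nominal 5%.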

      (4) The related point is that how can one know about the "transient" nature of vesicle docking when the analysis is performed on completely different sections from different cells? Moreover, what does it mean that the docked granules have recovered or not recovered (abstract)? This should be explained in more detail.

      This is a fundamental difficulty of interpreting time-resolved electron microscopy data. We cannot observe a sequence of events at any given synapse, but only try to measure each time point as accurately as we can and interpret the data.

      By ‘recovery’ we simply mean that the number of docked vesicles at a given time point after stimulation is similar to the no-stimulation baseline. We have replaced ‘recovery’ in the abstract with ‘replenishment’ to avoid confusion.

      We now realize that in the context of this study the term ‘transient docking’ is confusing, since here we only measured out to 14 ms. In experiments with samples frozen at 5 ms, 14 ms, 100 ms, 1 s and 10 s, the return to baseline at 14 ms appears temporary, since samples frozen at 100 ms have a similar reduction of docked vesicles as those at 5 ms (Kusick et al., 2020). The number of vesicles returns to baseline again at 10 s, so we used the term ‘transient docking’ to distinguish the recovery at 14 ms from the slower and presumably permanent return to baseline that takes 10 s. The apparently temporary nature of this process is why we believe it contributes to facilitation, which likewise peaks soon after stimulation and decays over the course of ~100 ms.

      To make the transient docking terminology less confusing, we have removed the word ‘transiently’ from the title and added a clarification of what transient docking is when it is first mentioned:

      “vesicles can dock within 15 ms of an action potential to replenish vacated release sites and undock over the next 100 ms”

      As noted by the reviewer, such a sequence of events, where vesicles dock within 14 ms, then undock over the course of 100 ms, then dock again over the course of 10 s, is an inference, but is based on predictions from electrophysiological data and modeling (see Silva, Tran, and Marty, 2021 for review; those authors use the term ‘calcium-dependent docking’ but this refers to the same process), and as yet there is no way to directly observe vesicle dynamics at synapses down to nanometer resolution in live cells.

      On the reviewer’s recommendation, we have removed references to syt7 ‘feeding’ vesicles from the abstract and the beginning of the “physiological relevance” section of the discussion. This phrasing could imply a direct molecular pipeline between syt7 and syt1/Doc2, which misrepresents our actual model, in which syt7 simply helps recruit docked vesicles.

      “These findings result in a new model whereby syt7 drives activity-dependent docking, thus providing synaptic vesicles for synchronous (syt1) and asynchronous (Doc2 and other unidentified sensors) release during ongoing transmission.”

      “In the case of paired-pulse facilitation it can supply docked vesicles for syt1-mediated synchronous release to enhance signaling; it likely functions in the same manner to reduce synaptic depression during train stimulation. In the case of AR, syt7-mediated docked vesicles can be used by Doc2α, which then directly triggers this slow mode of transmission.”

      (5) In this study, docking is phenomenologically defined and, therefore, arbitrary; vesicles are defined as docked if there is no space between them and the plasma membrane. What happens if the definition is broadened to include some small distance between the respective membranes? Does the timecourse of "recovery" change?

      We always quantify at least all vesicles within 100 nm of the active zone; these data are shown in Figure S6D. We show only docking in the main figures because, consistent with our previous work and as stated in the text, we found no change in the number of vesicles at any distance from the plasma membrane at the active zone after stimulation, nor did we find any difference in the mutants. In our previous work on syt7 (Vevea et al., 2021) we quantified all the vesicles within the synapse and also found no differences after stimulation or in the KO further from the active zone.

      The reviewer is correct that the term ‘docking’ at synapses is often used quite arbitrarily; even among morphological studies the definition is inconsistent. We consider the strict docking definition explained in the manuscript (no visible distance between the membranes, in high-pressure-frozen and freeze-substituted samples) to be less arbitrary, since only the number of these attached vesicles decreases after stimulation (Watanabe et al., 2013; Kusick et al., 2020; Li et al., 2021; this study) and in SNARE knockouts (Imig et al., 2014). Broadening the definition, as is done in some other studies (for example Chang et al., 2018), retains the effect, since the majority of vesicles within 10 nm are at ~0 nm, but again all that actually changes is the number of vesicles at ~0 nm.

      (6) My overall impression is that this model is not adding much to the story. Specifically, the model was not fit to any data and has a huge number of states and free parameters given the dynamics that it is trying to capture (i.e., I think this is overkill). Many of the free parameters were arbitrarily constrained with little to no justification and there was minimal parameter space exploration, in part because the model wasn't being quantitatively constrained to any data. While advertised to be a 3-state model, there is a combinatorial explosion of substates by distinguishing between levels of calcium occupancy simultaneously in three separate calcium sensors so that one ends up with 9 empty states, 9 tethered states, and 45 docked states for a total of 63 distinguishable states. At 63 states and 21 free parameters, one could of course model just about any dynamics imaginable. But the relatively simple dynamics of AR and its perturbation by removal of Doc2 and Syt7 can likely be captured with far fewer states and parameters (such as Neher's recent proposal). Specifically, starting with the Neher ES-LS-TS model along with adding a transient labile docked state affected by Syt7 and Doc2 (TSL in Neher nomenclature), I wonder if the authors could more or less capture what they are observing during stimulus trains. The advantage of a minimal model is that readers don't have to struggle with fairly elaborate systems of differential equations and parameter plots to get a feel for what's going on. Especially since the point of this model is to develop intuition rather than to capture with physical accuracy exactly what is transpiring at a docked vesicle (which would require many more details excluded from the current model).

      We would like to thank the reviewer for pointing out unclarities and mistakes in the description of the model. We have worked to address these points. We now explain in more detail why we made certain assumptions and how we constrained the parameter values in the model. As the reviewer points out, other models might also explain the dynamics of the experimental data presented in this paper. Thus, we agree that it is unlikely that this theory and model implementation is the only one that can account for the observations. With this model, we aimed to investigate whether the theory proposed on the basis of the experimental data could indeed reproduce the dynamics observed experimentally. Below, we briefly explain the decisions we made in constructing the model, in response to the reviewer’s concerns. We also describe more precisely what adjustments we have made to the model’s description to improve its readability and to be open about its limitations.

      One of the main concerns of the reviewer is that the model has many states and free parameters, some of which are poorly constrained. We agree that the model indeed contains many states. However, in essence, the model corresponds to a two-step docking model, in which SVs get tethered to an empty release site and subsequently dock/prime in a fusion-competent state. This structure corresponds to the ES-LS-TS model (Neher and Brose 2018, Neuron) mentioned by the reviewer or the replacement-docking model (Miki et al., 2016, Neuron). As the reviewer points out, by making the transition rates calcium-dependent in those models, we would indeed be able to capture similar dynamics with these models as with ours. However, instead of directly implementing calcium-dependent rates, we let the rates depend on the number of calcium ions bound to syt7, Doc2 and syt1. We decided to do so because some information on the calcium-binding dynamics of these proteins is available, and by simulating calcium binding to the proteins explicitly we could integrate this knowledge into our model. Moreover, by explicitly simulating calcium binding to these proteins, we included the time it takes for a new steady-state binding occupancy to be reached after a change in calcium levels. This is crucial especially for Ca2+ sensors with slow kinetics, such as syt7 and Doc2, and these properties are highly relevant for asynchronous release (which we quantified as release >5 ms after AP onset). The consequence is that, because of combinatorics (e.g., if we assume 5 calcium ions to bind to syt1 and 2 to Doc2 this leads to 24 different states), explicitly simulating all relevant states increases the number of distinct states a vesicle can be in. In the main text of the manuscript, we added this explanation of why we chose the structure of the model as presented and discussed it in the context of previous models.
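
      The combinatorics can be made concrete with a short enumeration. The Python sketch below assumes that syt7 and Doc2 each bind up to two Ca2+ ions per release site, that syt1 binds up to five per SV, that syt1 and Doc2 together bind at most five (the cap described later in this response), and that empty and tethered sites only track syt7 and Doc2 occupancy. These bookkeeping choices are assumptions for illustration, not the authors' implementation; under them the enumeration reproduces the reviewer's tally of 9 + 9 + 45 = 63 states.

```python
from itertools import product

# Assumed maximum Ca2+ occupancies (illustrative bookkeeping, not the authors' code)
SYT7_MAX = 2   # ions on syt7 per release site
DOC2_MAX = 2   # ions on Doc2 per release site
SYT1_MAX = 5   # ions on syt1 per SV
JOINT_MAX = 5  # syt1 + Doc2 together bind at most five ions


def enumerate_states():
    """List (stage, syt7 ions, Doc2 ions, syt1 ions) tuples for one release site."""
    states = []
    for s7, d2 in product(range(SYT7_MAX + 1), range(DOC2_MAX + 1)):
        # Assumption: without a docked SV, only syt7 and Doc2 occupancy is tracked
        states.append(("empty", s7, d2, None))
        states.append(("tethered", s7, d2, None))
        for s1 in range(SYT1_MAX + 1):
            if d2 + s1 <= JOINT_MAX:
                states.append(("docked", s7, d2, s1))
    return states
```

      Each additional sensor or occupancy level multiplies the state count, which is why explicit Ca2+ binding inflates a two-step docking scheme into dozens of substates.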

      Our decision to simulate calcium binding to syt1, syt7 and Doc2 also increased the number of parameters in our model. As the reviewer points out, the large number of parameters in our model, compared with the relatively low number of features in the experimental data it is compared to, is a limitation. However, after thorough exploration of the model, we are certain that the model cannot produce arbitrary dynamics. The large number of parameters does make it possible that different combinations of parameter values lead to similar responses, as can be seen in the parameter space exploration in Figure S9. This means that our modelling effort does not provide estimates of parameter values. We now mention this explicitly in the discussion section of the model. Some of the parameter values we were able to constrain based on previous literature (10 parameters), others were set more arbitrarily (8 parameters), and some were adjusted to match the experimental data closely (7 parameters). Supplementary Table 3 now indicates more clearly to which category each parameter belongs. We determined the values of the model parameters through a manual exploration of the parameter space. One of the main reasons why we decided not to fit the model to the data obtained in this work is that the resulting parameters would not be informative (e.g., multiple combinations of parameters lead to similar results). We agree with the reviewer that a direct quantitative comparison between model predictions and experimental data obtained by fitting would be nice. However, fitting the model to experimental data would be close to impossible computationally. This is in part because of the large number of states, but mainly due to the large number of APs that need to be simulated. Because the transients in our model have both slow and fast components (the decay of the residual Ca2+ transient and the peak of the local Ca2+ transient), the model is also challenging to solve with the ODE solvers available in Matlab, even when using a high-performance computer system optimized for parallel computation (32 cores). Moreover, fitting the model to experimental data would require adding extra assumptions and parameters to the model. As the experiments were performed using different samples, different parameter settings are probably required (e.g., it is likely that the number of release sites or the fusion probability differs between cultured hippocampal neurons and hippocampal slices). Additionally, fitting the model would require defining a cost function (i.e., a quantitative measure of how well the model fits the experimental data), which in turn requires deciding how much weight each of the different experiments we compare our model predictions to should carry. The decision on how to weight the different types of data is very difficult (not to say arbitrary).
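
      The weighting problem can be illustrated with a toy Python example. The experiment names, residual values and weights below are invented for illustration and are not taken from the manuscript; the point is only that which parameter set counts as the "best fit" can flip depending on how the experiments are weighted in the cost function.

```python
def weighted_cost(residuals, weights):
    """Weighted sum of per-experiment mean squared residuals.

    residuals: experiment name -> list of normalized (model - data) residuals
    weights:   experiment name -> non-negative weight (the hard-to-justify choice)
    """
    total = 0.0
    for name, res in residuals.items():
        mse = sum(r * r for r in res) / len(res)
        total += weights[name] * mse
    return total


# Hypothetical residuals for two candidate parameter sets
SET_A = {"em_docking": [0.1, -0.1], "async_release": [0.5, -0.4]}
SET_B = {"em_docking": [0.4, -0.3], "async_release": [0.1, 0.1]}


def preferred_set(weights):
    """Return the label of the parameter set with the lower weighted cost."""
    cost_a = weighted_cost(SET_A, weights)
    cost_b = weighted_cost(SET_B, weights)
    return "A" if cost_a < cost_b else "B"
```

      Weighting the EM docking data more heavily favors set A, while weighting asynchronous release more heavily favors set B, even though neither set changes.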

      Therefore, we constrained the parameter values in our model based on a manual (but systematic) exploration of the parameter space. The simulations of the model were evaluated based on the increase in the number of docked vesicles between 5 and 15 ms after AP stimulation (this should be as large as possible for the control and Doc2- model, and close to 0 for the syt7- model simulations), the peak release rates in response to the first AP (to be equal between all conditions), the ratio between the peak release rate of the 1st and 10th response (depressive phenotype should be more prominent in the syt7- model simulation and the least in the Doc2- simulation), and the amount of asynchronous release (syt7- and Doc2- simulations should have approximately half of the total amount of asynchronously released vesicles compared to the control simulations). Moreover, the parameter values for the calcium transient should be realistic. We do not know the exact parameter values of the calcium transient in the samples used in the experiments performed here, but previous studies have provided a range of realistic parameter values (Brenowitz and Regehr 2007, PMID: 17652580; Helmchen et al., 1998, PMID: 9138591; Sabatini and Regehr 1998, PMID: 9512051; Wang et al., 2008, PMID: 19118179). Furthermore, we decided to set the parameters describing calcium binding to syt7 and Doc2 to the same values, as the scope of the model was to investigate the role of syt7 and Doc2 in asynchronous release when they act on different steps in the reaction scheme. By using the same parameter values both proteins are identical except for their mechanism of action. We added this section to the methods of the manuscript.
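
      The evaluation criteria listed above can also be written down as an explicit checklist. The Python sketch below is a hypothetical encoding: the field names, the single relative tolerance `tol`, and reading "as large as possible" as "clearly positive" are assumptions, and the summary values would come from whatever simulation code produces them.

```python
from dataclasses import dataclass


@dataclass
class SimSummary:
    docking_gain: float      # change in docked SVs between 5 and 15 ms after the AP
    first_peak_rate: float   # peak release rate after the first AP
    depression_ratio: float  # peak rate after AP 1 / peak rate after AP 10
    async_release: float     # total release >5 ms after AP onset


def check_criteria(ctrl, doc2_ko, syt7_ko, tol=0.1):
    """Pass/fail for each qualitative criterion; tol is a hypothetical tolerance."""
    return {
        # Docking gain should be positive in control and Doc2-, and ~0 in syt7-
        "docking": ctrl.docking_gain > tol and doc2_ko.docking_gain > tol
        and abs(syt7_ko.docking_gain) <= tol,
        # First-AP peak release rates should match the control within tol
        "first_peak": all(abs(s.first_peak_rate / ctrl.first_peak_rate - 1) <= tol
                          for s in (doc2_ko, syt7_ko)),
        # Depression strongest in syt7-, weakest in Doc2-
        "depression": syt7_ko.depression_ratio > ctrl.depression_ratio
        > doc2_ko.depression_ratio,
        # Asynchronous release roughly halved in both mutants
        "async": all(abs(s.async_release / ctrl.async_release - 0.5) <= tol
                     for s in (doc2_ko, syt7_ko)),
    }
```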

      In the parameter space evaluation, we decided to vary parameters one by one or in pairs. We decided not to extend the parameter space evaluation further, as the results would be challenging to interpret properly and to visualize, and computationally expensive to simulate.

      (7) The graphics, equations, and nomenclature all need some work. The equations aren't numbered or indexed, so I can't really refer to any of them in particular, but the symbols being used generally were not defined well enough for a naïve reader to follow. The 15 diffEQs compressed into a single expression at the bottom of page 19 are basically impenetrable. The 'equation' near the bottom of p. 20 is not an equation - it is a set of four symbols lacking a definition. The fusion rate equation (with f1 and f2 factors) isn't spelled out clearly enough (top of p. 20). Can fusion occur from any of the 45 docked states but just with a different probability? Or does fusion only occur from the 3 states where Doc2+Syt1 Ca occupancy = 5? The graphical representation of Syt7 occupancy and its effects in Fig S7 doesn't work well. Tons of color and detail but very hard to decipher and intuit what Syt7 is doing to the SV buried in the arrow lengths. And this is a crucial point of the paper - it really needs to shine through in this figure.

      We thank the reviewer for pointing out the unclarities in the description of the model. We have worked on improving this section. Specifically, we have improved the equations and now more clearly explain the symbols used in these equations. We have altered the graphical representation of the effect of calcium binding to syt7 on docking and undocking rates.

      (8) I would strongly recommend abandoning this large-scale soft modeling effort altogether, but if the authors feel that all the states and parameters are absolutely required, they need to justify this point, define all symbols systematically, number all equations, and provide some evidence of actual data fitting, systematic parameter space exploration, and more exposition of why they are making the various assumptions and constraints that were used to lower the number of free parameters. For instance, why are the tethering and untethering (or docking and undocking) rate constants set to equal each other? And why is it assumed that Syt7 enhances both the docking and undocking rates? Why is fusion set to occur as long as the sum of Syt1 and Doc2 calcium occupancy is exactly 5 regardless of the specific occupancy of either Syt1 or Doc2? Again probably quite important but unjustified physically. Given the efforts of this model to capture some sort of realistic calcium liganding by Syt1, Syt7, and Doc2, the model doesn't seem to take into account the copy number of each protein at a release site. Shouldn't it matter if there are 2 Syt7s vs 20 Syt7s? Or the stoichiometry between Doc2 and Syt1? Either this model assumes that there is exactly one copy of each protein at a release site or that all copies are always identically liganded and strictly act as a unit. Neither of these possibilities seems plausible.

      Although this model (as all models) is a simplified version of reality and has its limitations, we decided to keep it in our work to illustrate that the well-defined hypothesis put forth in this paper is consistent with the experimental data. Again, we are not claiming that this model is the only one that may explain the data, nor do we claim that we have uniquely identified its parameters. As indicated above, we have improved the description of the model in the methods and our description of how the parameter values are constrained. For the reasons mentioned above (first and foremost, infeasibility due to excessive computation time), we did not perform data fitting or change the parameter space exploration. We thank the reviewer for pointing out that some of the assumptions of the model were not explained well enough, and we have added an extra explanation of these assumptions to the main text.

      One of the assumptions we made, as the reviewer points out, is that the tethering and untethering rate constants, and likewise the docking and undocking rate constants, are set equal to each other. This is indeed an arbitrary assumption, made mainly to reduce the number of free parameters in our model, given that there is currently no experimental constraint on the relation between the two rate constants. We agree that this assumption is as good as any other, and we have pointed this out more clearly in the main text.

      In the model, syt7 enhances both docking and undocking rates because we assumed it functions as a catalyst of the docking reaction. A catalyst lowers the energy barrier for the reaction and thereby promotes both forward and backward rates. One of the main reasons we decided on this is that syt1 and Doc2 are also assumed in the model to function by lowering the energy barrier for the fusion reaction; since fusion is irreversible, however, this only affects the forward reaction rate. We cannot exclude that syt7 acts on the forward rate only, which we now mention in the results section of the model.
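
      The catalyst argument can be illustrated with a one-barrier rate sketch. Assuming Arrhenius/Eyring-type rates with energies in units of k_BT (the energy values used here are hypothetical and are not model parameters), lowering the transition-state energy speeds up docking and undocking by the same factor, so the equilibrium between docked and undocked states is unchanged.

```python
import math


def barrier_rate(barrier, prefactor=1.0):
    """Arrhenius/Eyring-style rate for crossing a barrier given in units of k_B*T."""
    return prefactor * math.exp(-barrier)


def dock_undock_rates(g_undocked, g_docked, g_transition, lowering=0.0):
    """Docking (forward) and undocking (backward) rates through one transition state.

    A catalyst is modelled as lowering the transition-state energy by `lowering`,
    which reduces the barrier seen from both sides by the same amount.
    """
    g_ts = g_transition - lowering
    k_dock = barrier_rate(g_ts - g_undocked)
    k_undock = barrier_rate(g_ts - g_docked)
    return k_dock, k_undock
```

      With a lowering of 3 k_BT, for instance, both rates grow by a factor of e^3 (about 20) while their ratio stays fixed. An irreversible step such as fusion has no backward rate, so the same barrier lowering only accelerates the forward reaction.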

      In our model, fusion can occur from any possible docked SV state. The probability of fusion, however, increases the more calcium ions are bound to Doc2 or syt1, with calcium bound to syt1 being more effective in promoting fusion. This structure matches the dual-sensor model proposed by Sun et al., 2007, Nature (PMID: 18046404) and Kobbersmed et al. 2020, eLife (PMID: 32077852), and is based on the assumption that each protein bound to calcium lowers the energy barrier by a certain amount. We have explained this in more detail in the results section of the model.
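
      The structure of the fusion rate can be sketched as follows. This minimal Python illustration assumes a multiplicative form in the spirit of the dual-sensor models cited above: each bound ion lowers the fusion barrier by a fixed amount, so it multiplies a basal rate `l_plus` by a constant factor (`f_syt1` for syt1, `f_doc2` for Doc2). The parameter names and values are hypothetical placeholders. The cap of five ions on syt1 plus Doc2 follows the assumption described in the next paragraph.

```python
def fusion_rate(n_syt1, n_doc2, l_plus=1e-3, f_syt1=30.0, f_doc2=10.0, joint_max=5):
    """Fusion rate of a docked SV given the Ca2+ ions bound to syt1 and Doc2.

    Every docked state can fuse; each bound ion multiplies the basal rate by a
    constant factor, and f_syt1 > f_doc2 makes a syt1-bound ion more effective.
    All parameter values are placeholders, not fitted model parameters.
    """
    if n_syt1 < 0 or n_doc2 < 0 or n_syt1 + n_doc2 > joint_max:
        raise ValueError("Ca2+ occupancy outside the allowed range")
    return l_plus * f_syt1 ** n_syt1 * f_doc2 ** n_doc2
```

      In this form, an unoccupied docked SV still fuses at the basal rate, and the rate increases geometrically with every ion bound, faster for syt1 than for Doc2.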

      We decided that syt1 and Doc2 together could have no more than five calcium ions bound to them. This is based on the idea that syt1 and Doc2 are competing for the same type of resources, which could for instance be a limited number of SNARE complexes that are available to execute the reaction. An indication for competition between the two proteins can be found in the synchronous release amplitudes after stimulus 2, which are larger in the Doc2KO.

      The reviewer rightfully points out that for realistic simulations of the role of syt1, syt7 and Doc2 the stoichiometry of these proteins at the release site is relevant. Ideally, we would have included this in our model. However, this would massively increase the possible number of states (which the reviewer already criticizes in our simpler model), making the model even more computationally expensive to run. Additionally, we currently have no reliable estimates of the number of syt7 and Doc2 molecules per release site. In our model, all syt1s expressed on an SV together can bind up to five calcium ions. We have recently shown that this simplified model can capture the features of all syt1 proteins per vesicle that compete for the binding of three substrates on the plasma membrane to exert their function in speeding up fusion (Kobbersmed et al., 2022 eLife PMID: 35929728). This means that the copy number is indirectly covered in our model. This number of five calcium ions (and two for Doc2 and syt7), however, is not based on the estimated number of syt1s on an SV (which would be around 15; Takamori et al., 2006), but rather on the calcium dependence of the fusion reaction. Similarly, the number of two calcium ions binding to Doc2 is based on the calcium dependence of asynchronous fusion rates (Sun et al., 2007). Based on the reviewer’s comment, we now state more explicitly in the text that the numbers of calcium ions binding to syt1, Doc2 and syt7 correspond to the total number of calcium ions that can bind to each of these molecule types per release site/SV.

      We again would like to thank the reviewer for asking us to improve the explanation on the assumptions made to construct our model and how we constrained the parameter values in our model.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are pleased to send you a revised version of our manuscript entitled “voyAGEr: free web interface for the analysis of age-related gene expression alterations in human tissues” and the associated shiny web app, in which we incorporate the referees’ feedback. We would like to express our gratitude for their time and valuable insights, which have contributed to the improvement of our work. We appreciate the rigorous evaluation process that eLife maintains.

      In this letter, we address each of the reviewers' comments and concerns, point-by-point, offering detailed responses and clarifications. We have made several revisions to our manuscript following their recommendations.

      We must note that the revised version of the manuscript has two novel joint first authors, Rita Martins-Silva and Alexandre Kaizeler, who performed all the requested reanalyses, given that the initial first author, Arthur Schneider, already left our lab. We must also point to the following minor unsolicited improvements we took the opportunity to make:

      • Added a comprehensive tutorial to the GitHub repository on how to navigate through voyAGEr’s features.

      • Implemented sample randomisation in the scatter plots depicting gene expression across the age axis to ensure data privacy.

      • Implemented minor adjustments within the web app to enhance user comprehension and clarity when visualising the data.

      • Improved clarity of the methodological sections.

      Reviewer 1

      (1.1) While this may be obvious to others for some reason that escaped me, I was unsure what was the basis for the authors' choice of 16 years as the very specific sliding window size. If I'm not alone in this, it might add clarity for other readers and users if this parameter choice were explained and justified more explicitly.

      We apologise for not providing the rationale for this choice in the previous version. We chose 16 years as our sliding window size because this was the minimum span needed to guarantee the presence of more than one sample per window across all the tissues considered in the study (Figure R1 below).

      We added the following sentence to the manuscript (v. Methods, ShARP-LM):

      “This was the minimum age span needed to guarantee the presence of more than one sample per window, across all considered tissues.”
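      The window-size criterion above can be illustrated with a minimal Python sketch. It assumes integer donor ages, windows that slide in 1-year steps across the pooled age range of all tissues, and a hypothetical input mapping each tissue to its donors' ages; the actual ShARP-LM code in the voyAGEr repository may define the windows differently.

```python
from typing import Dict, List


def every_window_has_two(ages: List[int], width: int, lo: int, hi: int) -> bool:
    """Return True if every window [start, start + width), for integer
    start from lo to hi - width, contains more than one sample."""
    for start in range(lo, hi - width + 1):
        n = sum(1 for a in ages if start <= a < start + width)
        if n < 2:
            return False
    return True


def min_window_span(ages_by_tissue: Dict[str, List[int]]) -> int:
    """Smallest window width (in years) for which every sliding window
    holds at least two samples in every tissue (hypothetical helper)."""
    all_ages = [a for ages in ages_by_tissue.values() for a in ages]
    lo, hi = min(all_ages), max(all_ages) + 1  # hi is exclusive
    for width in range(1, hi - lo + 1):
        if all(every_window_has_two(ages, width, lo, hi)
               for ages in ages_by_tissue.values()):
            return width
    raise ValueError("some tissue has fewer than two samples in total")


# Toy example with made-up ages: tissue "A" has a gap between 21 and 30,
# which forces windows of at least 10 years.
toy = {"A": [20, 21, 30, 31], "B": [20, 22, 24, 26, 28, 31]}
print(min_window_span(toy))  # 10
```

      On real data, the same search over all GTEx tissues would return the smallest admissible span; the tissue with the sparsest age coverage determines the result.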

      (1.2) "In particular, tissue-specific periods of major transcriptional changes in the fifth and eighth decades of human lifespan have been revealed, reflecting the so-called digital aging and consistently with what is observed in mice" here I think that "consistently" should be "consistent".

      We thank the reviewer for the comment and, following the suggestion, have revised 'consistently' to 'consistent', which is the correct usage in this sentence.

      (1.3) "On a different note, sex biases have been reported in for the expression of SALL1 and KAL1 in adipose tissue and lung, respectively." Here I think that "in for" should be "in".

      As recommended by the reviewer, we have replaced ‘in for’ with ‘in’. As we have also replaced KAL1 with DDX43, the sentence now reads: “On a different note, sex biases have been reported in the expression of SALL1 and DDX43 in adipose tissue and lung, respectively”.

      (1.4) "We downloaded the matrix with the RNA-seq read counts for each gene in each GTEx v7 sample from the project's data portal (https://www.gtexportal.org/)." In my pdf manuscript this hyperlink appears to be broken.

      We appreciate the reviewer's attention to the broken link, and we have rectified the issue. The link should now be fully operational, effectively directing users to the GTEx Portal.

      (1.5) Under methods, I might suggest "Development platform" or "Development platforms" over "Development's platform" as a heading.

      We have modified the heading of this section in the methods to 'Development Platforms', as we believe it better reflects the information conveyed.

      Reviewer 2

      (2.1) In this tool/resource paper, it is crucial that the data used is up-to-date to provide the most comprehensive and relevant information to users. However, the authors utilized GTEx v7, which is an outdated (2016) version of the dataset. It is worth noting that GTEx v8 includes over 940 individuals, representing a 35% increase in individuals, and a 50% increase in the total number of samples. The authors should check the newer versions of GTEx and update the data.

      When development of the voyAGEr web application began, GTEx version 7 was the most recent release. Nevertheless, we agree that version 8 offers a notably more extensive dataset, encompassing more individuals and samples and introducing new tissues. Consequently, we have updated our application to incorporate the data from GTEx version 8.

      (2.2) The authors did not address any correction for batch effects or RNA integrity numbers, which are known to affect transcriptome profiles. For instance, our analysis of GTEx v8 Cortex tissue revealed that after filtering out lowly expressed genes, in the same way authors did, PC1 (which accounts for 24% of the variation) had a Spearman's correlation value of 0.48 (p<6.1e-16) with RNA integrity number.

      We acknowledge the validity of the reviewer’s comment and appreciate the importance of such corrections for data interpretation. In response, we conducted a thorough, unbiased investigation into potential batch effects, with the COHORT variable emerging as the primary driver of those observed across most tissues. Furthermore, SMRIN (as the reviewer pointed out), DTHHRDY, MHSMKYRS and the number of detected genes in each sample were consistently associated with the primary sources of variation. As a result, we implemented tissue-specific batch effect correction for these five variables.

      We provide a detailed explanation of the batch effect correction methodology and its importance in the biological interpretation of results in the Methods section, specifically under "Read count data pre-processing". Additionally, we have included two new supplementary figures, Sup. Figures 7 and 8, to illustrate a batch effect example in lung tissue and emphasise the critical role of this correction in data interpretation.

      (2.3) The data analyzed in the GTEx dataset is not filtered or corrected for the cause of death, which can range from violent and sudden deaths to slow deaths or cases requiring a ventilator. As a result, the data may not accurately represent healthy aging profiles but rather reflect changes in the transcriptome specific to certain diseases due to the age-related increase in disease risk. While the authors do acknowledge this limitation in the discussion, stating that it is not a healthy cohort and disease-specific analysis is not feasible due to the limited number of samples, it would be useful for users to have the option to analyze only cases of fast death, excluding ventilator cases and deaths due to disease. This is typically how GTEx data is utilized in aging studies. Alternatively, the authors should consider including the "cause of death" variable in the model.

      This comment is closely related to the previous point (2.2). Notably, two of the covariates selected for batch effect correction, namely DTHHRDY (death classification based on the 4-point Hardy Scale¹) and COHORT (indicating whether the participant was a postmortem, organ, or surgical donor¹), are directly relevant to this issue, as both relate to the circumstances of the individual's death.

      ¹ According to the nomenclature of variables described in https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetListOfAllObjects.cgi?study_id=phs000424.v9.p2&object_type=variable

      By correcting for these covariates, we therefore account for their influence on gene expression and mitigate their impact.

      This approach represents a compromise, as it is practically infeasible to ascertain the absence of underlying health conditions in the remaining samples, even when considering only cases of “fast death”. Hence, we opted to keep all samples, regardless of the donor's cause of death, to dilute potential effects associated with individual causes of death.

      (2.4) The age distribution varies across tissues which may impact the results of the study. The authors' claim that age distribution does not affect the outcomes is inconclusive. Since the study aims to provide cross-tissue analysis, it is important to note that differing age distributions across tissues can influence the overall results. To address this, the authors should conduct downsampling to different age distributions across tissues and evaluate the level of tissue-specific or common changes that remain after the distributions are made similar.

      We acknowledge that variations in age distributions are evident across different tissues, with brain tissues displaying a notably pronounced disparity (green density lines in Figure R2 below).

      To address this issue comprehensively, we conducted tissue-specific downsampling, reducing the number of samples in each age window to the minimum sample size across all age windows of the given tissue. Figure R1 shows, for each tissue, the density of the number of samples per 16-year age window considered in the ShARP-LM model, as well as the minimum number of samples across age windows. After downsampling, we computed the logFC and p-value of differential expression for each gene in each age window and compared them (for all genes in a given age window) with those obtained using all samples.
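      The per-window downsampling described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes a hypothetical mapping from each age window of one tissue to the IDs of its samples, and it draws samples uniformly at random without replacement, with a fixed seed for reproducibility.

```python
import random
from typing import Dict, List


def downsample_windows(samples_by_window: Dict[str, List[str]],
                       seed: int = 0) -> Dict[str, List[str]]:
    """Reduce every age window of one tissue to the size of its smallest
    window by sampling without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    n_min = min(len(ids) for ids in samples_by_window.values())
    return {window: sorted(rng.sample(ids, n_min))
            for window, ids in samples_by_window.items()}


# Made-up sample IDs for three overlapping 16-year windows.
windows = {
    "20-35": ["s1", "s2", "s3"],
    "21-36": ["s1", "s2", "s3", "s4"],
    "22-37": ["s2", "s3", "s4", "s5", "s6"],
}
balanced = downsample_windows(windows)
print({w: len(ids) for w, ids in balanced.items()})
# {'20-35': 3, '21-36': 3, '22-37': 3}
```

      Because each window is subsampled independently here, a sample shared by overlapping windows may be kept in one and dropped in another; the differential expression test would then be re-run on each balanced window.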

      Despite changes in logFC with downsampling, a considerable positive correlation with the original results is maintained (Figure R3, top panel), suggesting that the overall trends in gene expression changes persist. However, downsampling, as expected, reduces statistical power within each age window in line with the decreased sample size, as evident from the shift of genes from the third to the first quadrant in Figure R3, bottom panel. Consequently, we have opted to keep the results obtained with all samples and to remove the paragraph in the Discussion asserting that the age distribution has no impact on the overall outcomes (“Indeed, we found no confounding between the distribution of samples’ ages and the trend of gene expression progression over age in any tissue.”), as we deem it inaccurate and potentially misleading. We have added a supplementary figure (Supplementary Figure 8, identical to Figure R3) illustrating the effect of downsampling, and the following paragraph to the manuscript’s Discussion section:

      “When downsampling to ensure a balanced age distribution, a loss of statistical power is apparent but a considerable positive correlation with the original results is maintained and a substantial number of significant alterations remain so (Supplementary Figure 8).”

      We acknowledge that this limitation may be overcome as human tissue transcriptomes continue to accumulate in publicly available databases, a trend we expect to continue in the near future. We are committed to promptly updating voyAGEr with any new data releases that may address this concern.

      Nonetheless, we want to underscore, as the reviewer has astutely pointed out, that while voyAGEr can facilitate cross-tissue comparisons, these must be made with caution. In this regard, we inserted the following paragraph into the Discussion:

      “Due to the tissue-specific nature of the pre-processing steps (v. Read count data preprocessing in the Methods section), and given that most of the plotted gene expression distributions are centred and scaled by tissue, it is important to note that voyAGEr may not always be suited for direct comparisons between different tissues. For instance, it does not allow users to directly ascertain whether a gene exhibits different expression levels in different tissues or whether the expression of a particular gene changes more drastically with age in one tissue than in another.”

      (2.5) The GTEx resource is extremely valuable, however, it comes with challenges. GTEx contains tissue samples from the same individuals across different tissues, resulting in varying degrees of overlap in sample origin across tissues as not all tissues are collected for all individuals. This could affect the similar/different patterns observed across tissues. As this tool is meant for broader use by the community, it is crucial for the authors to either rule out this possibility by conducting a cross-tissue comparison using a non-parametric model that accounts for the dependency between samples from the same individual, or to provide information on the degree of similarity between samples so that the users can keep this possibility in mind when using the tool for hypothesis generation.

      We agree that the variable degrees of overlap between tissues (Figure R4) could lead to a confounding between trends in a population of common individuals and those associated with age. We therefore examined the contributions of variables 'donor,' 'tissue,' and 'age' to the overall variance in the data (Figure R5, panel A), having normalised the data collectively across all tissues. Tissue and donor contribute approximately 90% and 10% of the variance, respectively. Age exhibits minimal impact (around 1%), which may be attributed to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing-associated changes. Notably, removing the 'donor' variable does not transfer this variance to 'age', suggesting a limited confounding between these variables (see Figure R5, panel B).

      We also specifically examined the pairs of tissues exhibiting the lowest (Brain Amygdala / Small Intestine), median (Pancreas / Heart Left Ventricle), and highest (Kidney Cortex / Muscle Skeletal) percentages of shared donors. We identified and selectively removed samples from shared donors while maintaining the original sample size imbalance between tissues. Subsequently, we calculated each gene’s mean expression within each age window from the ShARP-LM pipeline, followed by each gene’s Pearson’s correlation of expression between tissue pairs. The resulting coefficients, both with and without the removal of common donors, were compared in scatter plots (Figure R6, left plots). As this process inherently involves downsampling, which may impact results (v. comment 2.4), we performed additional downsampling by randomly removing samples from both tissues according to the proportions defined for the removal of common donors (Figure R6, right plots).

      In the chosen scenarios, we note a similar impact of the targeted removal of common donors and of random downsampling. Nevertheless, the effects of removing samples may vary with the absolute number of remaining samples, so singling out individual cases may not provide conclusive insights. To address this systematically, we represented all tissue pairs in a heatmap, colour-coded according to whether the removal of common donors is more impactful (red) or less impactful (blue) than random downsampling (Figure R7). The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair in several steps. First, we determined the absolute difference in Pearson’s correlation of each gene’s mean expression within each age window from the ShARP-LM pipeline between the original data and either the subset of data without common donors (DiffWoCD) or the randomly downsampled data (DiffRD). Next, the medians of DiffWoCD and DiffRD are computed, and the difference between these median values gives the ICD for each tissue pair. Because correlation is symmetric (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular.
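      The ICD calculation described above reduces to a few lines. The Python sketch below is an illustrative reading of that description, not the authors' code. For one tissue pair it takes, per gene, the Pearson correlation across age windows between the two tissues' mean expression, computed on the original data, on the data without common donors, and on randomly downsampled data. It assumes the difference is taken as median(DiffWoCD) minus median(DiffRD), so that positive values correspond to red in the heatmap; the input values are made up.

```python
import math
from statistics import median
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between one gene's per-window mean
    expression in tissue 1 and in tissue 2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def impact_of_common_donors(r_orig: Sequence[float],
                            r_wocd: Sequence[float],
                            r_rd: Sequence[float]) -> float:
    """ICD = median(DiffWoCD) - median(DiffRD) over genes (assumed sign).
    Positive: removing common donors changes the correlations more than
    random downsampling; negative: less."""
    if not len(r_orig) == len(r_wocd) == len(r_rd):
        raise ValueError("one correlation per gene is expected in each input")
    diff_wocd = [abs(o - w) for o, w in zip(r_orig, r_wocd)]
    diff_rd = [abs(o - d) for o, d in zip(r_orig, r_rd)]
    return median(diff_wocd) - median(diff_rd)


# Made-up per-gene correlations for a single tissue pair.
icd = impact_of_common_donors([0.9, 0.5, 0.1], [0.7, 0.5, 0.4], [0.85, 0.45, 0.1])
print(round(icd, 2))  # 0.15
```

      Using medians rather than means keeps a handful of genes with large correlation shifts from dominating the ICD of a tissue pair.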

      We have added a supplementary figure (Supplementary Figure 4, a composition of Figures R4-R7, together with a scatterplot relating the values of heatmaps R4 and R7) that aims to provide guidance to users when interpreting specific tissue pairs, acknowledging inherent limitations (refer to comment 2.4). We have also inserted the following paragraph into the manuscript’s Discussion section:

      “Furthermore, we must emphasise that the majority of GTEx donors contributed samples to multiple tissues (Supplementary Figure 4A), potentially introducing biases and confounders when comparing gene expression patterns between tissues. Our analyses of variance (Supplementary Figure 4B) and downsampling to control for common donors (Supplementary Figures 4C-E) suggest very limited global confounding between the impacts of donor and age on gene expression, and that any potential cross-tissue bias does not depend much on the proportion of common donors (Supplementary Figure 4E). However, this effect must be taken into account when comparing specific pairs of tissues (e.g., Colon – Transverse and Whole Blood, Supplementary Figure 4D).”

      (2.6) The authors aimed to create an open-source and ever-evolving resource that could be adapted and improved with new functionality. However, this goal was only partially achieved. Although the code for the web app is open source, crucial components such as the statistical tests or the linear model are not included in the repository, limiting the tool's customizability and adaptability.

      We greatly appreciate the reviewer’s concern and share their commitment to maintaining the principles of openness, reproducibility, and adaptability for voyAGEr. voyAGEr was primarily designed as a visualisation tool, displaying pre-processed results, and indeed only the code for the Shiny app itself was accessible through the project's GitHub repository.

      To address this shortcoming, we have made the entire data preprocessing script publicly available in the GitHub repository of voyAGEr. This script encompasses, among others, filtration, normalisation, batch effect correction, the ShARP-LM pipeline and statistical tests employed, and module definition. Moreover, the web app itself offers functionality to export relevant plots and tables.

      (2.7) Furthermore, the authors' choice of visualization platform (R shiny) may not be the best fit for extensibility and open-source collaboration, as it lacks modularity. A more suitable alternative could be production-oriented platforms such as Flask or FastAPI.

      We appreciate this thoughtful concern. The decision to use Shiny was primarily driven by our data having already been prepared in the R environment during the pre-processing steps. Consequently, and as the web app serves the purpose of visualisation only (and not data processing), Shiny is a natural and convenient extension of our scripts, enabling seamless data visualisation.

      We acknowledge that Shiny may lack the modularity required for optimal open-source collaboration. While we recognise the merits of alternative platforms like Flask or FastAPI, we decided to keep Shiny because the current iteration of voyAGEr already offers significant value to the community. Transitioning to a different platform would be a time-consuming endeavour that would postpone the release of this resource.

      However, the reviewer’s feedback regarding modularity and open-source collaboration is duly noted and highly valuable. We will certainly take it into account when developing new web applications within our laboratory.

      (2.8) To facilitate collaboration and improve the tool's adaptability, data resulting from the preprocessing pipeline should be made publicly available. This would make it easier for others to contribute and extend the tool's functionality, ultimately enhancing its value for the scientific community.

      As outlined in point 2.6 of this rebuttal letter, certain metadata used in our analysis are subject to restricted access. To address this, we have taken several measures to foster transparency and reproducibility of our analyses. First, we have made the scripts for data pre-processing publicly available, along with a comprehensive explanation of our methodology within the main manuscript. This empowers users to replicate our analyses and provides a foundation for those interested in contributing to the tool's development. Furthermore, we have created new issues on voyAGEr’s GitHub repository, outlining novel features and improvements we envision for the application in the future. We actively encourage users to engage with this section.

      (2.9) It is unfortunate that the manuscript has no line numbers, which makes pointing out language issues or typos cumbersome. Below are some minor typos present in the current version mostly due to inconsistent usage of British vs US English, and the authors would be advised to do a thorough proofreading for the final submission.

      • Page 12: Inconsistent spelling of "analyzed" and "analysed". Should be "analyzed", since US English is used throughout the rest of the paper.

      • Page 14: "randomised"

      • Page 15: "emphasise"

      We apologise for this omission and have included line numbers in the revised version. We have opted for British English and corrected the manuscript accordingly.

      (2.10) Some figures in the supplemental material have a low resolution (e.g. S. Fig 5). Especially figures that are not based on screenshots would ideally be of a higher resolution.

      As voyAGEr is designed as a web application for visualisation, it is inherent that some screenshots of the final resource may have lower resolutions. In response to this concern, we re-generated the figures in this manuscript with a resolution that maintains clarity and readability. We also recreated figures not derived from screenshots, further improving their resolution.

      We saved all figures in PDF format and are sending them together with this letter and the revised manuscript, to address any potential issues related to low-resolution figures that may occur during the export of the Word document.

      (2.11) In Fig. 1 in the bottom row the sex labels are hard to see.

      We have adapted the figure to address this concern.

      (2.12) Math symbols and equations are not well formatted. For example, the GE equation on p. 13, or Oiij equation should be properly typeset. Also, the Oiij notation might be confusing, I believe the authors meant to use a capital "I", i.e. OI_ij.

      We have incorporated these recommendations into the revised manuscript.

      (2.13) The Readme file in the git repo is very short. It would be helpful to have build and run instructions.

      We have updated the README file in the GitHub repository, which now contains, among other features, instructions for launching the Shiny app and building the associated Docker image. Additionally, a simple tutorial has also been included to assist users in navigating through voyAGEr's functionalities.

      (2.14) "Module" tab's UI inconsistent to other tabs (i.e. "Gene" and "Tissue"), since it contains an "About" page. Adding the "About" page in the actual "Module" page might make the UI clearer.

      We believed that the Modules section, due to its distinct methodology, would benefit from an additional tab explaining its underlying rationale. We understand the reviewer’s concern regarding the use of tabs throughout the application and have changed the app to ensure consistency.

      (2.15) I would suggest changing the type of the article to "Tools and Resources".

      We agree and followed the reviewer’s suggestion.

      Reviewer 3

      (3.1) In the gene-centric analyses section of the result, to improve this manuscript and database, linear regression tests accounting for the entire range of age should be added. The authors' algorithm, ShARP-LM, tests locally within a 16-year window which makes it has lower power than the linear regression test with the whole ages. I suspect that the power reduction is strongly affected in the younger age range since a larger number of GTEx donors are enriched in old age. By adding the results from the lm tests, readers would gain more insight and evidence into how significantly their interest genes change with age.

      We are grateful for the reviewer's thoughtful and pertinent recommendation and have conducted linear regression tests covering the entire age range. Their outcomes have been integrated into the web application, denoted by a dotted orange line on the 'Gene Expression Alterations Over Age' plots. Additionally, summary statistics of the overall changes, encompassing p-values, t-statistics and logFC per year, are now shown below the plot title. We have also updated the manuscript to reflect these changes (v. Methods, Gene-centric visualisation of tissue-specific expression changes across age):

      “We also applied a linear model across the entire age range, thereby providing users with more insight and supporting evidence into how a specific gene changes with age. For visualisation purposes, we incorporated a dashed orange line, with the logFC per year for the Age effect as slope, in the respective scatter plots (Figure 3B c). We depict the Sex effect therein by prominent dots on the average samples, with pink and blue denoting females and males, respectively.”
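      A whole-age-range fit of this kind can be sketched with ordinary least squares. The Python example below is an assumption-laden stand-in for the authors' R pipeline, not a reproduction of it: it regresses one gene's log2 expression on age and sex (coded 0/1) with NumPy and returns the age coefficient, i.e. the change in log2 expression per year that would serve as the slope of the plotted line. The data are synthetic.

```python
import numpy as np


def logfc_per_year(log2_expr, age, sex) -> float:
    """OLS fit of log2 expression ~ intercept + age + sex; returns the
    age coefficient (change in log2 expression per year)."""
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    X = np.column_stack([np.ones_like(age), age, sex])  # design matrix
    coef, *_ = np.linalg.lstsq(X, np.asarray(log2_expr, dtype=float), rcond=None)
    return float(coef[1])


# Synthetic gene rising by 0.02 log2 units per year, plus a sex offset.
age = np.array([20, 30, 40, 50, 60, 70])
sex = np.array([0, 1, 0, 1, 0, 1])
expr = 5 + 0.02 * age + 0.3 * sex
print(round(logfc_per_year(expr, age, sex), 4))  # 0.02
```

      The t-statistics and p-values reported in the app would additionally require the residual variance and the coefficient standard errors, which this sketch omits.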

      Concerning the observation about the potential reduction in statistical power due to the limited number of samples in younger ages, we acknowledge its validity. Indeed, we have addressed this issue in the manuscript's Discussion (v. Supplementary Figure 6).

      (3.2) In line with the ShARP-LM test results, it is not clear which criterion was used to define the significant genes and the following enrichment analyses. I assume that the criterion is P < 0.05, but it should be clearly noted. Additionally, the authors should apply adjusted p-values for multiple-test correction. The ideal criterion is an adjusted P < 0.05. However, if none or only a handful of genes were found to be significant, the authors could relax the criteria, such as using a regular P < 0.01 or 0.05.

      We apologise for any confusion regarding the terminology "significant genes." Our choice to use non-adjusted p-values for determining the significance of gene expression changes with Age, Sex, and their interaction was deliberate, and we would like to clarify our reasoning:

      (1) In the "Gene" tab of the application, individual genes are examined. When users inquire about a specific gene, multiple-testing correction of the p-value does not apply.

      (2) In the "Tissue" tab, using adjusted p-values and a threshold of 0.05 yielded very few differentially expressed genes, limiting the utility of Peaks. Our objective therein is not to assess the significance of alterations in individual genes but to provide a metric for global alterations within a tissue. We then determine significance based on the False Discovery Rate (FDR), using the p-values as a nominal metric of gene expression alterations.

      To avoid using the concept of “differential expression”, commonly linked to significance, we now refer to 'altered genes' in both the manuscript and the app. For clarity, and in line with voyAGEr's role as a hypothesis-generation tool, we define 'altered genes' as those with non-adjusted p-values < 0.01 or < 0.05, as specified in the Methods section.

      (3.3) In the gene-centric analyses section, authors should provide a full list of donor conditions and a summary table of conditions as supplementary.

      We appreciate the suggestion and, instead of including this information as an additional supplementary table, have now included a reference that directs readers to those data. We would like to emphasise that the web app includes information on the donor conditions we hypothesise to affect gene expression.

      (3.4) The tissue-specific assessment section has poor sub-titles. Every title has to contain information.

      We agree and revised the sub-titles to more accurately reflect the information conveyed in each corresponding section.

      (3.5) I have an issue understanding the meaning of NES from GSEA in the tissue-specific assessment section. The authors performed GSEA for the DEGs against the background genes ordered by t-statistics (from positive to negative) calculated from the linear model. I understand the p-value was two-tailed, which means that both positive and negative NES are meaningful as they represent up-regulated expression direction (positive coefficient) and down-regulated expression direction (negative coefficient) with age, respectively, within a window. However, in the GSEA section of Methods, authors were not fully elaborate on this directionality but stated, "The NES for each pathway was used in subsequent analyses as a metric of its over- or down-representation in the Peak". The authors should clearly elaborate on how to interpret the NES from their results.

      We added the following paragraph to the manuscript’s Methods section, in order to clarify the NES’ directionality:

      “We extracted the GSEA normalised enrichment score (NES), which represents the degree to which a certain gene set is overrepresented at the extreme ends of the ranked list of genes. A positive NES corresponds to the gene set’s overrepresentation amongst up-regulated genes within the age window, whereas a negative NES signifies its overrepresentation amongst down-regulated genes. The NES for each pathway was used in subsequent analyses as a metric of its up- or down-regulation in the Peak.”

      (3.6) In the Modules of co-expressed genes section, the authors did not explain how or why they selected the four tissues: brain, skeletal muscle, heart (left ventricle), and whole blood. This should be elaborated on.

      We apologise for not providing a detailed explanation for this selection. As the ‘Modules of co-expressed genes’ section was primarily intended as a proof of concept, we opted to include only tissues with both a substantial number of available samples and comprehensive cell type signatures; these four tissues met both criteria. Nonetheless, as the diversity of cell type signatures increases (e.g., through the growing availability of scRNA-seq datasets), we plan to encompass a wider range of tissues in the near future. However, as this task is time-demanding, and to avoid substantially delaying the release of voyAGEr, we opted to address it in the next version of the app and opened a dedicated issue in the project’s GitHub repository so that users can indicate which tissues they would like to see included next.

      We also added a brief sentence in this regard to the Methods section of the manuscript:

      “The four tissues (Brain - Cortex, Muscle - Skeletal, Heart - Left Ventricle, and Whole Blood) covered by the Module section of voyAGEr were selected due to their relatively high sample sizes and availability of comprehensive cell type signatures. The increasing availability of human tissue scRNA-seq datasets (e.g., through the Human Cell Atlas) will allow future updates of voyAGEr to encompass a wider range of tissues.”

      (3.7) In the modules of the co-expressed genes section, the authors did not provide an explanation of the "diseases-manual" sub-tab of the "Pathway" tab of the voyAGEr tool. It would be helpful for readers to understand how the candidate disease list was prepared and what the results represent.

      We greatly appreciate the reviewer's feedback, and in response, we have restructured the 'Modules of co-expressed genes' Methods section to provide a more comprehensive explanation of the 'diseases' sub-section. To clarify, we obtained a curated set of diseases and their associated genes from DisGeNET v.7.0. We assessed the enrichment of modules for these diseases with two methods: a manual approach using Fisher’s tests (i.e. comparing the genes of a given module with the genes associated with a given disease), and the disease_enrichment function of the disgenet2r package. The significance of these enrichments was determined by adjusting p-values with the Benjamini-Hochberg correction.
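      The manual enrichment approach can be sketched with the Python standard library. Two assumptions are worth flagging: the test here is one-sided (over-representation of disease genes in the module), whereas a default Fisher's test in R is two-sided, and the gene universe is a hypothetical input (for example, all genes retained after filtering). The Benjamini-Hochberg step is the standard step-up adjustment; gene names are made up.

```python
from math import comb
from typing import List, Set


def fisher_enrichment_p(module: Set[str], disease: Set[str],
                        universe: Set[str]) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed) under the
    hypergeometric null of drawing the module at random from the universe."""
    module, disease = module & universe, disease & universe
    N, K, n = len(universe), len(disease), len(module)
    k = len(module & disease)
    upper_tail = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return upper_tail / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


universe = {f"g{i}" for i in range(10)}
disease = {"g0", "g1", "g2"}
p = fisher_enrichment_p({"g0", "g1", "g2"}, disease, universe)
print(p)  # equals 1/120
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

      In practice, one p-value would be computed per module-disease pair and the whole set adjusted together before applying a significance threshold.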

      (3.8) Most figures have low resolutions, and their fonts are too small to read.

      As already mentioned in issue 2.10, we have recreated all of the images with better resolution to enhance legibility. We also exported such figures in PDF, which we attach to this revision.

      (3.9) Authors used GTEx V7, which is not latest version. Although researchers have developed a huge amount of pipelines and tools for their research, most of them were neglected without a single update. I am sure many users, including myself, would appreciate it if the authors kept updating the database with GTEx V8 for the future version of the database.

      We express our gratitude to the reviewer for their valuable suggestion, and, as already explained in issue 2.1, we have incorporated GTEx V8 into voyAGEr.

      (3.10) I would like to have an option for downloading the results as a whole for gene, tissue, and coexpressed genes. This would be a great option for secondary analysis by users.

      Implementing such a feature would be a time-consuming endeavour that would delay the release of voyAGEr, so we chose not to do so for this version. However, we agree that it would be a good resource for secondary analyses and acknowledge the possibility of adding this feature in the future. For now, voyAGEr allows the user to download all plots and the corresponding data.

      (3.11) How was the order of tissues in the heatmaps (in both the gene and tissue sections) determined? Did the authors apply hierarchical clustering? If not, I would recommend that the authors perform hierarchical clustering and add the resulting dendrogram to the heatmap display.

      We apologise for not explaining how the order of tissues was determined. We employed hierarchical clustering to establish the tissue order for visualisation within the app. Although the reviewer suggested adding a dendrogram to illustrate this clustering, we decided against it because a dendrogram, while informative, is not essential for the app's primary purpose.

      (3.12) I understand that this is a vast amount of work, but I hope that the authors can expand the co-expressed module analysis to include other tissues in a future version of the database. Knowing which genes are co-expressed with aging, and their pathway and disease enrichments across tissues, would be highly informative, and I'm sure many users, including myself, would greatly appreciate it.

      We express our gratitude to the reviewer for the valuable suggestion and for acknowledging the extensive effort required to incorporate new tissues into the module section. We completely agree that understanding co-expressed genes across the aging process is of significant value, and we are committed to the ongoing inclusion of additional tissues. As already stated in issue 3.6, a comprehensive list of tissues slated for integration in future voyAGEr versions is available on voyAGEr’s GitHub repository.

      Author response image 1.

      Density plots (“smoothed” histograms) of the distribution of numbers of samples per moving age window for the ShARP-LM pipeline, categorised by tissue. The numerical value within each rectangle represents the minimum number of samples observed across all age windows for that particular tissue.

      Author response image 2.

      Density lines (“smoothed” histograms) of the distribution of the age of donors per tissue. As depicted in the chart, there are more samples for older ages, particularly of brain tissues.

      Author response image 3.

      Effect of downsampling on ShARP-LM results. A – Per-tissue violin plots of the distributions, across genes, of Pearson’s correlation coefficients between original and downsampled logFC values for the Age variable across age windows, with tissues coloured and ordered by increasing percentage of downsampling-associated reduction in the number of samples. B – Density scatter plots comparing original and downsampled p-values for each tissue, coloured by the downsampling percentage in each age window and highlighting the low range of p-values (from 0 to 0.1). Despite changes in logFC with downsampling, a considerable correlation in significance is maintained, although downsampling naturally results in a loss of statistical power, evident from the shift of points towards the first quadrant (dashed lines: p-value = 0.05).

      Author response image 4.

      Heatmap depicting the percentage of common donors between pairs of tissues. A given square shows the percentage of all samples of the tissue on the x-axis (Tissue 1) that are in common with (i.e., come from the same donors as) samples of the tissue on the y-axis (Tissue 2).
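      The heatmap values follow from simple set arithmetic on donor IDs. The sketch below makes this concrete and shows why the matrix is asymmetric: the denominator is always Tissue 1's donor count. It assumes one sample per donor per tissue, and common_donor_percentage is a hypothetical helper.

```python
def common_donor_percentage(donors_by_tissue):
    """Percentage of Tissue 1 donors that also contributed a Tissue 2 sample.

    donors_by_tissue: dict tissue -> iterable of donor IDs (one sample per donor per
    tissue is assumed). Returns a dict keyed by ordered (tissue1, tissue2) pairs; the
    result is asymmetric because the denominator is the donor count of tissue1.
    """
    donors = {tissue: set(ids) for tissue, ids in donors_by_tissue.items()}
    pct = {}
    for t1, d1 in donors.items():
        for t2, d2 in donors.items():
            if t1 != t2 and d1:
                pct[(t1, t2)] = 100.0 * len(d1 & d2) / len(d1)
    return pct


if __name__ == "__main__":
    demo = {"Muscle": ["d1", "d2", "d3", "d4"], "Kidney": ["d1", "d2"]}
    print(common_donor_percentage(demo))
```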

      Author response image 5.

      Assessment of the relative contributions of different sources to the dataset’s variance. A - tissue accounts for approximately 90% of the total variance, while donor contributes around 10%; age has a minimal impact (1%), likely due to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing dynamics. B - Removal of the donor variable does not transfer variance to age, suggesting limited confounding between the two variables.

      Author response image 6.

      Impact of the relative proportion of common donors on gene expression correlation between tissue pairs. Panels A, B, and C showcase the tissue pairs with the highest (Muscle Skeletal / Kidney Cortex), median (Pancreas / Heart Left Ventricle), and lowest (Small Intestine / Brain Amygdala) percentages of common donors, respectively. The left panels illustrate gene-by-gene Pearson’s correlations of gene expression between the two tissues, comparing the scenarios with (x-axis) and without (y-axis) the removal of common donors. The right panels depict the same comparisons, but with random downsampling (y-axis) in both tissues based on the proportions defined for common donor removal. The depicted examples show that the outcomes are comparable when removing common donors or employing random downsampling.

      Author response image 7.

      Comparison of the impacts of removing common donor samples and random downsampling across tissue pairs. The heatmap is coloured based on whether the removal of common donors has a greater (red) or lesser (blue) impact than random downsampling. The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair. This calculation involves several steps: first, determining the absolute difference in Pearson’s correlation for each gene’s mean expression within each age window from the ShARP-LM pipeline, between the original data and the subset of data without common donors (DiffWoCD) or with random downsampling (DiffRD). Subsequently, the medians of DiffWoCD and DiffRD are computed, and the difference between these median values provides the ICD for each tissue pair. Because correlation is symmetric (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular in form. Grey tiles denote NA values, i.e., where the tissue-tissue comparison is not meaningful, namely self-self comparisons and comparisons between sex-specific tissues. Top right inset: density line (“smoothed” histogram) of all ICD values.
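      The ICD steps listed in the caption map onto a few lines of code. In this sketch, the inputs are per-gene correlations between the two tissues, already computed for the original data, for the data without common donors, and for the randomly downsampled data. The sign convention (positive means common-donor removal has the greater impact, i.e. red) is an assumption based on the colour description. impact_of_common_donors is a hypothetical helper.

```python
from statistics import median


def impact_of_common_donors(r_orig, r_wocd, r_rd):
    """Impact of Common Donors (ICD) for one tissue pair.

    r_orig, r_wocd, r_rd: dict gene -> Pearson's r between the two tissues' mean
    expression across age windows, computed on the original data, without common
    donors, and after random downsampling, respectively.
    Positive ICD: removing common donors changes correlations more than random
    downsampling does (sign convention assumed).
    """
    genes = r_orig.keys() & r_wocd.keys() & r_rd.keys()
    diff_wocd = [abs(r_orig[g] - r_wocd[g]) for g in genes]  # DiffWoCD
    diff_rd = [abs(r_orig[g] - r_rd[g]) for g in genes]      # DiffRD
    return median(diff_wocd) - median(diff_rd)


if __name__ == "__main__":
    print(impact_of_common_donors(
        {"a": 0.9, "b": 0.5, "c": 0.1},
        {"a": 0.8, "b": 0.3, "c": 0.1},
        {"a": 0.85, "b": 0.45, "c": 0.2},
    ))
```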

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Summary:

      In the revised manuscript, the authors aim to investigate brain-wide activation patterns following administration of the anesthetics ketamine and isoflurane, and conduct comparative analysis of these patterns to understand shared and distinct mechanisms of these two anesthetics. To this end, they perform Fos immunohistochemistry in perfused brain sections to label active nuclei, use a custom pipeline to register images to the ABA framework and quantify Fos+ nuclei, and perform multiple complementary analyses to compare activation patterns across groups.

      In the latest revision, the authors have made some changes in response to our previous comments on how to fix the analyses. However, the revised analyses were not changed correctly and remain flawed in several fundamental ways.

      Critical problems:

      (1) Before one can perform higher-level analyses such as hierarchical clustering or network hub (or PC) analysis, it is fundamental to validate that you have significant differences in the raw Fos expression values in the first place. First of all, this means showing figures with the raw data (Fos expression levels) in some form in Figures 2 and 3 before showing the higher-level analyses in Figures 4 and 5; this is currently switched around. Second and most importantly, when you have a large number of brain areas with large differences in mean values and variance, you need to account for this in a meaningful way. Changing to log values is a step in the right direction for mean values but does not account well for differences in variance. Indeed, considering the large variances in brain areas with high mean values, it is a little difficult to believe that all brain regions, especially brain areas with low mean values, passed correction for multiple comparisons. We suggested Z-scores relative to control values for each brain region; this would have accounted for wide differences in mean values and variance, but this was not done. Overall, validation of anesthesia-induced differences in Fos expression levels is not yet shown.

      (a) Reordering the figures.

      Thank you for your suggestion. We have added Figure 2 (for 201 brain regions) and Figure 2—figure supplement 1 (for 53 brain regions) to demonstrate the statistical differences in raw Fos expression between KET and ISO compared to their respective control groups. These figures specifically present the raw c-Fos expression levels for both KET and ISO in the same brain areas, providing a fundamental basis for the subsequent analyses. Additionally, we have moved the original Figures 4 and 5 to Figures 3 and 4.

      (b) Z-score transformation and validation of anesthesia-induced differences in Fos expression.

      Thank you for your suggestion. Before correcting for multiple comparisons, we log-transformed the c-Fos densities and then computed Z-scores relative to the control values for each brain region. With this Z-score transformation, we identified a larger number of significantly activated brain regions in Figure 2: the number of regions showing significant activation increased by 100 for KET and by 39 for ISO. We have updated the Results section accordingly (Lines 80-181). In addition, we have added the following text to the Statistical Analysis section (Line 489): "…In Figure 2 and Figure 2–figure supplement 1, c-Fos densities in both experimental and control groups were log-transformed. Z-scores were calculated for each brain region by normalizing these log-transformed values against the mean and standard deviation of its respective control group. This involved subtracting the control mean from the experimental value and dividing the result by the control standard deviation. For statistical analysis, Z-scores were compared to a null distribution with a zero mean, and adjustments were made for multiple comparisons using the Benjamini–Hochberg method with a 5% false discovery rate (Q).…".
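      The per-region normalization quoted above can be sketched as follows. Several details are assumptions not stated in the response. The log10 transform with a pseudocount of 1 is used to handle zero densities. The test against a zero-mean null treats each animal's z as approximately N(0, 1) under the null, so that mean(z) times the square root of n is approximately standard normal. The helper names control_zscores and zscore_pvalue are hypothetical. The resulting per-region p-values would then be Benjamini–Hochberg adjusted, which is not repeated here.

```python
import math
from statistics import mean, stdev


def control_zscores(experimental, control, pseudocount=1.0):
    """Z-score each experimental animal against the control group for one brain region.

    Densities are log10-transformed with a pseudocount (an assumption, to handle zeros).
    Returns None when the control SD is zero, because z is then undefined; this mirrors
    the missing values mentioned in the figure legends.
    """
    log_exp = [math.log10(x + pseudocount) for x in experimental]
    log_ctl = [math.log10(x + pseudocount) for x in control]
    mu, sd = mean(log_ctl), stdev(log_ctl)  # control mean and sample SD
    if sd == 0:
        return None
    return [(x - mu) / sd for x in log_exp]


def zscore_pvalue(z):
    """Two-sided p-value of mean(z) against a zero-mean null (normal approximation assumed)."""
    stat = mean(z) * math.sqrt(len(z))
    return math.erfc(abs(stat) / math.sqrt(2))


if __name__ == "__main__":
    z = control_zscores([120.0, 150.0, 90.0], [10.0, 20.0, 15.0, 30.0])
    print(z, zscore_pvalue(z))
```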

      Author response image 1.

      KET and ISO induced c-Fos expression relative to their respective control group across 201 distinct brain regions. Z-scores represent the normalized c-Fos expression in the KET and ISO groups, calculated against the mean and standard deviation from their respective control groups. Statistical analysis involved the comparison of Z-scores to a null distribution with a zero mean and adjustment for multiple comparisons using the Benjamini–Hochberg method at a 5% false discovery rate (*p < 0.05, **p < 0.01, ***p < 0.001). n = 6, 6, 8, 6 for the home cage, ISO, saline, and KET groups, respectively. Missing values resulted from zero standard deviations in control groups. Brain regions are categorized into major anatomical subdivisions, as shown on the left side of the graph.

      Author response image 2.

      KET and ISO induced c-Fos expression relative to their respective control group across 53 distinct brain regions. Z-scores for c-Fos expression in the KET and ISO groups were normalized to the mean and standard deviation of their respective control groups. Statistical analysis involved the comparison of Z-scores to a null distribution with a zero mean and adjustment for multiple comparisons using the Benjamini–Hochberg method at a 5% false discovery rate (*p < 0.05, **p < 0.01, ***p < 0.001). Brain regions are organized into major anatomical subdivisions, as indicated on the left side of the graph.

      (2) Let's assume for a moment that the raw Fos expression analyses indicate significant differences. The authors used hierarchical clustering analyses as a rationale for examining 53 brain areas in all subsequent analyses of Fos expression following isoflurane versus home cage or ketamine versus saline. Instead, the authors changed to 201 brain areas with no validated rationale other than effectively saying 'we wanted to look at more brain areas'. And then later, when they examined raw Fos expression values in Figures 4 and 5, they assessed 43 brain areas for ketamine and 20 brain areas for isoflurane, without any rationale for choosing these numbers of brain areas. This is a particularly big problem when they are trying to compare effects of isoflurane versus ketamine on Fos expression in these brain areas: they did not compare the same brain areas.

      (a) Changing to 201 brain areas with validated rationale.

      Thank you for your question. We have revised the original text, “To enhance our analysis of c-Fos expression patterns induced by KET and ISO, we expanded our study to 201 subregions.”, to the following (Line 100): "…To enable a more detailed examination and facilitate clearer differentiation and comparison of the effects caused by KET and ISO, we subdivided the 53 brain regions into 201 distinct areas. This approach, guided by the standard mouse atlas available at http://atlas.brain-map.org/atlas, allowed for an in-depth analysis of the responses in various brain regions…". For the hierarchical clustering analyses, the expansion from 53 to 201 brain regions is now explained in Line 215: "…To achieve a more granular analysis and better discern the responses between KET and ISO, we expanded our study from the initial 53 brain regions to 201 distinct subregions…"

      (b) Compare the same brain areas for KET and ISO and the rationale for why choosing these numbers of brain areas in Figures 3 and 4.

      We apologize for the confusion and lack of clarity regarding the selection of brain regions for analysis. In Figure 2 and Figure 2—figure supplement 1, we display the c-Fos expression in the same brain regions affected by KET and ISO. In Figures 3 and 4, we applied a uniform standard to specifically report the brain areas most prominently activated by KET and ISO, respectively. As specified in Line 104: "…Compared to the saline group, KET activated 141 out of a total of 201 brain regions (Figure 2). To further identify the brain regions that are most significantly affected by KET, we calculated Cohen's d for each region to quantify the magnitude of activation and subsequently focused on those regions that had a corrected p-value below 0.05 and effect size in the top 40% (Figure 3, Figure 3—figure supplement 1)…" and Line 142: "…Using the same criteria applied to KET, which involved selecting regions with Cohen's d values in the top 40% of significantly activated areas from Figure 2, we identified 32 key brain regions impacted by ISO (Figure 4, Figure 4—figure supplement 1).…".
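      The selection rule quoted above has two steps: keep regions with a corrected p-value below 0.05, then keep the top 40% of those by Cohen's d. A minimal sketch follows. It assumes Cohen's d with a pooled standard deviation and rounds the 40% cut-off down. Neither detail is stated in the response, so the sketch is not expected to reproduce the exact region counts reported. The helper names are hypothetical.

```python
from statistics import mean, stdev


def cohens_d(treated, control):
    """Cohen's d using the pooled sample standard deviation (an assumed variant)."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treated) - mean(control)) / pooled


def select_top_regions(regions, alpha=0.05, top_fraction=0.40):
    """Regions with adjusted p < alpha, then the top fraction ranked by Cohen's d.

    regions: dict region name -> (adjusted_p, cohens_d).
    The cut-off is rounded down with int(); the exact rounding rule is an assumption.
    """
    significant = [(name, d) for name, (q, d) in regions.items() if q < alpha]
    significant.sort(key=lambda item: item[1], reverse=True)  # largest effect first
    keep = int(len(significant) * top_fraction)
    return [name for name, _ in significant[:keep]]


if __name__ == "__main__":
    demo = {"A": (0.01, 2.0), "B": (0.02, 1.0), "C": (0.20, 3.0), "D": (0.001, 0.5), "E": (0.03, 1.5)}
    print(select_top_regions(demo))
```

      Applying the same function with identical thresholds to the KET and ISO results is one way to make the "uniform standard" explicit.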

      Moreover, we illustrate the brain regions co-activated by KET and ISO in Figure 4C. As detailed in Lines 167-180: "…The co-activation of multiple brain regions by KET and ISO indicates that they have overlapping effects on brain functions. Examples of these effects include impacts on sensory processing, as evidenced by the activation of the PIR, ENT [1], and OT [2], pointing to changes in sensory perception typical of anesthetics. Memory and cognitive functions are influenced, as indicated by the activation of the subiculum (SUB) [3], dentate gyrus (DG) [4], and RE [5]. The reward and motivational systems are engaged, involving the ACB and ventral tegmental area (VTA), signaling the modulation of reward pathways [6]. Autonomic and homeostatic control are also affected, as shown by areas like the lateral hypothalamic area (LHA) [7] and medial preoptic area (MPO) [8], emphasizing effects on functions such as feeding and thermoregulation. Stress and arousal responses are impacted through the activation of the paraventricular hypothalamic nucleus (PVH) [10, 11] and LC [12]. This broad activation pattern highlights the overlap in drug effects and the complexity of brain networks in anesthesia…". Below are the revised Figures 3 and 4.

      (1) Chapuis, J. et al. Lateral entorhinal modulation of piriform cortical activity and fine odor discrimination. J. Neurosci. 33, 13449-13459 (2013). https://doi.org/10.1523/jneurosci.1387-13.2013

      (2) Giessel, A. J. & Datta, S. R. Olfactory maps, circuits and computations. Curr. Opin. Neurobiol. 24, 120-132 (2014). https://doi.org/10.1016/j.conb.2013.09.010

      (3) Roy, D. S. et al. Distinct Neural Circuits for the Formation and Retrieval of Episodic Memories. Cell 170, 1000-1012.e19 (2017). https://doi.org/10.1016/j.cell.2017.07.013

      (4) Sun, X. et al. Functionally Distinct Neuronal Ensembles within the Memory Engram. Cell 181, 410-423.e17 (2020). https://doi.org/10.1016/j.cell.2020.02.055

      (5) Huang, X. et al. A Visual Circuit Related to the Nucleus Reuniens for the Spatial-Memory-Promoting Effects of Light Treatment. Neuron (2021).

      (6) Al-Hasani, R. et al. Ventral tegmental area GABAergic inhibition of cholinergic interneurons in the ventral nucleus accumbens shell promotes reward reinforcement. Nat. Neurosci. 24, 1414-1428 (2021). https://doi.org/10.1038/s41593-021-00898-2

      (7) Mickelsen, L. E. et al. Single-cell transcriptomic analysis of the lateral hypothalamic area reveals molecularly distinct populations of inhibitory and excitatory neurons. Nat. Neurosci. 22, 642-656 (2019). https://doi.org/10.1038/s41593-019-0349-8

      (8) McGinty, D. & Szymusiak, R. Keeping cool: a hypothesis about the mechanisms and functions of slow-wave sleep. Trends Neurosci. 13, 480-487 (1990). https://doi.org/10.1016/0166-2236(90)90081-k

      (9) Mullican, S. E. et al. GFRAL is the receptor for GDF15 and the ligand promotes weight loss in mice and nonhuman primates. Nat. Med. 23, 1150-1157 (2017). https://doi.org/10.1038/nm.4392

      (10) Rasiah, N. P., Loewen, S. P. & Bains, J. S. Windows into stress: a glimpse at emerging roles for CRH(PVN) neurons. Physiol. Rev. 103, 1667-1691 (2023). https://doi.org/10.1152/physrev.00056.2021

      (11) Islam, M. T. et al. Vasopressin neurons in the paraventricular hypothalamus promote wakefulness via lateral hypothalamic orexin neurons. Curr. Biol. 32, 3871-3885.e4 (2022). https://doi.org/10.1016/j.cub.2022.07.020

      (12) Ross, J. A. & Van Bockstaele, E. J. The Locus Coeruleus-Norepinephrine System in Stress and Arousal: Unraveling Historical, Current, and Future Perspectives. Front. Psychiatry 11, 601519 (2020). https://doi.org/10.3389/fpsyt.2020.601519

      Author response image 3.

      Brain regions exhibiting significant activation by KET. (A) Fifty-five brain regions exhibited significant KET activation. These were chosen from the 201 regions analyzed in Figure 2, focusing on the top 40% ranked by effect size among those with corrected p values less than 0.05. Data are presented as mean ± SEM, with p-values adjusted for multiple comparisons (*p < 0.05, **p < 0.01, ***p < 0.001). (B) Representative immunohistochemical staining of brain regions identified in Figure 3A, with control group staining available in Figure 3—figure supplement 1. Scale bar: 200 µm.

      Author response image 4.

      Brain regions exhibiting significant activation by ISO. (A) Brain regions significantly activated by ISO were initially identified using a corrected p-value below 0.05. From these, the top 40% in effect size (Cohen’s d) were further selected, resulting in 32 key areas. p-values are adjusted for multiple comparisons (**p < 0.01, ***p < 0.001). (B) Representative immunohistochemical staining of brain regions identified in Figure 4A. Control group staining is available in Figure 4—figure supplement 1. Scale bar: 200 µm. (C) A Venn diagram displays 43 brain regions co-activated by KET and ISO, identified by the adjusted p-values (p < 0.05) for both KET and ISO. CTX: cerebral cortex; CNU: cerebral nuclei; TH: thalamus; HY: hypothalamus; MB: midbrain; HB: hindbrain.

      Less critical comments:

      (3) The explanation of hierarchical levels in lines 90-95 did not make sense.

      We have revised the section in lines 90-95, which originally stated: "…Based on the standard mouse atlas available at http://atlas.brain-map.org/, the mouse brain was segmented into nine hierarchical levels, totaling 984 regions. The primary level consists of grey matter, the secondary of the cerebrum, brainstem, and cerebellum, and the tertiary includes regions like the cerebral cortex and cerebellar nuclei, among others, with some regions extending to the 8th and 9th levels. The fifth level comprises 53 subregions, with detailed expression levels and their respective abbreviations presented in Supplementary Figure 2…". Our revised description, now in line 91, reads: "…Building upon the framework established in previous literature, our study categorizes the mouse brain into 53 distinct subregions [1]…"

      (1) Do, J. P. et al. Cell type-specific long-range connections of basal forebrain circuit. eLife 5 (2016).

      (4) I am still perplexed as to why the authors consider the prelimbic and infralimbic cortex 'neuroendocrine' brain areas in the abstract. In contrast, the prelimbic and infralimbic cortex were described better in the introduction as "associated information processing" areas.

      Thank you for bringing this to our attention. We agree that classifying the prelimbic and infralimbic cortex as 'neuroendocrine' in the abstract was incorrect, which was an oversight on our part. In the revised version, as detailed in line 167, we observed an increased number of brain regions showing overlapping activation by both KET and ISO, which is depicted in Figure 4C. This extensive co-activation across various regions makes it challenging to narrowly define the functional classification of each area. Consequently, we have revised the abstract, updating this in line 21: "…KET and ISO both activate brain areas involved in sensory processing, memory and cognition, reward and motivation, as well as autonomic and homeostatic control, highlighting their shared effects on various neural pathways.…".

      (5) It looks like overall Fos levels in the control group Home (ISO) are an order of magnitude (~10-fold) lower than those in the control group Saline (KET) across all regions shown. This large difference seems unlikely to be due to a biologically driven effect and seems more likely to be due to a technical issue, such as differences in staining or imaging between experiments. The authors discuss this issue but did not answer whether the Homecage-ISO experiment, or at least its Fos labeling and imaging, was performed at the same time as the Saline-Ketamine experiment.

      Thank you for highlighting this important point. The c-Fos labeling and imaging for the Home (ISO) and Saline (KET) groups were carried out in separate sessions due to the extensive workload involved in these processes. This study processed a total of 26 brain samples. Sectioning the entire brain of each mouse required approximately 3 hours, yielding 5 slides, with each slide containing 12 to 16 brain sections. We were able to stain and image up to 20 slides simultaneously, typically comprising 2 experimental groups and 2 corresponding control groups. Imaging these 20 slides at 10x magnification took roughly 7 hours, and additional time was required for confocal imaging of specific areas of interest at 20x magnification. Given the complexity of these procedures, all experiments were conducted under uniform conditions to ensure consistency. This included the use of consistent primary and secondary antibody concentrations, incubation times, and imaging parameters such as fixed light intensity and exposure time. Furthermore, in the saline and KET groups, intraperitoneal injections might have evoked pain and stress responses in mice despite four days of pre-experiment acclimation, which could have contributed to the increased c-Fos expression observed. This aspect, along with the fact that procedures were conducted in separate sessions, might have introduced some variation. We have therefore included a note in the Discussion section (Line 353): "…Despite four days of acclimation, including handling and injections, intraperitoneal injections in the saline and KET groups might still elicit pain and stress responses in mice. This point is corroborated by the subtle yet measurable variations in brain states between the home cage and saline groups, characterized by changes in normalized EEG delta/theta power (home cage: 0.05±0.09; saline: -0.03±0.11) and EMG power (home cage: -0.37±0.34; saline: 0.04±0.13), as shown in Figure 1–figure supplement 1. These changes suggest a relative increase in brain activity in the saline group compared to the home cage group, potentially contributing to the higher c-Fos expression. Additionally, despite the use of consistent parameters for c-Fos labeling and imaging across all experiments, the substantial differences observed between the saline and home cage groups might be partly attributed to the fact that the operations were conducted in separate sessions.…"

      Reviewer #3 (Public Review):

      The present study presents a comprehensive exploration of the distinct impacts of Isoflurane and Ketamine on c-Fos expression throughout the brain. To understand the varying responses across individual brain regions to each anesthetic, the researchers employ principal component analysis (PCA) and c-Fos-based functional network analysis. The methodology employed in this research is both methodical and expansive. Notably, the utilization of a custom software package to align and analyze brain images for c-Fos positive cells stands out as an impressive addition to their approach. This innovative technique enables effective quantification of neural activity and enhances our understanding of how anesthetic drugs influence brain networks as a whole.

      The primary novelty of this paper lies in the comparative analysis of two anesthetics, Ketamine and Isoflurane, and their respective impacts on brain-wide c-Fos expression. The study reveals the distinct pathways through which these anesthetics induce loss of consciousness. Ketamine primarily influences the cerebral cortex, while Isoflurane targets subcortical brain regions. This finding highlights the differing mechanisms of action employed by these two anesthetics: a top-down approach for Ketamine and a bottom-up mechanism for Isoflurane. Furthermore, this study uncovers commonly activated brain regions under both anesthetics, advancing our knowledge about the mechanisms underlying general anesthesia.

      We are thankful for your positive and insightful comments on our study. Your recognition of the study's methodology and its significance in advancing our understanding of anesthetic mechanisms is greatly valued. By comprehensively mapping c-Fos expression across a wide range of brain regions, our study reveals the distinct and overlapping impacts of these anesthetics on various brain functions, providing a valuable foundation for future research into the mechanisms of general anesthesia, potentially guiding the development of more targeted anesthetic agents and therapeutic strategies. Thus, we are confident that our work will captivate the interest of our readers.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

      TRF2 overexpression resulted in a ~5-fold or greater change in IL1R1 expression. Compared to this, telomere length-dependent alterations in IL1R1 expression, although about 2-fold, appear modest (~50% reduction in cells with long telomeres across the different model systems used). Notably, this was consistent and significant across cell-based model systems and xenograft tumors (see Figure 1). Unlike TRF2 induction, telomere elongation or shortening varies within the permissible physiological limits of cells. This is likely to result in the observed variation in IL1R1 levels. For biological relevance, we further demonstrated that IL1 signalling in TNBC tissue and tumor organoids, and M2 macrophage infiltration, were significantly dependent on telomere length. Details of the significance tests were included in the individual figure legends. In response to this comment, we will describe them in a dedicated paragraph in the Methods section to make the information clearer for readers. We noticed that the stars (*) denoting statistical significance were omitted in some ChIP-experiment figures. This was likely an error during figure assembly for PDF conversion. We thank the reviewer for bringing this up; necessary changes will be made in the revised manuscript.

      Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1 promoter. The observed effects on NF-kB signaling and subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that telomere length and IL-1R1 expression are associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

      The mechanism of how telomere length affects IL1R1 expression involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, the IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). We have described this in the manuscript along with references citing the previous works. A scheme explaining the model was provided as Additional Supplementary Figure 1, along with a description of the mechanistic model.

      Main Figures 1-4 describe the molecular mechanism of telomere-dependent IL1R1 activation. These include ChIP data for TRF2 on the IL1R1 promoter in cells with long or short telomeres, as well as TRF2-mediated histone/p300 recruitment and IL1R1 gene expression. We further show that specific acetylation of TRF2 is crucial for TRF2-mediated IL1R1 regulation (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues set out to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression, with a consequent impact on tumor infiltration by TAMs.

      Strengths:

      After careful evaluation of the manuscript, I feel that the presented story is undoubtedly well conceived. At the technical level, the experiments have been properly performed and the results obtained support the authors' conclusions.

      Weaknesses:

      Unfortunately, the topic covered is not particularly novel. Specifically, the ability of TRF2 to bind extratelomeric foci in cells with short telomeres has been well demonstrated in previous work published by the same research group. The ability of TRF2 to regulate gene expression is well known, its interaction with p300 has already been demonstrated and, finally, its ability to regulate TAM infiltration (the actual novelty of the manuscript) appears to be an obvious consequence of IL1R1 modulation (this is probably due to the current organization of the manuscript).

      Here we studied the TRF2-IL1R1 regulatory axis (not previously reported by us or others) as a case of the telomere sequestration model that we described earlier (Mukherjee et al., 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). This manuscript demonstrates the effect of TRF2-IL1R1 regulation on telomere-sensitive recruitment of tumor macrophages. To the best of our knowledge, no previous study has mechanistically connected the telomeres of tumor cells to the tumor immune microenvironment. Here we focused on the IL1R1 promoter and provided mechanistic evidence that acetylated TRF2 engages the HAT p300 to epigenetically alter the promoter. This mechanism of TRF2-mediated activation has not been reported previously. Further, the function of a specific post-translational modification of TRF2 (acetylation of lysine 293, K293) in IL1R1 regulation is described for the first time. Additional experiments showed that TRF2 acetylation mutants, when targeted to the IL1R1 promoter, significantly alter its transcriptional state. To our knowledge, the function of any TRF2 residue in transcriptional activation had not been described before. Taken together, these findings provide novel insights into a telomere-sensitive mechanism of TRF2-mediated gene regulation that affects the tumor-immune microenvironment. We are considering the suggestion to reorganize the manuscript to highlight the novel aspects of our work more convincingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding the S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during culture using scRNAseq, scTCRseq, and reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with their HLA restrictions (which by itself represents a major achievement), together with the subsets of the responding T cells, based on their transcriptional profiles. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand after vaccination. Thus, the authors concluded that highly responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two doses of mRNA vaccine.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one of the 4 sustainers expressed many Tfh clonotypes and accounted for the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, data on the composition of the T cell subsets per subject (all 8 subjects) should be provided.

      In accordance with the reviewer’s suggestion, we provided the composition of the T cell subset per subject (all 8 subjects) in the revised manuscript (shown below).

      Author response image 1.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase background responses unrelated to the S protein. Another shortcoming of this strategy is that it selects only T cells amenable to proliferation. It will miss anergic or less-responsive T cells and thus bias the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We thank the reviewer for raising this question about our experimental strategy. We chose this method because background responses unrelated to the S protein were lower than with the widely used activation-induced marker (AIM) assays, as verified by reconstituting many TCRs and testing their responses in vitro. Another reason is that this method identifies S-reactive functional (proliferative) T cell clonotypes rather than the anergic or less-responsive T cells mentioned by the reviewer, which was the objective of this study. In accordance with the reviewer’s suggestion, we have described the limitation of, and rationale for, our experimental strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      As the reviewer surmised, the pre-existing highly expanded clonotypes that we analyzed did not react to HCoV-derived peptides. After we determined the epitopes of the clonotypes, the S peptide sequences were analyzed for homology with HCoVs. The only two clonotypes whose epitope sequences were relatively conserved in HCoV strains (clonotypes #8-pre_9 and #8-pre_10) were tested for reactivity to the corresponding HCoV peptides, but no activation was observed (shown below). We have added these data to the revised manuscript.

      Author response image 2.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      As the reviewer mentioned, judging from the NFAT-GFP expression of their reporter cell lines, some pre-existing S-reactive T cells might appear to react with S peptides. However, the percentage of GFP-expressing cells is affected by many factors, such as the expression levels of the TCR and of HLA molecules. Thus, the affinity of pre-existing S-reactive T cells cannot be reliably deduced from the activation of the reporter cell lines shown in Fig. S3 of the present manuscript. We thank the reviewer for this constructive suggestion; however, for this reason we decided not to use these data to evaluate affinity quantitatively in this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      A short-term comparison of durability of S antibody levels after 2-dose vaccination, showing that better or more poorly sustained responses correlate with the presence of Tfh cells.

      Strengths:

      Novelty of approach in expanding, sequencing and expressing TCRs for functional studies from the implicated populations.

      Weaknesses:

      Somewhat outdated question, short timeline, small numbers, over-interpretation of sequence homology data

      Reviewer #2 (Recommendations For The Authors):

      In line with my comments above, it might be useful for the authors to moderate some of the assertions in what is a rather small-scale descriptive account of correlates of some quite nuanced, short-term differences in S antibody responses.

      We now clearly state that some homologous microbe-derived peptides were indeed recognized by S-reactive T cells. We have also removed overstatements from the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S protein-specific T cells in people who received SARS-CoV-2 mRNA vaccines. To do this, the authors recruited a cohort of COVID-19-naive individuals who received SARS-CoV-2 mRNA vaccines and collected serum and PBMC samples at different timepoints. They then generated three main sets of data: (1) anti-S protein antibody titers at all timepoints; (2) a single-cell RNAseq/TCRseq dataset of T cells that divided after 10 days of stimulation with S protein; and (3) the corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: (A) individuals with a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests that Tfh polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibodies; (B) S-reactive T cells do exist before vaccination, but they seem unable to respond properly to COVID-19 vaccination.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and the corresponding epitopes, and it indeed reports several interesting findings about the relationship between Tfh cells and sustained antibody levels and about the S-reactive clones that exist before vaccination. However, the main weakness is that these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      (1) The biggest claim of the paper, that the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just UMAPs, as in Fig. 2C, E, and F. The paper only shows the pooled result, but it looks as if most of the so-called Tfh cells come from a single donor (#27). If the Tfh percentage of total CD4+ T cells is presented separately for each of the 4 decliners and 4 sustainers, is there a statistically significant difference between decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn from a proper sample size and statistical analysis.

      In accordance with the reviewer’s request, we have also analyzed the T cells separately for each donor (shown below). The average Tfh frequency was much lower in decliners than in sustainers, although the difference did not reach statistical significance, partly because of the large deviation caused by one sustainer (#27) with a very high Tfh%. We have modified our description in the revised manuscript.

      Author response image 3.
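      The sample-size concern raised in point (1) can be made concrete with a worked example. The following Python sketch (standard library only; not the analysis used in the manuscript) computes an exact two-sided Mann-Whitney U p-value by enumerating every assignment of ranks to the two groups. With four donors per group, even complete separation yields p = 2/70 ≈ 0.029, and a single overlapping pair raises p to 4/70 ≈ 0.057. The function name and the Tfh% values are hypothetical placeholders, not data from this study.

```python
from itertools import combinations


def exact_mann_whitney_p(x, y):
    """Two-sided exact Mann-Whitney U p-value for small samples without ties.

    Enumerates every way of assigning len(x) of the pooled ranks to group x,
    so it is only practical for small groups (e.g. 4 vs 4 donors).
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) // 2  # smallest possible rank sum for group x
    mean_u = n1 * n2 / 2         # expected U under the null hypothesis
    u_obs = sum(rank_of[v] for v in x) - offset

    extreme = total = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u = sum(ranks) - offset
        total += 1
        # Count assignments at least as far from the null mean as observed.
        if abs(u - mean_u) >= abs(u_obs - mean_u):
            extreme += 1
    return extreme / total


# Hypothetical Tfh% values (NOT data from the manuscript); one outlier donor.
sustainers = [12.0, 15.0, 18.0, 60.0]
decliners = [2.0, 3.0, 5.0, 14.0]
print(exact_mann_whitney_p(sustainers, decliners))  # 4/70, about 0.057
print(exact_mann_whitney_p([10.0, 11.0, 12.0, 13.0], [1.0, 2.0, 3.0, 4.0]))  # 2/70, about 0.029
```

      Because the test works on ranks, a single donor with a very high Tfh% contributes only its rank, not its magnitude; the sketch assumes no tied values.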

      (2) The paper does not provide any information to justify the cell annotation presented in Figs. 2B and 4A. Moreover, in my opinion, it is strange that two clusters of cells, on the left and right sides of the UMAP in Fig. 2B, are both annotated as CD4 Tcm and Tem. Also, Tfh and Treg cells belong to the same cluster in Fig. 2B, although they should have very distinct transcriptomes and should separate clearly. I therefore believe the paper would be more convincing if it presented more information and discussion about the basis for its cell annotation.

      We agree with the reviewer’s concern. Because antigen stimulation induced the proliferation of only antigen-specific T cells, the multiple clusters were mostly due to fluctuations in cell cycle-related genes. We therefore manually annotated these clusters using cell type-related genes (Kaech et al, Nat. Rev. Immunol., 2002; Sallusto et al, Annu Rev Immunol., 2004) and assigned subsets independently of the automatic clustering based on the whole transcriptome. Indeed, antigen-stimulated Tfh and Treg cells lie close together because both express ICOS and PDCD1. We mainly used IL21 and FOXP3 to distinguish the Tfh and Treg populations, respectively. We thank the reviewer for drawing attention to this important step, which we addressed carefully. We have added a description of the annotation methods to the revised manuscript.

      (3) In lines 103-104, the paper claims that the Tfh cluster likely derives from cTfh cells. However, considering that the cells have been cultured and stimulated for 10 days, cTfh cells might lose all Tfh features during such culture. To the best of my knowledge, there is no literature supporting the notion that cTfh cells stimulated in vitro for 10 days (in the presence of IL2, IL7 and IL15) still retain a Tfh phenotype. It is possible that, instead of there being more S-specific cTfh cells before culture, the sustainers' PBMCs create an environment that favors Tfh cell differentiation (for example, by expressing more pro-Tfh cytokines or co-stimulatory molecules), so that more Tfh-like cells are detected in the sustainers after the 10-day culture. The paper may need to include more evidence that cTfh cells retain Tfh features after a 10-day culture.

      We thank the reviewer for raising this important issue. As the reviewer pointed out, culturing T cells for 10 days indeed alters their repertoire and features, so the Tfh clonotypes we detected after expansion may not correspond to the cTfh clonotypes in vivo. Because our observations and analyses were mostly based on the dominant T cell clonotypes expanded in vitro, we have modified our description and conclusions accordingly in the revised manuscript.

      (4) In my opinion, it is inaccurate to use cell numbers in Fig. 4B to determine whether a clone expands, given that cell numbers can be affected by many factors, such as the input number, the quality of stimulation and the quality of the PBMC sample. A more appropriate analysis would calculate the relative abundance of each TCR clone among total CD4 T cells at each timepoint.

      We thank the reviewer for pointing out this inaccuracy. As the reviewer suggested, we now use percentages to show the relative abundance of each clonotype in Fig. 4B of the revised manuscript.
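      The normalization proposed in point (4) can be sketched as follows: each clonotype's cell count is divided by the total number of CD4+ T cells captured at the same timepoint. This is a minimal Python sketch, assuming each cell has already been assigned a (timepoint, clonotype) label and that the input contains all CD4+ T cells for each timepoint; the function name and the toy counts are hypothetical and not taken from the manuscript.

```python
from collections import Counter


def clonotype_relative_abundance(cells):
    """Percent of each clonotype among all CD4+ T cells at each timepoint.

    `cells` is an iterable of (timepoint, clonotype_id) pairs, one per cell.
    """
    counts_by_tp = {}
    for timepoint, clonotype in cells:
        counts_by_tp.setdefault(timepoint, Counter())[clonotype] += 1

    abundance = {}
    for timepoint, counts in counts_by_tp.items():
        total = sum(counts.values())  # denominator: all CD4+ cells at this timepoint
        abundance[timepoint] = {
            clone: 100.0 * n / total for clone, n in counts.items()
        }
    return abundance


# Hypothetical example: clone "A" rises from 2 to 30 cells, yet its share falls.
cells = ([("pre", "A")] * 2 + [("pre", "B")] * 8
         + [("post", "A")] * 30 + [("post", "C")] * 270)
result = clonotype_relative_abundance(cells)
print(result["pre"]["A"], result["post"]["A"])  # 20.0 10.0
```

      The toy example shows why raw counts can mislead: a 15-fold increase in cell number coincides with a halving of the clone's relative abundance when more total cells are recovered.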

      (5) Expressing each TCR in a cell line to determine its epitope is much appreciated. However, the authors need to make very sure that this analysis is performed correctly, because many of the paper's conclusions are based on it. I noticed something strange (maybe I am wrong): for example, in Table 4, donor #8 clonotypes post_6 and post_7 have exactly the same TRAV5 and TRAJ5 usage. Because the alpha chain has no D region, in theory clonotypes with the same VJ usage should have the same alpha chain CDR3 sequence; however, in the table they have very different CDR3α aa sequences. I would ask the authors to double-check their analysis, and I apologize in advance if this question is based on a misunderstanding.

      We thank the reviewer for carefully reading our manuscript. Although donor #8 clonotypes post_6 and post_7 have exactly the same TRAV5 and TRAJ5 usage, their CDR3α aa sequences differ because of random nucleotide addition at the V-J junction during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 have the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to point 1 of my public review: to draw a solid conclusion, the authors could include more sustainers and decliners if possible, simply stimulate their PBMCs for 10 days, check Tfh features in the proliferating CD4 T cells (e.g. IL21 secretion, PD-1 expression), and then compare these values between sustainers and decliners.

      We thank the reviewer for the suggestion. Unfortunately, additional PBMCs from more sustainers and decliners are not available to us. Instead, we carefully described the current observation in the revised manuscript.

      (2) Related to point 3 of my public review: the authors could sort CXCR5+ cTfh and CXCR5- non-cTfh cells, stimulate them in vitro for 10 days, and compare whether the stimulated cTfh cells still show more Tfh-related features, such as increased IL-21 secretion.

      As the reviewer recommends, sorting and culturing cTfh and non-cTfh cells separately would clarify this issue. Owing to limited samples, however, we could not perform these experiments.

      (3) I could not find information in the manuscript about the availability of the data and code used to analyze the single-cell RNA-seq dataset.

      We have clarified the availability of the data and added the code for the single-cell RNA-seq analysis to the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Comment 1: The authors showed increased plasma IL-22 and its expression in the intestine. Are intestinal ILC3s the main source of plasma IL-22?

      Reply: ILC3s are the main source of IL-22, as reported previously (PMID: 30700914). In the small intestine, ILC3s account for about 62% of IL-22+ cells. Other IL-22+ cells include γδ T cells, Foxp3+ T cells and CD4+ T cells.

      Comment 2: The authors transplanted intestinal ILC3s from NCD mice to DIO mice and showed significant metabolic improvements. However, in Fig. 1, intermittent fasting increased the proportion of IL-22-positive ILC3s rather than their total number. Please clarify whether the effect of this transplantation is due to increasing the number of ILC3s or to introducing more IL-22-positive ILC3s (which are decreased in DIO). Do these transplanted ILC3s home by default to the intestine rather than to other tissues?

      Reply: We believe that the transplantation increases the number of ILC3s, leading to increased IL-22 levels. The transplanted ILC3s home by default to the intestine rather than to other tissues because ILC3s express several homing receptors, such as CCR7, CCR9 and α4β7, which modulate their capacity to migrate to the gut (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). Our observation that ILC3s in adipose tissue remained unchanged after ILC3 transplantation (Supplementary Figure 5F) also supports this concept.

      Comment 3: Thermogenesis during this acute cold challenge is mainly mediated by brown adipose tissue (BAT). Beiging is a chronic and adaptive response. Based on the data in WAT, there is a beiging phenotype, but core body temperature during an acute cold challenge is not an accurate readout of it. Not evaluating thermogenic activity in BAT would be a missed opportunity. More browning genes should be included to strengthen the beiging phenotype of WAT. Moreover, inflammation in WAT could be examined to provide a complete picture of adipose tissue remodeling through this pathway.

      Reply: As suggested, we performed additional experiments to measure the levels of inflammation-related genes such as Il4, Il1b, Il6, Il22, Il23 and Il17a. As shown in Supplementary Figure 2D, these genes were not altered.

      Comment 4: For the SVF beige adipocyte differentiation, 100 ng/mL IL-22 was used. This is far above the physiological concentration of ~5 pg/mL. Please justify the use of this high concentration.

      Reply: We agree with the reviewer that the dose of IL-22 used is high. However, the effective dose of 100 ng/mL used in our studies is consistent with the literature. Previous reports have shown that IL-22 directly activates Stat3 in adipose tissue and primary adipocytes and promotes the expression of genes involved in triglyceride lipolysis (Lipe and Pnpla2) and fatty-acid β-oxidation (Acox1) at a dose of 100 ng/mL (Wang X, Ota N, et al. Nature. 2014). Consistently, other studies have reported that IL-22 at 100 ng/mL significantly reversed the enhanced expression of CCL2, CCL20 and IL1B mRNAs in granulosa cells in vitro (Qi X, et al. Nat Med. 2019).
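      For scale, the gap between the in vitro dose and the approximate physiological concentration cited in Comment 4 (~5 pg/mL) works out to roughly 20,000-fold, using 1 ng = 10^3 pg:

$$
\frac{100\ \text{ng/mL}}{5\ \text{pg/mL}} = \frac{100 \times 10^{3}\ \text{pg/mL}}{5\ \text{pg/mL}} = 2 \times 10^{4}
$$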

      Comment 5: The authors showed increased Ucp1 and Cidea expression after IL-22 treatment in SVFs. Please be aware that these increases are likely due to enhanced adipogenesis, as suggested by the morphology. Please examine more adipogenic markers to confirm this. Is this higher adipogenesis caused by the high concentration of IL-22?

      Reply: As suggested, we examined the expression of adipogenic marker genes such as Pparγ and Fabp4. We found that IL-22 did not increase the levels of these adipogenic marker genes relative to the PBS control, as shown in Supplementary Figure 6F.

      Author response image 1.

      Comment 6: In line 201, the authors conclude that IL-22 increased SVF beige differentiation. To fully support this conclusion, the authors should ensure that adipogenesis is at the same baseline before comparing beiging, or examine the effect of IL-22 on normal adipogenesis for comparison with beige differentiation.

      Reply: We examined the expression of adipogenic marker genes such as Pparγ and Fabp4 and found that IL-22 did not increase their expression relative to the PBS control.

      Reviewer #2:

      This study aims to investigate the mediating role of intestinal ILC3-derived IL-22 in the metabolic benefits elicited by intermittent fasting.

      Strengths:

      The observation of induced IL-22 production by intestinal ILC3s is significant, and the scRNAseq provides new insight into intestine-resident immune cell profiles in response to repeated fasting and refeeding.

      Weaknesses:

      The experimental design for some studies needs to be improved to enhance the rigor of the overall study. There is a lack of direct evidence showing that the metabolically beneficial effects of IF are mediated by intestinal ILC3 and their derived IL-22. The mechanism by which IL-22 induces a thermogenic program is unknown. The browning effect induced by IF may involve constitutive activation of lipolysis, which was not considered.

      Comment 1: There is a lack of direct evidence showing that IL-22-expressing ILC3s in the intestine are the key contributors to the intermittent fasting (IF)-mediated elevation of circulating IL-22 levels. The fraction of IL-22-expressing cells was increased threefold by IF, but the increase in circulating IL-22 is moderate (Figs. 1J and 1K).

      Reply: Circulating IL-22 is subject to clearance, degradation and binding to plasma proteins, among other processes. Thus, circulating IL-22 levels may be much lower than the amount secreted by intestinal IL-22-positive ILC3s.

      Comment 2: The loss of fat mass with IF suggests that active lipolysis, which was not considered, may explain the white fat browning. This may also apply to the observations in IL-22-treated mice and IL-22R KO mice.

      Reply: We analyzed the expression of lipolysis-related genes in NCD and NCD-IF mice and found that IF did not alter their levels in white adipose tissue (Supplementary Figure 2D). We have addressed this concern in line 119, page 6.

      Author response image 2.

      Comment 3: IL-22 administration and adoptive transfer of ILC3s had no significant effect on body weight. It is not clear how IL-22 improves insulin sensitivity in this case.

      Reply: Our results are consistent with a previous report showing that IL-22 administration improves insulin sensitivity without a change in body weight (Qi X, et al. Nat Med. 2019). In addition, previous studies have demonstrated that IL-22 can increase Akt phosphorylation in muscle, liver and adipose tissues, leading to improved insulin sensitivity (Wang X, et al. Nature. 2014). We have addressed this potential mechanism in lines 192-195, page 9.

      Comment 4: The energy expenditure data look unusual given that there was little increase in oxygen consumption during dark cycle compared to light cycle (Fig.3).

      Reply: The small difference in oxygen consumption between the dark and light cycles may be due to a technical problem with the system.

      Comment 5: Assessing the thermogenic capacity of the whole fat pad requires considering both UCP1 expression per amount of tissue and the total fat mass of each animal, because the mRNA level by itself does not reflect whole-tissue capacity.

      Reply: We used the whole subcutaneous adipose tissue from one side for qPCR to reflect the whole tissue capacity.

      Comment 6: The design of the ILC3 adoptive transfer studies is a concern. PBS is not a good control for the group receiving ILC3 cells (Figs. 2A-2H). A similar issue applies to the co-culture study, in which beige cells alone are not an ideal control for Beige+ILC3 (Figs. 2I-2J).

      Reply: We agree with the reviewer that PBS is not a good control. Because we could not find a comparable immune cell type without any effect on adipocytes, we designed this experiment based on other studies in which saline or PBS was used as the control in ILC transfer experiments (Sasaki T, et al. Cell Rep. 2019; Wang H, et al. Nat Commun. 2019).

      Comment 7: The induction of thermogenesis by IL-22 treatment may be related to enhanced differentiation rather than direct activation of thermogenic genes (Figs. 4G and 4H).

      Reply: Our observation that IL-22 did not alter the levels of genes related to adipogenesis (Supplemental figure 6F) indicates that IL-22 may not alter the differentiation of adipocytes. We addressed this concern in Lines 211-212, page 10.

      Reviewer #3:

      Chen et al. investigated how intermittent fasting produces metabolic benefits in obese mice and found that intestinal ILC3s and IL-22-IL-22R signaling contribute to the beiging of white adipose tissue (WAT) and the consequent metabolic benefits, including improved glucose and lipid metabolism, in diet-induced obese mice. They demonstrate that intermittent fasting increases IL-22+ ILC3s in the small intestine of mice. Adoptive transfer of purified intestinal ILC3s or administration of exogenous IL-22 can increase UCP1 gene expression and energy expenditure and improve glucose metabolism. Importantly, the above metabolic benefits of intermittent fasting are abolished in IL-22R-/- mice. Using an in vitro experiment, the authors show that ILC3-derived IL-22 may act directly on adipocytes to promote SVF beige differentiation. Finally, by performing scRNA-seq analysis of intestinal immune cells from mice with different treatments, the authors indicate a possible way in which intestinal ILC3s are activated by intermittent fasting. Overall, this study provides a new mechanistic explanation for the metabolic benefits of intermittent fasting and reveals a role for intestinal ILC3s in enhancing whole-body energy expenditure and glucose metabolism, likely via IL-22-induced beige adipogenesis.

      Although this study presents some interesting findings, particularly IL-22 derived from intestinal ILC3 could induce beiging of WAT by directly acting on adipocytes, the experimental data are not sufficient to support the key claims in the manuscript.

      Comment 1: Only increased UCP1 expression on mRNA level is not enough to support the beiging of WAT. More methods such as western blotting and immunostaining of UCP1 in WAT are needed to confirm the enhanced beige adipogenesis.

      Reply: Additional experiments have been performed to measure UCP1 protein by Western blot. These data are included in Figure 4I and Supplementary Figure 2E.

      Comment 2: IL-22 is known to modulate metabolic pathways via multiple downstream functions. The use of whole-body knockout of IL-22R could not exclude the indirect effect on the promotion of beiging of WAT. Specific deletion of IL-22R in adipose tissues is therefore needed to confirm the direct effect of IL-22 on adipocytes which is suggested by the in vitro study.

      Reply: We agree with the reviewer that specific deletion of IL-22R in adipose tissue is critical to confirm the direct effect of IL-22 on adipocytes. We will generate AdipoQ-IL-22R-/- mice to test this concept further in vivo.

      Comment 3: The authors failed to show the cellular distribution of IL-22R in adipose tissues. This is important because the mechanism that explains the increased beige adipogenesis could be different based on the expression of IL-22R in adipose progenitor cells or mature adipocytes. So it is not appropriate to conclude that "IL-22 then directly activates IL-22R on adipocytes, leading to subsequent induction of beiging of white adipose tissue" in line 407. Additionally, Oil red O staining is needed for Fig 4G and Fig 5J, and protein levels of UCP1 and adipogenesis-related markers are needed to evaluate beige fat differentiation and the whole adipogenesis.

      Reply: As suggested, we have added the expression of IL-22R in adipose progenitor cells and mature adipocytes (Supplementary Figure 6E). In addition, protein levels of UCP1 and adipogenesis-related markers, which evaluate overall adipogenesis, are now included (Figure 4I, Supplementary Figure 6F). We have also addressed this issue in lines 207-215, page 10.

      Comment 4: Although the authors provided a hypothesis, based on scRNA-seq analysis, about how intermittent fasting increases IL-22+ ILC3s in the small intestine, functional assays are needed to identify the factors involved. For example, what are the levels of macrophage-derived IL-23 or AHR ligands in the small intestine, and do they contribute to the increased percentage of intestinal IL-22+ ILC3s following intermittent fasting?

      Reply: Using flow cytometry sorting of macrophages combined with qPCR, we preliminarily demonstrated that intermittent fasting increases the expression of molecules such as Cd44 and Ccl4 (Supplementary Figure 10B), which may contribute to the increased proportion of IL-22+ ILC3s in the intestine under intermittent fasting. Our observation that IL-23 mRNA levels were unchanged indicates that this molecule may not be the major contributor to communication between macrophages and ILC3s. Other potential molecules, such as AHR ligands, remain to be explored.

      Comment 5: What are the differences between adipose ILC3s and intestinal ILC3s? Why do transferred ILC3s migrate only to the small intestine and not to the WAT of recipient mice? It would be better to examine, or at least discuss, whether other factors from intestinal ILC3s may also contribute to beiging of WAT following intermittent fasting.

      Reply: Intestinal ILC3s specifically express the gut-homing receptors CCR7, CCR9, and α4β7 (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). This may explain why transplanted intestinal ILC3s migrate mainly to the intestine rather than to adipose tissue (PMID: 34625492). ILC3s make up only a very small proportion of cells in mouse adipose tissue, and their functions have not yet been clarified. We have addressed this issue in lines 156-158, page 8.

      Other factors derived from intestinal ILC3s may also contribute to beiging of WAT following intermittent fasting. By secreting IL-22, ILC3s strengthen the intestinal mucosal barrier, reducing the influx of LPS and PGN into the bloodstream under high-fat diet conditions and thereby increasing beiging of white adipose tissue (Chen H, et al. Acta Pharm Sin B. 2022). We have addressed this potential mechanism in lines 344-347, page 16.

      Comment 6: The sensitivity of the IL-22 ELISA kit used in the manuscript was 8.2 pg/mL, according to the information from the methods, however, in Fig. 1J and Fig. 2B, the IL-22 levels in mouse plasma were lower than 6 pg/mL, which was below the sensitivity of the ELISA kit and also the assay range. Please explain.

      Reply: We double-checked the original data and found that we had made a mistake in calculating the IL-22 concentration. We have corrected this error (Fig. 1J, Fig. 2B).

      Comment 7: In Fig. 7A, the significance of the hypothesis testing should be marked. In Fig. 7F and 7G, the contrast between the two groups is not apparent; other ways of comparison could be used to improve readability.

      Reply: As suggested, we have marked the significance of the hypothesis tests for HFD vs. NCD and HFD-IF vs. HFD in Fig. 7A. Fig. 7F and 7G show the top 20 enriched interacting proteins between different cell types. The dot plot displays the average expression level and the significance of protein interactions in each cell type.

      Comment 8: The total food intake of fasting mice fed NCD or HFD was lower than that of mice without fasting, and the food intake rate shown in Fig. S1 represents values normalized to body weight. The authors should therefore describe this precisely in line 114.

      Reply: We have revised the statement accordingly in lines 114-115.

      Comment 9: Western blotting analysis has been described in methods, however, there is no corresponding experimental data in the result part.

      Reply: The Western blotting results are now included.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We have made substantial revisions to the manuscript, incorporating new data, which led to the renumbering and relabeling of several figures:

      • Figure 3F now features a modified graph color.

      • Figure 4I introduces a new experiment.

      • What was previously labeled as Figure 4I-O is now Figure 4J-P.

      • Figure 5H presents another new experiment.

      • The earlier Figure 5H is now relabeled as Figure 5I.

      • A fresh experiment has been incorporated into Supplement Figure 1a.

      • The former Supplement Figure 1a is now Supplement Figure 1b.

      • Supplement Figure 2d describes an additional new experiment.

      • In accordance with the HUGO gene nomenclature committee (HGNC) recommendations, we've updated the names of genes/proteins in both figures and their accompanying legends.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1. Standard practice would include multiple TNBC cell lines to test the author's hypotheses, but the authors rely only on one cell line in the entire paper, MDA-MB-231 cells. The authors do correlate their findings to patient data, but the inclusion of an additional TNBC cell line would strengthen their findings about the L-DOXR cells and help with the assessment as to how reproducible their original microfluidics system is.

      Response: Thank you for your valuable feedback. We recognize the importance of utilizing multiple TNBC cell lines for rigorous validation and reproducibility. There are several reports highlighting the generation of L-DOXR cells in other types of breast cancer cell lines, such as MCF-7 (Fei et al., 2015), and in other cancer types like the prostate cancer cell line PC-3. These studies utilized a microfluidic device with a concentration gradient of Doxorubicin. With this existing evidence, we are confident that a variety of cancer cell types have the potential to form L-DOXR cells in a doxorubicin gradient. The cited reports support our choice of the MDA-MB-231 cell line for our current study:

      “L-DOXR cells exhibit increased genomic content (4N+) as compared to WT cells. The presence of cells with increased nuclear size and increased genomic content has been demonstrated to be associated with poor clinical outcomes in several types of cancers (Alharbi et al., 2018; Amend et al., 2019; Fei et al., 2015; Imai et al., 1999; Liu et al., 2018; Lv et al., 2014; Mukherjee et al., 2022; O’Connor et al., 2002; Saini et al., 2022; Trabzonlu et al., 2023). (Page 5, Line 24)”

      However, we acknowledge the validity of your point regarding the strengthening of our findings with the inclusion of additional TNBC cell lines. We are considering expanding our research in future studies to further validate our findings across multiple TNBC cell lines. Thank you for bringing this to our attention, and we hope our response adequately addresses your concerns.

      Comment #2. It would be helpful to comment on the frequency at which doxorubicin is used clinically to treat TNBC patients. The authors equate their resistance phenotype to all chemotherapies (in patient data and title) but only test doxorubicin. Does NUPR1 overexpression result in resistance to other chemotherapies?

      Response: Thank you for raising these pertinent questions. Regarding your first point, on the clinical use of doxorubicin in TNBC: at the Samsung Medical Center, the typical chemotherapy regimen for TNBC patients consists of four cycles of neoadjuvant AC (doxorubicin 34 mg + cyclophosphamide 840 mg per session), followed by four cycles of adjuvant D (docetaxel 25 mg + 80 mg per session). This illustrates the clinical relevance and frequency of doxorubicin use in treating TNBC.

      Regarding your second point about NUPR1 overexpression and its broader implications for chemotherapy resistance: yes, NUPR1 overexpression has been documented to confer resistance to various drugs. A study by Lei Jiang et al. in the Journal of Pharmacy and Pharmacology found that NUPR1 plays a role in YAP-mediated gastric cancer malignancy and drug resistance through the activation of AKT and p21 (Jiang et al., 2021, https://doi.org/10.1093/jpp/rgab010). Additionally, a study by Wang et al. in Cell Death and Disease observed that the transcriptional coregulator NUPR1 is linked to tamoxifen resistance in breast cancer cells (Wang et al., 2021, https://doi.org/10.1038/s41419-021-03442-z). Thus, while our study primarily focused on doxorubicin, the role of NUPR1 in resistance extends to other therapeutic agents, adding depth to our findings and their broader implications for cancer therapy.

      Comment #3. The authors knockdown NUPR1 in L-DOXR cells, but overexpression of NUPR1 in WT TNBC cells to see if this renders the WT cells more resistant would be an important experiment.

      Response: We appreciate the reviewer's suggestion, which indeed underscores an important aspect of our study. In response, we have incorporated additional experiments in the revised manuscript. Specifically, on page 7 (lines 7-8) and in Supplement Figure 2c, we present data from experiments where we overexpressed Nupr1 in WT-MDA-MB231 cells. Our findings revealed that overexpression of GST-Nupr1 not only attenuates Dox-induced cell death but also mildly enhances cell viability in WT cells even without DOX treatment. This implies that cells expressing Nupr1 exhibit resistance to the cytotoxic effects of DOX. We believe these new data further solidify our conclusions and address the valuable point you raised.

      Comment #4. The similar colors/symbols chosen for the different groups in the xenograft plots are hard to easily interpret without zooming in.

      Response: As recommended, we have modified the xenograft plots in Figure 3F.

      Comment #5. There are some grammatical errors throughout the paper. Below is an example: In the opening of the Discussion "TNBC is the most aggressive subtype of breast cancer, and chemotherapy is a mainstay of treatment. However, chemoresistance is common and contributes to the long-term survival of TNBC patients" - this sentence makes it seem like chemoresistance makes TNBC patients survive longer. The following sentence "These cells demonstrated a large phenotype with increased genomic content." is abrupt and doesn't make sense. Consider carefully re-reading the manuscript for grammatical errors.

      Response: Thank you for highlighting the grammatical errors and providing specific examples. We deeply apologize for the oversight. In response to your feedback, we have carefully re-reviewed the manuscript and made the necessary corrections. Based on your example, we have revised the sentences to: “TNBC is the most aggressive subtype of breast cancer, with chemotherapy being a mainstay of treatment. However, the development of chemoresistance frequently occurs and poses significant challenges to the long-term survival prospects of TNBC patients.” “As for the cells in question, they exhibited an enlarged phenotype along with an increased genomic content.”

      We appreciate your meticulous review, and we have made an effort to address and rectify other such errors throughout the manuscript.

      Reviewer #2 (Recommendations for The Authors):

      I recommend that the authors address the following minor issues. Below are specific comments on the manuscript.

      Comment #1. Thank you for the comment. In the CDRA chip, DOXR cells and L-DOXR cells appeared in the mid-DOX region. What is the concentration of DOX in this region? Can the authors calculate the concentrations (or concentration ranges) of DOX in the high-, mid-, and low-DOX regions?

      Response: Instead of DOX, we used FITC dye to visualize the concentration gradient across the chip (shown below), because DOX emits only weak fluorescence.

      Author response image 1.

      Our method provides an estimate rather than a precise measurement, owing to the difference in molecular weight between FITC (389.38 g/mol) and DOX (579.98 g/mol), but it still allows us to approximate the distribution of DOX concentrations across the regions of the chip. For each region, we multiplied the ratio of the average FITC fluorescence intensity in that region to the highest recorded FITC fluorescence intensity by the peak DOX concentration (1.5 μM). This gives an estimated average DOX concentration for each region, with the caveat that the diffusion characteristics of FITC and DOX may differ because of their different molecular weights.
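      The estimation described above can be written as one proportional relation. This is our reading of the prose, since the original formula image is not reproduced here; the symbols are our own notation: $\bar{I}_{\mathrm{FITC}}(r)$ is the mean FITC intensity in region $r$, $I_{\mathrm{FITC},\max}$ is the highest recorded FITC intensity, and $C_{\mathrm{DOX},\max}$ is the peak DOX concentration.

```latex
\hat{C}_{\mathrm{DOX}}(r) \;=\; \frac{\bar{I}_{\mathrm{FITC}}(r)}{I_{\mathrm{FITC},\max}} \times C_{\mathrm{DOX},\max},
\qquad C_{\mathrm{DOX},\max} = 1.5\ \mu\mathrm{M},
\qquad r \in \{\text{high}, \text{mid}, \text{low}\}
```

      Read in reverse, the reported regional estimates correspond to intensity ratios of about 1.161/1.5 ≈ 0.774 (high), 0.554/1.5 ≈ 0.369 (mid), and 0.098/1.5 ≈ 0.065 (low), under the assumption that FITC and DOX follow the same gradient profile.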

      Using this formula, we estimated the average DOX concentration in each region: high region = 1.161 μM; mid region = 0.554 μM; low region = 0.098 μM.

      Comment #2. Is there any other phenotypic difference between DOXR cells and L-DOXR cells besides their size?

      Response: In addition to differences in cell size, L-DOXR cells exhibit several distinct phenotypic characteristics when compared to DOXR cells. These include variations in the cell cycle profile (Fig. 2F-H), altered drug efflux capabilities (Fig. 2I-J), and changes in nuclear morphology (Fig. S3D). These phenotypic distinctions suggest that L-DOXR cells may have adapted unique mechanisms of resistance and survival.

      Comment #3. Please add a description of abbreviations when the abbreviation is first used in the manuscript (e.g. NUPR1, HDAC11 etc.).

      Response: We corrected the mistake.

      Comment #4. Figure 2B is a schematic of the chip, not the dimensions of the chip. Please either add the dimensions of the chip and keep the figure caption as is, or change the figure caption.

      Response: Thank you for the correction. We have changed the figure caption to “Schematic of the chip.”

      Reviewer #3 (Recommendations for The Authors):

      In this manuscript, Lim and colleagues use an innovative CDRA chip platform to derive doxorubicin-resistant (DOXR) MDA-MB-231 cells and mechanistically elucidate their molecular wiring. Given their enlarged morphology and polyploidy, they termed these cells Large-DOXR (L-DOXR). Through comparative functional omics, they identify the NUPR1/HDAC11 axis as essential for imparting doxorubicin resistance and show that genetic or pharmacologic inhibition of NUPR1 restores sensitivity to the drug. Although innovative, the present manuscript has some deficiencies that slightly weaken the primary conclusions. Two critical issues are the use of a single cell line model (i.e., MDA-MB-231) for all the phenotypic and functional experiments and the absence of mechanistic insight into how NUPR1 imparts resistance to doxorubicin. Some questions and comments are listed below for the authors' consideration and response:

      Major:

      Comment #1. The authors treated only the MDA-MB-231 cells with doxorubicin in the CDRA chip. Do other TNBC cell lines (namely, MDA-MB-436, HCC1187, or others) respond similarly to dox treatment, eventually yielding enlarged, aneuploid cells with the resistant phenotype? It is important to show that this phenotype is not confined to a single cell line, particularly given the numerous TNBC models that are commonly used.

      Response: Thank you for your insightful query regarding the generalizability of our findings across different TNBC cell lines. In this initial study, we focused exclusively on MDA-MB-231 cells due to their widespread use as a model for aggressive triple-negative breast cancer and the constraints of time and resources. While we cannot definitively claim that the observed phenotypic changes upon doxorubicin treatment will be identical in other TNBC cell lines such as MDA-MB-436 or HCC1187, we hypothesize that the underlying mechanisms of chemoresistance and cellular response could be similar across various TNBC models. This hypothesis is supported by literature indicating common pathways of drug resistance in TNBC. We believe that our findings lay the groundwork for future studies to explore the response of a broader range of TNBC cell lines to doxorubicin treatment. Such studies would greatly enhance our understanding of the cellular adaptations to chemotherapeutic agents in TNBC and help to validate the potential universal application of our findings.

      Comment #2: Do the L-DOXR cells permanently hold onto the enlarged and polyploid states upon prolonged culture in vitro? Does that change given the presence or withdrawal of the drug? In other words, is the physical state of the resistant cells reversible, or is it passed onto the progeny cells regardless of continued stress from the drug?

      Response: Thank you for your question about the stability of the phenotypic changes in L-DOXR cells. Our observations suggest that the enlarged and polyploid states in L-DOXR cells are not permanently fixed. When cultured in vitro over an extended period without the selective pressure of doxorubicin, we have noted that some cells may revert to a non-polyploid state. However, this reversion does not seem to be a stable change, as subsequent generations can present with polyploidy again, even in the absence of the drug. This indicates a potential epigenetic or microenvironmental influence on the phenotypic state of these cells, suggesting a complex interplay between the drug-induced stress and the inherent cellular response mechanisms. Further investigation is needed to fully understand the dynamics of these phenotypic changes and whether they are heritable and/or reversible under different culture conditions.

      Comment #3: In Figures 2F-H, the authors perform DNA-staining-based FACS to estimate the ploidy of the cells. These estimations could be improved using 2D cell cycle analyses with EdU or BrdU co-treatment and staining. This would further allow a clear distinction between S-phase and G0/G1- and M-phase cells in the WT, DOXR, and L-DOXR populations.

      Response: Thank you for the suggestion to enhance the accuracy of our ploidy estimations. We appreciate the advice to implement 2D cell cycle analyses using EdU or BrdU co-treatment and staining, as this could indeed provide a clearer distinction between the various phases of the cell cycle in our WT (wild-type), DOXR (doxorubicin-resistant), and L-DOXR (large doxorubicin-resistant) cell populations. Incorporating these thymidine analogs would allow us to label newly synthesized DNA and thereby accurately delineate cells in the synthesis phase from those in the G0/G1 and M phases. This approach will likely add depth to our understanding of the cell cycle dynamics and the mechanism behind the drug resistance phenotype. We will consider incorporating these techniques in our future experiments to validate and extend the findings reported in this study.

      Comment #4. In Figure 3H, the authors quantitate the number of enlarged cells detected in human specimens of TNBC or normal breast tissues. How were these cells detected simply using the H&E staining, particularly when assessing the genomic content? Were certain size and nuclear staining intensity thresholds used for these categorizations? If so, these should be mentioned in the paper.

      Response: In our study, we identified enlarged cells within human TNBC and normal breast tissue specimens using H&E staining, and their quantitation was carried out using the Colour Deconvolution 2 plugin (Landini G et al., 2020) within the ImageJ software. This method allowed us to analyze the staining intensity and cell size systematically. To ascertain that we were indeed observing cells with increased genomic content, we established specific size and nuclear staining intensity thresholds. Cells exceeding these predetermined thresholds were categorized as 'enlarged'. Additionally, we used continuous serial slides for the human TNBC tissues microarray (BR1301, US Biomax) for more accurate comparisons in Figures 3H, I, and 5H. To strengthen our findings, we verified that NUPR1 expression, which is associated with the observed cell enlargements, was indeed elevated in these same cells from the patient samples. We have detailed these methodological aspects and the criteria for cell categorization in the 'Tissue Microarray and Immunohistochemistry' section of our Materials and Methods to ensure clarity and reproducibility of our results.

      Comment #5: In Figure 3I, the authors label the enlarged cells in the patient tissues as L-DOXR cells. Were these assessments done in dox-treated tumors? Even if that is the case, it'll be unfair to call them resistant to doxorubicin. The axis label "% enlarged cells" might be more accurate.

      Response: We appreciate the reviewer's attention to detail and agree that the terminology used in Figure 3I was inaccurate. The cells identified in patient tissues were labeled based on their morphological resemblance to L-DOXR cells observed in vitro; however, these patient tissue samples were not confirmed to be treated with doxorubicin, nor were the cells confirmed to be resistant. Therefore, we have amended the figure legend to reflect this and now refer to these cells simply as ‘enlarged cells’.

      Comment #6: The authors uncovered that NUPR1 expression is dramatically increased in the L-DOXR cells vs the wild-type cells. How does the NUPR1 gene expression and activity compare between L-DOXR and DOXR MDA-MB-231 cells?

      Response: Thank you for the valuable comment. The data are included in figure supplement 3, and we have revised the manuscript as follows: “While DOXR cells exhibited a marked increase in Nupr1 expression compared to the WT cells, this expression was substantially less than that observed in L-DOXR cells, as detailed in figure supplement 3.” (Page 7, Line 3).

      Comment #7: Following from above, the authors show that NUPR1 activity is not necessary for cell survival in the absence of doxorubicin (Fig. 4H). But, does it control the cellular size and polyploid states of the L-DOXR cells? In other words, is there any association between increased size and genomic content of the cells to their sensitivity to doxorubicin? Are cells resistant to other chemotherapeutics as well? Or is the resistant phenotype specific to doxorubicin? The authors causally implicate NUPR1 in driving the dox-resistant phenotype in MDA-MB-231 cells. To fully substantiate this claim, the authors should perform gain-of-function studies, in at least 2-3 TNBC cell lines, to show that over-expression of NUPR1 alone is sufficient to impart doxorubicin resistance. Also, the most critical information missing from the study is how NUPR1 drives resistance to doxorubicin. What is the function of NUPR1 in L-DOXR cells and what gene expression program does it activate to impart the resistant phenotype?

      Response: In our loss-of-function and gain-of-function experiments on Nupr1 in L-DOXR cells, we did not observe any specific changes in the cell size or polyploid state of L-DOXR cells. Although we cannot rule out the possibility that phenotypically larger cells also arise in response to chemotherapeutics other than DOX, in the current study we found that a high level of Nupr1 expression correlates with reduced sensitivity to doxorubicin in L-DOXR cells. Moreover, following the reviewer's suggestion, we performed a gain-of-function study to determine whether overexpression of NUPR1 alone is sufficient to impart doxorubicin resistance in TNBC cells. Overexpression of GST-NUPR1 attenuated DOX-induced cell death and slightly increased the viability of WT (MDA-MB-231) cells under vehicle treatment, indicating that NUPR1-expressing cells are resistant to the cytotoxic effect of DOX. We have also demonstrated that Nupr1 upregulation in L-DOXR cells is due to suppressed expression of HDAC11, as we found that HDAC11 regulates acetylation of the Nupr1 promoter in L-DOXR cells. Thus, it is conceivable that the increased expression of Nupr1 upon HDAC11 suppression in L-DOXR cells is at least partly responsible for doxorubicin resistance.

      Comment #8: Do the authors speculate the dox-resistant phenotype to be restricted to basal TNBC tumors or even NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics?

      Response: Yes, NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics as reported elsewhere; Wang, L., Sun, J., Yin, Y. et al. Transcriptional coregualtor NUPR1 maintains tamoxifen resistance in breast cancer cells. Cell Death Dis 12, 149 (2021). https://doi.org/10.1038/s41419-021-03442-z

      Comment #9: The authors suggest that HDAC11 continuously deacetylates the NUPR1 promoter to suppress its expression. Consequently, does the inactivation of HDAC11 in wild-type TNBC cells lead to NUPR1 up-regulation? Is this increase in NUPR1 expression reverted upon inhibition of the HAT machinery (say P300/CBP) in HDAC11-deficient TNBC cells?

      Response: In the revised manuscript (page 8, lines 14-16, and Fig. 5H), we show that overexpression of HDAC11 suppresses Nupr1 expression in both WT and L-DOXR cells, whereas treatment with an HDAC11 inhibitor enhances Nupr1 expression in WT cells, inversely mirroring the unusually low HDAC11 expression and high Nupr1 level in L-DOXR cells. Conceivably, the increased Nupr1 expression reflects increased acetylation of the Nupr1 promoter.

      Minor:

      Comment #10: In Figure 4L, how many animals or tumors were in each of the treatment arms? Were the weights of all the tumors recorded as well? It would be meaningful to add this data, if available. The authors keep changing gene nomenclature throughout the manuscript, listing the gene names in either capital letters or the small-case. This can be made consistent.

      Response: We used 6 mice per group, with one tumor per mouse, because of the size of the L-DOXR tumors in the vehicle group. We have also added new data showing the tumor weights in Figure supplement 2D. We apologize for the inconsistent gene names. Following the reviewer's suggestion, gene and protein names in the figures and legends have been changed to follow the recommendations of the HUGO Gene Nomenclature Committee (HGNC).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) Please expand methods with additional details related to cell co-culture, such as cell numbers and duration.

      We thank the reviewer for the careful reading and constructive suggestions, and we apologize for the confusion. We have added the experimental details related to co-culture in the revised manuscript (lines 551-553).

      (2) Please unify the writing of the abbreviation of small extracellular vesicles in the text, figure, and caption.

      Thank you for your comment. We have unified the abbreviation of small extracellular vesicles as sEVs throughout the revised manuscript.

      (3) The effects of components other than sEVs in mechanically stimulated osteocyte CM on the proliferation of NSCLC cells should be evaluated.

      We evaluated the effects of SF, lEVs, and sEVs in CM from mechanically stimulated osteocytes on NSCLC cell proliferation and found that sEVs had the most pronounced inhibitory effect, as shown in the revised Supplemental Figure 4c, d.

      (4) In addition to osteocytes and osteoblasts, the effects of other types of cells on the proliferation of NSCLC cells should be detected. It is recommended to add at least one type of cell from an infrequent metastatic site of NSCLC as a negative control.

      We thank the reviewer for the suggestion. We added NCM460 cell line (derived from intestinal epithelium) as a negative control and found that NCM460 had no significant effect on NSCLC cell proliferation, as shown in Figure 1d. These experiments were conducted before our last submission.

      (5) The bone microenvironment is complex. It is recommended to evaluate the effect of bone marrow-derived sEVs on NSCLC to validate whether the tumor suppressive effect of osteocyte sEVs is unique.

      We thank the reviewer for the suggestion. We agree with the reviewer’s comments that the bone microenvironment is complex. We explored the effect of bone marrow-derived sEVs on NSCLC cell proliferation and found that bone marrow-derived sEVs promoted NSCLC cell proliferation, as shown in Supplemental Figure 2g, h in the revised manuscript.

      (6) The description of exercise preconditioning is not clear enough. It is recommended to supplement the pattern diagram to improve readability. Exercise preconditioning should be further discussed by the Authors.

      Thank you for your comments, and we apologize for the confusion. We have added a schematic of the exercise preconditioning protocol in Supplemental Figure 6a.

      Reviewer #2 (Recommendations For The Authors):

      (1) The histological images are analyzed in a qualitative manner, with no description of the methodology used. A quantitative assessment of the distance and level of Ki-67+ NSCLC cells needs to be performed in human and murine tissues. Because in bone metastases cancer cells are frequently mixed with bone marrow cells, the inclusion of a cell marker to identify NSCLC cells is needed for proper interpretation of the imaging data.

      We thank the reviewer for the careful reading and constructive suggestions. We performed the suggested quantitative assessment and have described the methodology in the revised manuscript. The results showed that Ki-67 expression was lower in tumor cells adjacent to bone tissue than in the surrounding tumor cells (Figure 1a, b).

      In order to effectively identify NSCLC cells in bone metastases, GFP-expressing NSCLC cells were used in the animal model. We have added the immunofluorescence analysis of GFP and CCND3 in Supplemental Figure 4e, 4g, 5 and 6b.

      (2) The authors rely on KI-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which questions the specificity of the staining. The authors should provide the proper immunostaining controls to check for specificity and use additional markers of proliferation to confirm these results.

      We thank the reviewer for the suggestions. Ki-67 staining has been widely used to determine tumor cell dormancy in previous studies [1-4]. To confirm the Ki-67 results, we used cyclin D3 (CCND3) as an additional proliferation marker, as suggested by the reviewer. We have added immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5, and 6b; these results are consistent with the quantitative immunofluorescence analysis of Ki-67.

      (3) The lack of proper controls in the in vivo experiments makes the interpretation of the data difficult. For instance, in the preconditioning experiment, bone mass is likely to increase; thus, these mice start with higher bone mass than the control mice. The lack of a proper control (naive mice exposed to moderate exercise) does not allow testing whether the presence of cancer cells still promotes bone loss in this group. The authors need to include naive mice or analyze the bones from the non-injected contralateral legs.

      We thank the reviewer for the thoughtful comments, and we apologize for the confusion. We agree with the reviewer that bone mass increases after exercise preconditioning. Exercise affects multiple tissues and organ systems, initiating diverse homeostatic responses. Although exercise preconditioning effectively suppressed bone metastasis progression of NSCLC, as described in the previous manuscript, we cannot conclude that this effect depends entirely on osteocytes. The mechanism by which exercise preconditioning suppresses bone metastasis progression is complex and requires further exploration. The revised manuscript expands the discussion of this point (lines 326-328).

      (4) Further, validating the in vivo work with other osteocyte-like cells or primary osteocytes would have strengthened the results.

      We thank the reviewer for the suggestion. We have conducted the experiments of co-culture of MLO-A5 (another type of osteogenic cell line) and NSCLC cells as shown in Supplemental Figure 1g. Not surprisingly, MLO-A5 cells also had an inhibitory effect on proliferation of NSCLC cells.

      (5) The data on miRNA99b-3p on NSCLC in Supplementary Figure 3 is not convincing. The positive cells are difficult to see and most of the osteocyte lack nuclei. Better data, in humans and the mouse model, is needed to confirm that osteocytes produce miRNA99b-3p.

      We thank the reviewer for the comments, and we apologize for the confusion. In this study, we used miRCURY LNA miRNA detection probes for ISH without staining the nuclei in the tissues, a method that has been used in our previous studies and by others [5-7]. Detailed experimental procedures for ISH of miRNA have been added to the revised manuscript (manuscript lines 461-474).

      (6) The authors do not provide any data supporting that osteocytes are responsible for any of the effects of the interventions performed in the in vivo models. Osteocytes, as well as other bone cells, can respond to mechanical stimulation and thus could be responsible for the protective effects of mechanical loading or moderate exercise. In vivo experiments demonstrating a direct role of osteocyte-produced miRNA99b-3p are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis.

      We thank the reviewer for the thoughtful comments and suggestion. We constructed an in vivo model in which mice subjected to mechanical loading were injected with antagomir-NC or antagomir-99b-3p [8]. The results showed that injection of antagomir-99b-3p effectively, though partially, rescued the inhibitory effect on NSCLC cell proliferation (Figure 4i-k).

      (7) Further, the authors rely solely on Ki-67 as a marker of dormancy. Complementing this analysis with an assessment of a dormancy gene expression signature, or with in vivo studies assessing tumor dormancy directly, would be needed to confirm this notion.

      We thank the reviewer for the suggestion. We addressed this point by using CCND3 as an additional proliferation marker to assess dormancy. We have added the immunofluorescence analysis of CCND3 in Supplemental Figures 4e, 4g, 5 and 6b, and these results are consistent with those of the quantitative immunofluorescence analysis of Ki-67.

      References

      [1] Guba M, Cernaianu G, Koehl G et al. A primary tumor promotes dormancy of solitary tumor cells before inhibiting angiogenesis. Cancer Res, 2001, 61: 5575-5579.

      [2] Bliss SA, Sinha G, Sandiford OA et al. Mesenchymal Stem Cell-Derived Exosomes Stimulate Cycling Quiescence and Early Breast Cancer Dormancy in Bone Marrow. Cancer Res, 2016, 76: 5832-5844.

      [3] Correia AL, Guimaraes JC, Auf der Maur P et al. Hepatic stellate cells suppress NK cell-sustained breast cancer dormancy. Nature, 2021, 594: 566-571.

      [4] Hu J, Sánchez-Rivera FJ, Wang Z et al. STING inhibits the reactivation of dormant metastasis in lung adenocarcinoma. Nature, 2023, 616: 806-813.

      [5] Song Q, Xu Y, Yang C et al. miR-483-5p promotes invasion and metastasis of lung adenocarcinoma by targeting RhoGDI1 and ALCAM. Cancer Res, 2014, 74: 3031-3042.

      [6] Carotenuto P, Hedayat S, Fassan M et al. Modulation of Biliary Cancer Chemo-Resistance Through MicroRNA-Mediated Rewiring of the Expansion of CD133+ Cells. Hepatology, 2020, 72: 982-996.

      [7] Lv Y, Wang Y, Song Y et al. LncRNA PINK1-AS promotes Gαi1-driven gastric cancer tumorigenesis by sponging microRNA-200a. Oncogene, 2021, 40: 3826-3844.

      [8] Zhang Y, Li S, Jin P et al. Dual functions of microRNA-17 in maintaining cartilage homeostasis and protection against osteoarthritis. Nat Commun, 2022, 13: 2447.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13-null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a conclusion on its function. This study is the first to generate TRIP13-null mice, definitively revealing the function of TRIP13 in meiosis. The authors also show the novel localization of TRIP13 to the SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications for understanding human meiosis and disease phenotypes.

      Weaknesses:

      This manuscript would be more impactful if more mechanistic advancements could be made. For example, the authors could follow up with one of the new interactors identified by MS to offer new insight into the molecular function of TRIP13.

      We agree that it would be interesting to follow up on new candidate interactors but think that it would be more feasible to follow up on them in future studies.

      Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using hypomorphic alleles in the literature. However, the null phenotypes had not been examined, and the localization of TRIP13 had not been clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to the SC in mice provides an explanation of how HORMA domain proteins are evicted from the SC in diverse organisms. This conclusion was confirmed by both IF and TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and comprehensively covers the topic. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

      Meiotic recombination was extensively characterized in Trip13 severe hypomorph mutants in two previous studies (Li and Schimenti, 2007; Roig et al., 2010), using gamma-H2AX, BLM, BRCA1, ATR, RPA, RAD51, DMC1 and MLH1 as markers. All the meiotic defects in our Trip13-null mice were also present in Trip13 severe hypomorph mutants: meiotic arrest, defects in chromosomal synapsis, asynapsis at chromosomal ends, and accumulation of HORMAD1/2 on the SC axis. Therefore, the defects in meiotic recombination in Trip13-null mice are expected to be similar to those in Trip13 severe hypomorph mutants, and thus we did not examine the proteins involved in meiotic recombination in the Trip13-null mutant.

      Reviewer #3 (Public Review):

      Summary:

      The authors perform a thorough examination of the phenotypes of a newly generated Trip13 null allele in mice, noting defects in chromosome synapsis and impact on localization of other key proteins (namely HORMADs) on meiotic chromosomes. The vast majority of data confirms observations of several prior studies of Trip13 alleles (moderate and severe hypomorphs). The original or primary aims of the study aren't clear, but it can be assumed that the authors wanted to better study the role of this protein in evicting HORMADs upon synapsis by studying phenotypes of mutants and better characterizing TRIP13 localization data (which they find localizes to the central element of synapsed chromosomes using a new epitope-tagged allele). Their data confirm prior reports and are consistent with localization data of the orthologous Pch2 protein in many other organisms.

      Strengths:

      The quality of data is high. Probably the most important data the authors find is that TRIP13 is localized along the CE of synapsed chromosomes. However, this was not unexpected because PCH2 is also similarly localized. Also, the authors use a clear null (deletion allele), whereas prior studies used hypomorphs.

      Weaknesses:

      There is limited new data; most are confirmatory or expected (i.e., SC localization), and thus the impact of this report is not high. The claim that TRIP13 "functions as a dosage-sensitive regulator of meiosis" is exaggerated in my opinion. Indeed, the authors make the observation that hets have a phenotype, but numerous genes have haploinsufficient phenotypes. In my opinion, it is a leap to extrapolate this to infer that TRIP13 is a "regulator" of meiosis. What is the definition of a meiosis regulator? Is it at the apex of the meiosis process, or is it a crucial cog of any aspect of meiosis?

      TRIP13 is not haploinsufficient, as Trip13 heterozygotes were still viable and fertile (albeit with defects in meiosis). TRIP13 is an ATPase and changes the conformation of meiosis-specific proteins such as HORMAD proteins. TRIP13 is essential for meiosis and its mutations cause defects in both meiotic recombination and chromosomal synapsis. Reviewer 1 stated that “TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans”. Therefore, we feel that TRIP13 can be called a regulator of meiosis.

      Reviewer #1 (Recommendations For The Authors):

      A schematic illustration of SC structure, the components involved, and the main finding, would be helpful for readers to better understand the advancement made by this study.

      We have now added a schematic illustration in a new panel - Figure 7C.

      Fig. 1B, the stage with diplotene cells should be XII.

      The pachytene cells (Pac) were mislabeled as diplotene cells. This has been corrected.

      Fig. 1C, color mislabeled.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript will provide important knowledge in the field of meiosis. I support the publication of this study. I have some suggestions to improve and polish the manuscript.

      Major points:

      (1) The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. Given the function of HORMAD1 in meiotic recombination, it would be informative if the authors could examine how major markers of meiotic recombination behave in Trip13-null meiosis.

      Please see our response to Weaknesses from Reviewer #2.

      (2) Relating to the above point, the complete lack of synapsis on the sex chromosomes in the Trip13-null meiosis is impressive. This result raises a question as to whether the pathway to designate XY-obligatory crossover (which can be detected with large foci of ANKRD31 and MEI4/REC114 at PAR) is affected or not. It would be interesting to examine whether the ANKRD31 and MEI4/REC114 foci are present on PAR in Trip13-null meiosis.

      We have performed immunofluorescence analysis of REC114 in spermatocytes. In Trip13-null pachytene-like spermatocytes, the X and Y chromosomes are not synapsed, yet REC114 still formed one focus on each of the unsynapsed X and Y chromosomes. We have added these new data to the Results as a new supplementary figure (Figure 4-supplement 1).

      (3) Figure 4 can be improved if there are quantified data for each phenotype. These phenotypes look nearly complete, but it would be informative to show the penetrance of these phenotypes.

      Because some chromosomes have unsynapsed ends, resulting in two centromere or telomere foci, the total number of centromere or telomere foci is always higher in Trip13-null pachytene-like spermatocytes than in wild-type pachytene spermatocytes. Therefore, we did not count the foci of centromeres and telomeres. The centromere and telomere markers consistently localized as expected in both wild-type and Trip13-null spermatocytes.

      (4) I am not fully convinced by these photos: "synapsed sister chromatids (Figure 6B)" and "Sycp2-/- spermatocytes formed short stretches of synapsis (Figure 6C)". The authors may try confocal microscopy with super-resolution deconvolution as they did for other data.

      These have been demonstrated previously. The "synapsed sister chromatids (Figure 6B)" were previously demonstrated by confocal microscopy with super-resolution deconvolution (Guan et al., 2020). The short stretches of synapsis in Sycp2-/- spermatocytes were previously demonstrated by electron microscopy (tripartite SC structure) and SYCP1 immunofluorescence (Yang et al., 2006). We have revised the text to cite the previous evidence and the publications.

      Minor points:

      (1) Lines 19-21: "Loss of TRIP13 leads to meiotic arrest and thus sterility in both sexes. Trip13-null meiocytes exhibit abnormal persistence of HORMAD1 and HORMAD2 on synapsed SC". These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles. This information can be added to the abstract. Otherwise, as written, it sounds like these are entirely new findings.

      This information is now added to the abstract: “These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles.”

      (2) The introduction section seems too long and contains unnecessary information. Some molecular details that are not touched in the result section can be deleted (e.g., Line 65-73).

      We would like to keep the molecular details on the two conformational states, as they provide biochemical background on TRIP13-HORMAD interactions.

      (3) Introduction, Line 92. A rationale can be added as to why the authors characterized the Trip13-null allele.

      A rationale has been added as follows: "To determine the effect of complete loss of TRIP13, we characterized Trip13-null mice."

      (4) Line 205: Typo "TRRIP13".

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      Just a few recommendations:

      (1) In my opinion, the title is an overreach. "Regulator" invokes other concepts such as transcription factors.

      Please see our explanation in response to weaknesses from Reviewer #3.

      (2) The first sentence of the results deals with TRIP13 expression in only 3 tissues. The authors might look at more comprehensive RNA-seq data from mice and humans.

      We examined TRIP13 protein expression in 8 mouse tissues by WB and found that TRIP13 protein was abundant in testis but present at a very low level in ovary and liver (Figure 1A). We feel that readers can easily look up the relative transcript levels of Trip13 in more mouse and human tissues in the NCBI Gene database.

      (3) The null allele is semi-lethal. Is body size affected? Were the mice abnormal in any other ways, given that TRIP13 has been implicated in other diseases and processes, and is expressed in other tissues (TRIP13 stands for Thyroid receptor interacting protein).

      The body weight of 2-3-month-old males was not significantly different between wild-type (24.3±2.8 g, n=5) and Trip13 KO mice (22.8±1.7 g, n=5, p=0.3, Student's t-test). We have included the body weight information in the revised manuscript. We did not observe somatic defects in the viable Trip13-null mice, nor did the authors of two previous studies report any in the Trip13 hypomorph mutants (Li and Schimenti, 2007; Roig et al., 2010).
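      As a rough consistency check of the reported statistic, the sketch below recomputes a pooled-variance (Student's) two-sample t-test from the summary values above. It assumes the ± values are standard deviations rather than SEMs and that equal variances were assumed; neither is stated in the response. The function names are hypothetical, and the p-value is obtained by integrating the t density numerically with Simpson's rule rather than with a statistics library.

```python
import math


def t_pdf(x, df):
    # Probability density of Student's t distribution with df degrees of freedom.
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    # Integrate the density from 0 to |t| with Simpson's rule (steps must be even).
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    half_mass = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * half_mass


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t-test from summary statistics (assumes +/- values are SDs)."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its own degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_sided_p(t, df)


# Wild type vs Trip13 KO body weight, values taken from the response above.
t, df, p = pooled_t_test(24.3, 2.8, 5, 22.8, 1.7, 5)
print(f"t = {t:.2f}, df = {df}, p = {p:.2f}")  # prints t = 1.02, df = 8, p = 0.34
```

      Under these assumptions, the result (p of about 0.34) agrees with the reported p=0.3. If the ± values were SEMs instead, the same numbers would give t of about 0.46 and p of about 0.66, so the reported value is more consistent with them being standard deviations.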

      (4) Line 276 : It would be nice to elaborate on the "spatial explanation."

      We meant that TRIP13 localizes to SC while HORMAD proteins are removed from SC upon chromosomal synapsis, thus providing a spatial explanation. However, we have now deleted “spatial”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      However, there are several concerns to be explained more in this study. In addition, some results should be revised and updated.

      Thank you for your comments. We have addressed the concerns through revised descriptions and additional experiments.

      Some results were revised and updated accordingly.

      Reviewer #2 (Public Review):

      The minor weaknesses of the study are inconsistent use of terminology throughout the manuscript, occasional logical jumps in the flow, and missing detailed descriptions of the methodologies used, either in the text or in the Materials and Methods section, which can be easily rectified.

      Thank you for your review. We have revised the manuscript and corrected errors according to your comments.

      Reviewer #3 (Public Review):

      Importantly, besides the Miwi ubiquitination experiment, which is performed in a heterologous system and therefore may not be ideal for drawing conclusions, the possible involvement of ubiquitination was not shown for any other proteins that the authors found to interact with FBXO24. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, such that in its absence histone replacement is abrogated?

      Thank you for your comments. Histones and transition proteins were not found in the FBXO24 immunoprecipitates, suggesting that they are not direct targets of FBXO24 (Figure S3G).

      Miwi should be immunoprecipitated and Miwi ubiquitination should be detected (with WB or mass spec) in WT testis.

      We agree with this suggestion. In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      Therefore, the claim that FBXO24 is essential for piRNA biogenesis/production (lines 308, 314) is not appropriately supported.

      We appreciate the comment. We have revised the description and modified the claim on page 11.

      Reviewing Editor's note for revision

      (1) As noted by all three reviewers, as currently written the rationale to focus on MIWI is not entirely clear. A transitional narrative to focus on MIWI needs to be provided as well as an explanation for how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression.

      We appreciate your comments. We have added a transitional narrative explaining our focus on MIWI (page 7) and an explanation of the differential mRNA and protein expression upon FBXO24 deletion (page 13).

      (2) Because the interaction could be indirect, MIWI should be detected by mass spec in the testis co-IP, and MIWI ubiquitination should be detected (with WB or mass spec) in WT testis.

      In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      (3) Please tone down the claim that FBXO24 is essential for piRNA biogenesis/production as it requires further evidence.

      We have revised the description and modified the claim on page 11.

      (4) Please perform gene ontology analysis of the genes with abnormally spliced mRNAs to provide an explanation for the developmental defects.

      In the revision, we have performed the ontology analysis and provided new data regarding the abnormally spliced genes, as shown in Figure S4D.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) The authors performed mainly with the WT (or knock-in) and Fbxo24-knockout mouse model. Do the heterozygous males and their sperm have any physiological defects like FBXO24-deficient mice?

      This is a good question. We did the phenotype analysis and found that heterozygous males are all fertile, and their sperm do not have any physiological defects.

      (2) Fbxo24-KO sperm carries swollen mitochondria. How do the mitochondria affect sperm function?

      Thank you for raising this interesting question. Based on our data and published literature, the defective mitochondria were associated with energetic disturbances and reduced sperm motility, as shown on Page 12.

      (3) TEM images show that Fbxo24-KO spermatids carry swollen mitochondria and enlarged chromatoid bodies. How the swollen mitochondria and enlarged chromatoid bodies cause defects in sperm motility and flagellar development requires more explanation. In addition, it is unclear why the enlarged diameter of the chromatoid body is critical for normal sperm development.

      Thank you for your comments. The chromatoid bodies are considered to be involved in mitochondrial sheath morphogenesis. Analysis of the RNA content of chromatoid bodies reveals enrichment of PIWI-interacting RNAs (piRNAs), further emphasizing the role of the chromatoid bodies in post-transcriptional regulation of spermatogenic genes. We have added this explanation on pages 12-13.

      (4) The authors only show band images to compare the protein amounts between WT and KO sperm and round spermatids. As the blots for loading controls are not clear, the authors should quantify the protein levels and perform a statistical comparison.

      We quantified the protein levels and performed a statistical comparison, as shown in Figure S3B.

      (5) The authors show the defective sperm head structure from Fbxo24-KO sperm in Figure 5. However, the Fbxo24-KO sperm heads seem quite normal in Figure 3. How many sperm show defective sperm head structure? In addition, the authors observed altered histone-to-protamine conversion in sperm, but it is unclear whether the altered nuclear protein conversion causes morphological defects in the sperm head.

      We appreciate the comments. In our study, we found that over 80% of Fbxo24 KO sperm showed defective sperm head structure. Altered histone-to-protamine conversion caused the decondensed nucleus of Fbxo24 KO sperm. Notably, in many knockout mouse studies, impaired chromatin condensation is frequently associated with abnormal sperm head morphology (reference 15, page 8).

      (6) The authors compare the protein levels of RNF8, PHF7 and TSSK6, which participate in nuclear protein replacement in sperm. However, because sperm are the endpoint of nuclear protein conversion, the rationale for comparing these protein levels in mature sperm is unclear. The authors might want to compare the protein levels in developing germ cells.

      Thank you for your comment. We actually detected the protein levels of RNF8, PHF7 and TSSK6 in the testes, not in sperm. We have corrected this in Figure 5E and apologize for the error.

      (7) This reviewer suggests describing in more detail the rationale for focusing on the MIWI protein. It would also be useful to know whether MIWI is detected by testis co-IP mass spectrometry.

      We agree with this suggestion. Because MIWI is a core component of the chromatoid body (CB) and was also identified as an FBXO24-interacting partner in our immunoprecipitation-mass spectrometry (IP-MS) analysis (Table S1), we focused on examining MIWI expression in WT and Fbxo24 KO testes. We have added this description to the revision (lines 191-193, page 7).

      (8) The authors need to provide a more detailed explanation for how the altered piRNA production affects physiological defects in germ cell development. In addition, it will be good to describe more how the piRNAs affect a broad range of mRNA levels.

      Thank you for your comments. The previously published studies have demonstrated that piRNAs could act as siRNAs to degrade specific mRNAs during male germ cell development and maturation. We have cited these studies on lines 369-372 of Page 13.

      (9) The authors observed an altered splicing process in the absence of FBXO24. However, it is a little bit confusing how the altered splicing events affect developmental defects. Therefore, the authors should state which mRNAs have undergone abnormal splicing processes and provide ontology analysis for the genes.

      We have performed the ontology analysis and showed the new data in Figure S4D.

      Minor comments

      (1) Figure 1A-C - Statistical comparison is missing. Numbers of biological replicates should be described in the corresponding legends.

      Thank you for your careful review. We have provided the statistical comparison and the numbers for biological replication in the legends of Figure 1A-C.

      (2) Figure 1E, F - The current images cannot clearly resolve the nuclear localization of FBXO24 in testicular germ cells. To clarify the intracellular localization, the authors should provide images with higher resolution.

      The resolution of Figure 1E, F was improved, as suggested. Thank you!

      (3) Figure 1E, F - Scale bar information is missing.

      The scale bars of Figure 1E, F were provided.

      (4) It will be much better to show the predicted frameshift and early termination of the protein translation in Fbxo24-knockout mice.

      The predicted frameshift in Fbxo24-knockout mice has been added to Figure S1B.

      (5) It is required to provide primer information for qPCR.

      The primer information for qPCR was provided, as shown in Table S7.

      (6) The authors describe that Fbxo24-KO sperm show abrupt bending of the tail. However, the description is unclear and the sperm shown in Figure 3C seems quite normal. The authors should clarify the abnormal bending pattern of the tail and show quantified results.

      Thank you for pointing out this issue. In Fbxo24 KO sperm, abnormal bending of the sperm tails mainly included neck bending and midpiece bending. We have shown them in Figure S3A.

      (7) The authors mention that Fbxo24-KO sperm have swollen mitochondria at the midpiece, but this is also unclear. How many mitochondria are swollen in Fbxo24-KO sperm?

      This is a good question. However, because it is very difficult to observe all of the mitochondria in each sperm by electron microscopy, we could not quantify the swollen mitochondria in Fbxo24 KO sperm.

      (8) Scale bar information is missing - Fig 3C insets, Fig 3D, Fig 3F insets, Fig 4A insets, Fig 4C insets.

      All the scale bars have been added.

      (9) How many sperm have annulus defects? In Figure 3F, the WT sperm does not have an annulus, which could have been damaged during sample preparation. Are the annulus defects in Fbxo24-KO sperm consistent?

      Thank you for asking these questions. Based on our results, about 30% of Fbxo24 KO sperm showed a defective annulus structure. Because both TEM (Figure 3F) and SEM (Figure 3G) results clearly showed the defective annulus structure of Fbxo24 KO sperm, we believe the annulus defects are consistent and highly unlikely to be caused by sample preparation.

      (10) The cross-section image of the endpiece of Fbxo24-KO sperm is not suitable; it shows the longitudinal column structure of the principal piece.

      Thank you for your comments. It is difficult to observe a completely longitudinal structure of the sperm tail by TEM. The cross-sections of the endpiece and principal piece allowed us to examine the structure of the axoneme, ODFs and fibrous sheath (FS).

      (11) The endpiece of Fbxo24-KO sperm seems to have a normal axoneme. Do all endpieces of Fbxo24KO sperm have normal axoneme? Also, the authors need to describe whether an axonemal structure is damaged and disrupted in all Fbxo24-KO sperm.

      Our TEM data showed that the axonemal structure was impaired in the endpiece of Fbxo24 KO sperm (see right panels of Figure 3H). Moreover, based on the TEM ultrastructural analysis, we found that over 90% of Fbxo24 KO sperm had a damaged axonemal structure.

      (12) Reference blots in Fig 3I, 3J, 4E (left), 5C and 5E are quite faint. The authors should replace the blot images.

      Thank you for pointing this out. We have rerun the Western blots multiple times but could not obtain better images due to antibody sensitivity. However, we quantified the protein levels and performed a statistical comparison, as shown in Figure S3B, to provide readers with a clear readout from these images.

      (13) Loading controls are required - 7D-H.

      Done as suggested. Thanks!

      (14) How do the authors measure the midpiece length? From where to where? This should be clarified.

      Good question. We measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this on Page 16.

      (15) How are the bands for Fbxo24 shifted during IP in Fig 7A?

      Protein modification during the interaction may cause the band shift.

      (16) There are several typos throughout the manuscript. Please check carefully and fix them.

      Thank you for your careful review. We have corrected and fixed all the typos as far as we can.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      (1) Please provide a schematic of HA-Fbxo24 knock-in construct and strategy together with knockout (Figure S2) or even separately early in Figure S1. The description of using the transgenic mouse is mentioned even earlier than the knockout but there are no citations or methods provided in the text other than that listed in Materials and Methods.

      Thank you for your suggestion. As suggested, the schematic of the HA-Fbxo24 knock-in strategy has been added to Figure S2A. The description of the transgenic mouse has been added to the Results (page 4, lines 102-103).

      Also, it is not clear to what extent the phenotypic and molecular characterization of HA-transgenic mice is performed. For example, Lines 134-139: The use of Fbxo24-HA labeled transgenic mice results in the rescue of spermatogenesis and fertility as shown in Figure 2F by measuring the litter size. It is not clear how this observation leads the author to state that this rescues defects in spermiogenesis. Please clarify how and what other measures are taken to support this conclusion. Is the observed infertility due to defects in spermatogenesis or spermiogenesis?

      Thank you for your question. We crossed FBXO24-HATag males with FBXO24−/− females to obtain FBXO24−/−; FBXO24-HATag males. We examined the testes volume and histological morphology of FBXO24−/−; FBXO24-HATag males and found that they were similar to FBXO24+/−; FBXO24-HATag littermates, indicating that spermatogenesis was restored, as shown in Figure S2H.

      (2) Line 107 vs Line 114: Please use the terminology spermatogenesis and spermiogenesis consistently throughout the text. Earlier in the introduction, the authors clearly defined that spermatogenesis involves three phases, with the third phase referred to as spermiogenesis. However, the author concludes in the first line that "FBXO24 plays a role during spermatogenesis" while summarizing at the end of the paragraph that this protein is "expressed in haploid spermatids specifically during spermiogenesis". Therefore, it is not clear whether the authors conclude that FBXO24 is important for all of spermatogenesis (line 107) or only for part of spermiogenesis (line 114). Another example is line 219 vs. 238: At this point in the manuscript, it is again unclear whether the authors want to study molecular changes during spermatogenesis or spermiogenesis upon FBXO24 depletion. Many examples of such cases throughout the text, and it is recommended to be consistent in using more restrictive terminology whenever applicable for a clear interpretation.

      We thank you for your careful review. We have double-checked the terminology of spermatogenesis and spermiogenesis and made it consistent throughout the text of the revised manuscript.

      (3) It is not clear how frequently Fbxo24-knockout sperm show defects in head morphology based on Figures 3C, 3F, and 5A, since some sperm appear to have relatively normal-looking heads. Please provide quantification.

      We have performed the quantification and found that over 80% of Fbxo24 KO sperm showed defective structures in the sperm head.

      (4) Figure 3B: The figure legend states that 3 mice were analyzed in each group. The standard deviation for the WT group is missing; alternatively, if the authors set the WT value to 100%, the bar does not match the y-axis scale. The WT value looks closer to 95%.

      We did analyze sperm motility with the WT value set to 100%, and we have revised Figure 3B accordingly. We apologize for this oversight.

      (5) Figure 3B and C: It is not clear how motility was measured. Was CASA used? (It is not described in the Methods.) The conclusion that KO spermatozoa show abnormal flagellar bending cannot be drawn from static microscopic images alone. Please provide more details of the motility analysis, together with videos from live-cell imaging.

      The sperm motility was measured manually using a hemocytometer, according to the reference.

      We provided the details of sperm motility analysis in the Materials and Methods section on Page 16.

      (6) Figure 3I and J: These are among the few figures that are not supported by statistical analysis. In particular, in 3I the GAPDH controls for WT and KO do not show equal loading, which could explain the lower expression in the KO. Please show normalized bar graphs with multiple biological replicates, or at least a representative technical replicate with equal GAPDH loading, to better support the conclusion.

      Thank you for your suggestion. A statistical comparison of relative protein expression has been added in new Figure S3B.

      (7) Line 184: It is not clear how the authors define a swollen mitochondrion. Are there size (or roundness) criteria that can be measured to distinguish a swollen from a non-swollen mitochondrion? We recommend using another term, as "swollen" often implies a difference in osmolarity, and no experiment supports this implication.

      Thank you for your comment. We have changed “swollen” to “vacuolar” in the revision, as shown on Page 7.

      (8) Figure S4: Without a bright-field image, it is hard to assess the purity and morphology of the isolated preparation. Please provide the bright-field images alongside or as overlays.

      We agree with your comment. We have provided the overlaid images in new Figure S4A.

      (9) There is a large logical jump in what prompts the authors to look at MIWI protein levels and to link this observation to the MIWI/piRNA pathway, in both the Introduction and the Results, even though this is one of the main findings. We recommend providing a better rationale and a clearer logical flow in the text.

      Thank you for your suggestion. We have added a sentence explaining why we wanted to focus on studying MIWI expression (see lines 190-193 on page 7).

      Minor comments

      (1) Please follow the conventions for gene vs. protein nomenclature throughout. For example, gene names in the figures should be italicized with the first letter capitalized, as in the main text. Protein names should be in all capitals, e.g., FBXO24.

      Gene and protein names have been revised as suggested.

      (2) In the MM section, the name and location of the manufacturer are missing for the materials used in several subsections. Please go back through the MM section and add this information where appropriate.

      Done as suggested. Thank you!

      (3) On page 4, the authors mentioned that "Further qPCR analysis of developmental testes and purified testicular cells showed that FBXO24 mRNA was highly expressed in the round spermatids and elongating spermatids (Fig 1B-C)". Please include statistical analyses for Fig 1B-C as well as for Fig 1A to support the written statements.

      Statistical comparisons have been added to Figure 1. P-values are denoted in the figures by *p < 0.05.

      (4) Figure 3E: Please describe in more detail how the length of the midpiece was measured. Was it based on TEM images or based on fluorescent images using MitoTracker?

      As we responded to Reviewer #1, we measured midpiece length from the sperm neck to the annulus using MitoTracker staining. We have clarified this in the Materials and Methods section on Page 16.

      (5) Line 431: In the "Electron Microscopy" section of the MM part, the author should indicate the ascending ethanol series (%) used.

      Done as suggested. Thank you!

      (6) Line 432: The thickness of the sections prepared is missing, as well as an indication of the microtome used.

      We have added the section thickness and the microtome used to the Materials and Methods section on Page 16.

      (7) Line 433: If the generated tiff files have been processed with Adobe Photoshop, this information is missing.

      We have added information on the use of Adobe Photoshop to generate the TIFF files on Page 17.

      (8) Lines 445, 452, 467: In some places in the paper, the temperature is written with a space between the number and °C, and in others it is not. Please go through the paper and make this consistent. The usual spelling is 4°C.

      We have gone through the manuscript and checked how all temperatures are written to make them consistent. Thank you for your careful review.

      (9) Line 469: The gel documentation system used is not mentioned.

      Done as suggested. Thank you!

      (10) Line 469: The 'TM' should be superscripted.

      Done as suggested.

      (11) Line 489: A space is missing between the changes and the parenthesis.

      Done as suggested.

      (12) Lines 495-496: The authors write that the fractions enriched in round spermatids after sedimentation were collected manually. Was the cell concentration (e.g., 2 × 10⁶ cells/ml) determined after the cells were collected? How were the cells stored until use? Please add the sedimentation time and temperature.

      The cells were stored in 1× Krebs buffer on ice. The cells were sedimented through a BSA density gradient for 1.5 h at 4°C, and the cell concentration was determined after collection, as described on Page 18.

      (13) Line 505: Spelling error. The text reads "manufactures' instructions" instead of "manufacturer's procedure".

      The spelling error was corrected.

      (14) Line 520: Please write a short sentence on how the purification of the 16-40 nt long RNA was performed.

      RNAs 16–40 nt in length were enriched by polyacrylamide gel electrophoresis. We added this information on Page 19 (line 531).

      (15) Line 528: The version of the used GraphPad software is missing.

      The GraphPad software version has been added, as shown on Page 19.

      (16) Line 677: For qPCR analyses, the number of mice analyzed (N) and a statistical evaluation are missing.

      The statistical comparison and the number of biological replicates have been added, as shown on Page 26.

      (17) Figure 3D: Please add a scale bar.

      Done as suggested. Thanks!

      (18) Line 371 and Line 377: Two times "in summary" is written. Please make one summary for the whole paper.

      This sentence was revised, as shown on Page 13.

      (19) Line 382: To be consistent in the whole paper, please write Figure 10 in bold letters.

      Done as suggested.

      (20) Please make the size and font of the references consistent with the main text.

      Done as suggested. Thanks again for your careful review.

      Reviewer #3 (Recommendations For The Authors):

      I would like to see a description of the FBXO24 immunoprecipitation experiment performed in HEK293T cells. This somatic cell line does not normally express Miwi, so how was Miwi detected on the FBXO24-mCherry IP beads? It is not mentioned whether Miwi was expressed from a recombinant vector in this experiment. Similarly, the experiment with the ubiquitin peptides described toward the end of the same paragraph is not clear and should be described better.

      Thank you for your comments. FBXO24-mCherry was expressed in HEK293T cells, and the immunoprecipitates were incubated with testis protein lysate (see lines 268-272 on Page 10). A description of the ubiquitin experiment has also been added, as shown in lines 283-286 on Page 10.

      Line 263: I think the term ectopic here is not appropriate, a correction is needed.

      We have changed “ectopic” to “increased” in the revision (see line 268 on Page 10).

      I would like the authors to provide a tentative explanation of, or evidence for, why FBXO24 KO males are completely sterile, even though they still produce mature sperm with some motility. Since there are defects in nuclear condensation, it would be very relevant to check DNA damage/fragmentation, which could contribute to the sterility phenotype.

      This is a good suggestion. We analyzed sperm DNA damage by TUNEL staining and show the new data in Figure S3E-F.

      Line 213: There have been some conflicting reports about the role of RNF8 in spermiogenesis, but a recent report has shown that RNF8 is not involved in histone PTMs that mediate histone to protamine transition (Abe et al Biol Reprod 2021 https://doi.org/10.1093%2Fbiolre%2Fioab132).

      Thank you for your comment. We have cited this critical reference and discussed it in Discussion section on Page 12.

      Figure 7: I would like to see zoomed-out views of the affected exons, so that flanking unaffected exons can be used as a reference for unaffected splicing. Most of the genome browser views in this image only show affected exons and it is impossible to see if these alone are affected or if the reduced RNAseq coverage in those exons is a result of overall reduced mapped reads in these genes. Also, a fixed Y axis with the same max value should be shown for these genome browser snapshots so that the expression level is comparable between the two genotypes.

      Thank you for your comments. Loading control of RT-PCR and scale range of Y axis were added in new Figure 7.

      Minor corrections:

      Line 70: correct "..functions as protein-protein interaction..".

      Thank you for your careful review. We have corrected this sentence (see line 69 on Page 3).

      Line 101: correct "..qPCR analysis of developmental testis..".

      We have corrected this sentence (see line 100 on Page 4). Thanks again.

      Line 116: correct "..results in detective..".

      Corrected.

      Line 186: correct ".. explored..".

      Corrected.

      Line 218: correct ".. gene expressions.

      Corrected.

      Line 221: correct "..genes significantly differentiated expressed".

      Corrected.

      Line 241: FBXO24 was shown earlier in both cytoplasm and nucleus.

      We have changed “FBXO24 is mainly confined to the nucleus” to “FBXO24 expressed in the nucleus”, as shown in line 247 on Page 9.

      Line 501-502: correct "..reverse transcriptional".

      “reverse transcriptional” was changed to “reverse transcription”, as shown on Page 18.

      Line 686: correct ".. deficiency male..".

      Corrected.

      Line 769: correct "..Western blots were adopted..".

      Corrected.

      Line 784: correct "..WT tesis..".

      Corrected.

      I cannot understand exactly what is shown in Figure 9B. Some elements marked on the X-axis are single base positions (-2K, TSS, +2K), while others are stretches of sequence, so they cannot be equivalent. Why is only an intron shown? The Y-axis should show a measure of normalized expression.

      Thank you for your questions. On the X-axis, genomic segments were scaled to the same size and signal abundance was calculated over them using computeMatrix. To identify the source of piRNAs, piRNAs were mapped to the gene body, including introns, CDS, and UTRs. The Y-axis shows the normalized count.

      Figure 6F is not needed.

      Figure 6F illustrates the number of each type of mRNA splicing event upon FBXO24 deletion in round spermatids. Because it helps the reader understand the splicing changes, we decided to keep it.

      The last two paragraphs of the discussion seem to be redundant.

      Thank you for pointing this out. We have revised the last two paragraphs of the discussion.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      R. C. Edgar, et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that ALE in its “dated” version did not necessarily fail on our dataset because of its size (ALE ran, but provided unrealistic parameter estimates and was unable to output possible reconciliations, as mentioned in our Material and Methods section). We think it most likely failed because there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between them with time-consistent transfers is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. Following a suggestion from reviewer #3, we are going to run the dated version of ALE separately on the alpha- and betacoronaviruses, which yields smaller datasets. This will help us determine whether the dated version of ALE fails because of data size or because of the absence of a codiversification pattern.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we examined sampling biases in the mammalian virome and suggested that this is a fairly prominent issue, one that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point: even for betacoronaviruses, many new hosts have been confirmed in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in the data is required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We fully agree that sampling bias in the mammalian virome is a prominent issue, which is why we conducted a series of sensitivity analyses to test its effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across the mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life, by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still found low support for a scenario of codiversification, and we still recovered the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host transfers involve unsampled intermediate hosts. To address the reviewer's comment, we will better emphasize the importance of sampling biases in the main text and include the suggested references. We will also give our sensitivity analyses more prominence by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses will provide useful additional insights; we are going to run separate cophylogenetic analyses for these two sub-clades. We will report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens), thank you for providing this reference that we will now discuss.

    1. Author Response

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit - Swc5 - that can bind nucleosomes, engage the acidic patch, and obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments:

      (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no-enzyme controls, +/- ZB dimers, no-ATP controls, and AMP-PNP controls for our FRET-based H2A.Z deposition assay (see also Figure S3). We do not observe significant photobleaching in this assay, either in ensemble measurements or in smFRET experiments. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show a less than 10% decrease in FRET over 30 min, and the signal from the nucleosome with both acidic patches disrupted is identical to this negative control.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We previously demonstrated this activity of SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible and lasts only 1-3 seconds. Neither we nor others have observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal in the AMP-PNP controls in Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from that of no-enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D used a 77NO nucleosomal substrate, with Cy3 labeling the linker-distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between the data sets is that the data in Fig 1B used nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosome preparations, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77NO-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no-enzyme control); these data were included in the source data file. Thus, SWR1C appears to induce significant nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. Key for this work, however, is that the rate of FRET loss increases substantially in the presence of ATP, and this rate is faster when a ZB dimer is present at the linker-proximal location. Regarding the last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See our response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this more explicit for each figure. The error bars represent 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three, independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wildtype nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates: with ZB on the linker-proximal side, on the linker-distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes over SWR1C (catalytic conditions), whereas we used single-turnover reactions. Second, our study is the first to use a homogeneous, ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome and followed the first and second rounds of dimer exchange, which occur sequentially. Finally, we observe only a 20-30% increase in rate from a ZB dimer (e.g., with 77N0 substrates), and such an increase was unlikely to have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: Saying that 'numerous' studies have shown that the acidic patch affects the activity of chromatin remodeling enzymes may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as this made it too cluttered. Instead, we state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important work by Park et al. introduces open-top two-photon light sheet microscopy (OT-TP-LSM) for less invasive evaluation of intraoperative 3D pathology. The authors provide convincing evidence for the effectiveness of this technique in investigating various human cancer cells. The paper needs some minor corrections and has the potential to be of broad interest to biologists and, specifically, pathologists utilizing 3D optical microscopy.

      We would like to thank the editor for the positive general comment. We revised the manuscript by addressing the reviewers' comments.

      Public Reviews:

      Reviewer 1

      Summary:

      A2. This manuscript presents the development of a new microscopy method termed "open-top two-photon light sheet microscopy (OT-TP-LSM)". While the key aspects of the new approach (open-top LSM and two-photon microscopy) have been demonstrated separately, this is the first system to integrate the two. The integration provides greater imaging depth than single-photon excitation OT-LSM.

      Strengths:

      • The use of a liquid prism to minimize the aberration induced by refractive index mismatch is interesting and potentially helpful to other researchers in the field.

      • The use of propidium iodide (PI) provided greater imaging depth.

      Weaknesses:

      Details are lacking on imaging time, data size, the processing time to generate large-area en face images, and inference time to generate pseudo H&E images. This makes it difficult to assess how applicable the new microscope approach might be in various pathology applications.

      B2. We would like to thank the reviewer for the critical and positive comments. We agree with the reviewer that detailed information such as processing time is missing.

      The imaging time and data size per 1 cm² area were estimated at 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) bytes) for each channel, respectively. Processing of en-face images was relatively slow, taking ~1.7 s Gb⁻¹ after loading the image dataset at ~6.8 s Gb⁻¹ in the current setting, and needs to be shortened for intraoperative application. Converting an OT-TP-LSM image of 512 × 512 pixels into a virtual H&E image took 160 ms. This study aimed to address current limitations of 3D pathology, such as imaging depth, and to develop image processing that generates virtual H&E images. Further development, such as faster image processing, will be needed. We added the missing information and included some discussion of the limitations of the new system and of further development for intraoperative applications.
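
      The arithmetic behind the quoted figures can be checked directly. This sketch reproduces the 318 GB per channel from the stated formula (reading the "× 2" as 2 bytes per pixel) and cross-checks the ~7 min scan time against the 0.24 mm²/s acquisition rate quoted by Reviewer 2. The final estimate of loading plus en-face processing time is only a rough illustration, assuming the quoted s Gb⁻¹ rates are per gigabyte of raw data and that the two steps run back to back for one channel.

```python
# Figures quoted in the response: 7 min per 1 cm^2 at 400 fps, 1850 x 512 pixel
# frames, and 2 bytes per pixel (read here as 16-bit camera data).
SCAN_TIME_S = 7 * 60
FRAME_RATE_HZ = 400
FRAME_BYTES = 1850 * 512 * 2

frames_per_channel = SCAN_TIME_S * FRAME_RATE_HZ
bytes_per_channel = frames_per_channel * FRAME_BYTES
print(f"{frames_per_channel} frames, {bytes_per_channel / 1e9:.1f} GB per channel")

# Cross-check against the 0.24 mm^2/s acquisition rate quoted by Reviewer 2.
AREA_MM2 = 100.0  # 1 cm^2
scan_s = AREA_MM2 / 0.24
print(f"scan time at 0.24 mm^2/s: {scan_s / 60:.1f} min")

# Rough post-processing estimate. Assumption: the quoted rates are seconds per
# gigabyte of raw data, and loading and processing run sequentially per channel.
LOAD_S_PER_GB = 6.8
PROCESS_S_PER_GB = 1.7
processing_s = bytes_per_channel / 1e9 * (LOAD_S_PER_GB + PROCESS_S_PER_GB)
print(f"load + en-face processing: {processing_s / 60:.0f} min per channel")
```

The script gives about 318.3 GB per channel, matching the quoted 318 GB, and about 6.9 min at 0.24 mm²/s, consistent with the ~7 min imaging time. Under the stated assumption, loading and processing would take roughly 45 min per channel, which illustrates why faster processing is needed for intraoperative use.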

      C1-1. Revised manuscript, Discussion, pages 14-15 and lines 320-328

      Although OT-TP-LSM enabled high-speed 3D imaging, the post-processing time of the OT-TP-LSM image datasets was relatively long due to the large data size, sequential processing of dual-channel images, and manual stitching. The long post-processing time needs to be resolved for intraoperative applications. To speed up processing, these steps can be performed using field-programmable gate array (FPGA)-based data acquisition with graphics processing unit (GPU)-based computing. The processing time can be further reduced by implementing the algorithm in C++. Furthermore, ImageJ-based software such as the BigStitcher plugin can be used for automatic 3D image processing [44].

      C1-2. Revised manuscript, Materials and methods, Image acquisition and post-processing, page 17 and lines 390-398

      Image acquisition and post-processing

      Raw image datasets from dual sCMOS cameras were acquired and processed on a workstation with 128 GB RAM and a 2 TB SSD drive. The imaging time and data size per 1 cm² area at 400 fps were 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) bytes) for each channel, respectively. The raw image strip was sheared at 45° with respect to the sample surface, and a custom image processing algorithm was used to transform the image data into XYZ coordinates. Processing for en-face images was conducted in MATLAB and took ~1.7 s Gb⁻¹ after loading the image dataset at ~6.8 s Gb⁻¹ in the current laboratory setting. Mosaic images were generated by joining the image strips manually.
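
      The 45° shear transform is described only as a custom algorithm. The following hedged illustration of the general idea is not the authors' MATLAB code. In a stage-scanned oblique stack, each row of a tilted frame can be shifted along the scan axis in proportion to its position along the light sheet. The sketch assumes a NumPy array ordered (frame, row, column) and an integer shift per row (the hypothetical parameter shift_per_row). It ignores the sin 45° rescaling of the depth axis and any sub-pixel interpolation.

```python
import numpy as np


def deskew_oblique_stack(stack: np.ndarray, shift_per_row: int = 1) -> np.ndarray:
    """Shear a stage-scanned 45-degree light-sheet stack into sample coordinates.

    stack: array of shape (n_frames, n_rows, n_cols). Frame k was recorded at
        stage position k; row r lies r pixels along the tilted sheet.
    shift_per_row: number of frame steps along the scan axis spanned by one row
        step (hypothetical; it depends on the stage step and pixel size).

    Returns an array of shape (n_rows, n_frames + (n_rows - 1) * shift_per_row,
    n_cols) indexed as (depth, scan position, lateral position).
    """
    n_frames, n_rows, n_cols = stack.shape
    scan_len = n_frames + (n_rows - 1) * shift_per_row
    volume = np.zeros((n_rows, scan_len, n_cols), dtype=stack.dtype)
    for r in range(n_rows):
        offset = r * shift_per_row  # displacement of this row along the scan axis
        volume[r, offset:offset + n_frames, :] = stack[:, r, :]
    return volume


# A single bright voxel at frame 2, row 3, column 5 of a small synthetic stack.
raw = np.zeros((10, 4, 8), dtype=np.uint16)
raw[2, 3, 5] = 1000
vol = deskew_oblique_stack(raw)
print(vol.shape, np.argwhere(vol).tolist())
```

With one frame step per row, the voxel recorded at frame 2, row 3 ends up at depth 3 and scan position 2 + 3 = 5. Integer shifts keep the sketch exact. A production pipeline would typically interpolate, or rescale the depth axis, to match the true geometry.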

      C1-3. Revised manuscript, Materials and methods, Virtual H&E staining of OT-TP-LSM via deep learning network, page 18 and lines 414-418

      The CycleGAN training and testing were performed using an NVIDIA GeForce RTX 3090 with 24 GB of memory. The network was implemented using Python version 3.8.0 on a desktop computer with a Core i7-12700K CPU @ 3.61 GHz and 64 GB RAM, running Anaconda (version 22.9.0). The inference time for converting an OT-TP-LSM patch image into a virtual H&E patch image was 160 ms.

      Reviewer 2

      Summary:

      A2. In this manuscript, the authors developed an open-top two-photon light sheet microscopy (OT-TP-LSM) system that enables high-throughput and high-depth investigation of 3D cell structures. The data presented here show that OT-TP-LSM could be a complementary technique to traditional imaging workflows for human cancer cells.

      Strengths:

      High-speed and high-depth imaging of human cells in an open-top configuration is the main strength of the presented study. An extended depth of field of 180 µm with a light-sheet thickness of 0.9 µm was achieved together with an acquisition rate of 0.24 mm²/s. This was confirmed by 3D visualization of human cancer cells in the skin, pancreas, and prostate.

      Weaknesses:

      The complementary aspect of the presented technique in human pathological samples is not convincingly presented. The traditional hematoxylin and eosin (H&E) staining is a well-established and widely used technique to detect human cancer cells. What would be the benefit of 3D cell visualization in an OT-TP-LSM microscope for cancer detection in addition to H&E staining?

      B2. We would like to thank the reviewer for the critical and positive comments. 3D pathology has been a long-standing research direction. Current pathology is 2D: it relies on examining H&E histology slides generated by thin sectioning biopsied and surgical specimens at different depths. The reliability of the pathological diagnosis suffers from undersampling of specimens. Although 3D pathology is possible by serial thin sectioning, imaging, and then combining the images in 3D, it is not practical for clinical use due to the required labor and time.

      We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C2-1. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, pages 8-9 and lines 176-180

      Using 3D visualization, normal glandular structures in the dermis were distinguished from BCC tumor nests (Video 1). Both eccrine and sebaceous glands could appear similar to BCC nests in 2D images at certain depths. Hence, nondestructive 3D visualization of cell structures would be important for distinguishing them, serving as a complement to the traditional 2D H&E images.

      C2-2. Revised manuscript, Results, 3D OT-TP-LSM imaging of human pancreatic cancers, pages 10-11 and lines 222-232

      Magnified images of ROI 1 (PDAC) at two different depths showed irregularly shaped glands with sharp angles and 3D structural complexity including unstable bridging structure inside (Figure 4B). An irregular and distorted architecture amidst desmoplastic stroma is one of the important diagnostic factors for PDAC [35]. The cancer glands exhibited disorganized cancer cell arrangement with nuclear membrane distortion. Magnified images of ROI 2 showed both nonneoplastic ducts and cancer glands in different cell arrangements (Figure 4C). The nonneoplastic ducts showed single-layered epithelium with small, evenly distributed cells expressing relatively high nuclear fluorescence. Cancer glands, on the other hand, had disorganized and multilayered structure with large nuclei. OT-TP-LSM visualized the 3D invasiveness of cancer glands within tissues nondestructively, which could not be identified from limited 2D information.

      C2-3. Revised manuscript, Results, 3D OT-TP-LSM imaging of human prostatic cancers, page 11 and lines 251-252

      OT-TP-LSM provided histological 3D information equivalent to that of the H&E stained image without the need for sectioning.

      C2-4. Revised manuscript, Discussion, page 12 and lines 274-276

      OT-TP-LSM was developed for the rapid and precise nondestructive 3D pathological examination of excised tissue specimens during both biopsy and surgery, as a complement to traditional 2D H&E pathology by visualizing 3D cell structures.

      C2-5. Revised manuscript, Discussion, page 13 and lines 284-288

      The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. These have been challenging with 2D histological images.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the following points to the authors to enhance the readability of the manuscript and to provide a strong narrative to explain their findings:

      A3. Line 54: For the non-expert readers, please provide more background information about the histopathology before introducing the hematoxylin and eosin staining.

      B3. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added information about the current standard method of histopathological examination and its limitations.

      C3. Revised manuscript, introduction, page 4 and lines 56-64

      Precise intraoperative cancer diagnosis is crucial for achieving optimal patient outcomes by enabling complete tumor removal. The standard method is the microscopic cellular examination of surgically excised specimens following various processing steps, including thin sectioning and hematoxylin and eosin (H&E) cell staining. However, this examination method is laborious and time-consuming. Furthermore, it has inherent artifacts that hinder accurate diagnosis, including tissue loss, limited two-dimensional (2D) information, and sampling error [1]. High-speed three-dimensional (3D) optical microscopy, which can visualize cellular structures without thin sectioning, holds promise for nondestructive 3D pathological examination that complements 2D pathology and addresses its limitations [1-4].

      A4. Lines 66 and 71: Please briefly introduce the cited studies to give some information about the previous studies. This will help the reader to understand the innovative aspects of your study.

      B4. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added a brief introduction about the cited studies.

      C4. Revised manuscript, introduction, pages 4-5 and lines 71-82

      As a deep tissue imaging method, two-photon microscopy (TPM) has been used in both biological and optical biopsy studies [17-19]. TPM is based on nonlinear two-photon excitation of fluorophores and achieves high imaging depths down to a few hundred micrometers by using long excitation wavelengths, which reduce light scattering. Moreover, TPM provides additional intrinsic second harmonic generation (SHG) contrast for visualizing collagen fibers within the extracellular matrix (ECM). This feature proved advantageous for high-contrast imaging of cancer tissue and microenvironmental analysis [20-22]. However, TPM has low imaging speeds due to point scanning-based imaging. To address this limitation, two-photon LSM (TP-LSM) techniques were developed for high-speed imaging [23-27]. Although TP-LSM facilitated rapid 3D imaging of cancer cells and zebrafish, its applications were limited to small samples and biological studies due to geometric limitations.

      A5. Line 72: Please mention the importance and benefit of having an open-top configuration. I think this is one of the key aspects that provide a high imaging depth in OT-TP-LSM.

      B5. We would like to thank the reviewer for the comment. Conventional LSM techniques, including TP-LSM, have a configuration in which the illumination objective is oriented in the horizontal plane and imaging is performed with an orthogonally arranged objective. However, this geometry physically limits the lateral sample size, making it unsuitable for imaging centimeter-scale tissue. Therefore, we developed OT-TP-LSM for 3D examination of large tissues. High imaging depths were achieved with long excitation wavelengths and the long emission wavelengths of the fluorophores. The open-top configuration does not contribute to the improvement in imaging depth. We revised the manuscript to explain the need for an open-top configuration.

      C5. Revised manuscript, introduction, page 5 and lines 82-86

      Conventional TP-LSM had a configuration of a horizontally oriented illumination objective and a vertically oriented imaging objective. This geometry imposed limitations on the sample size, rendering it unsuitable for the examination of centimeter-scale specimens. TP-LSM with open-top configuration is needed for 3D histological examination.

      A6. Line 78: It would be nice to clearly quantify the imaging depth here.

      B6. We would like to thank the reviewer for the comment. Although we considered entering the quantitative imaging depth of OT-TP-LSM in the introduction section, we decided that it would be appropriate to present the quantitative imaging depth in the Results section and discuss it in the Discussion section.

      A7. Line 146: Please clearly explain the reason why the upper layers are not resolved.

      B7. We would like to thank the reviewer for the comment and we are sorry for the missing information. The skin epidermis has various cell layers and superficial layers are composed of less rounded and flat cells with relatively small cytoplasm. Therefore, cells in that layer could be difficult to resolve with the current system resolution because there is little space between nuclei. Additionally, strong autofluorescence signal in the stratum corneum could be the reason for preventing visualization of the cells in the superficial layer. We revised the manuscript to explain the reasons in detail.

      C7. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, page 8 and lines 159-163

      Keratinocytes in the basal layer were relatively large and individually resolved, while those in the upper layers were unresolved and appeared as a band. This could be attributed to the upper layers being composed of flat cells with relatively small cytoplasm, resulting in little space between nuclei. Additionally, a strong autofluorescence signal in the stratum corneum might prevent visualization of the cells in the superficial layer.

      A8. Line 253: Please explain the importance of visualizing 3D cell structures in cancer pathology. I think this should be stated clearly throughout the text, as it is the key component of OT-TP-LSM that complements traditional H&E staining. Also, referring to the nondestructive nature of your technique would help to emphasize this point.

      B8. We would like to thank the reviewer for the comment. As answered in A2, the current H&E histological examination has inherent limitations due to limited 2D information and sampling errors. To resolve this, OT-TP-LSM was developed for the visualization of 3D cell structures nondestructively as a complement to traditional slide-based 2D pathology. We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C8. Please refer to the answer in C2-1 – C2-5.

      A9. Figures: Please clearly mark the cancer regions in the images as indicated in Figure 5. It will help the reader to easily compare the healthy and invaded tissue parts.

      B9. We would like to thank the reviewer for the comment. We confirmed that the cancer area was not marked in Figure 4 of the pancreatic cancer tissue, and we modified Figure 4 to mark the cancer region. Additionally, Figure 2 of the skin cancer tissue was also modified in this regard.

      C9. Modified Figure 2 and Figure 4.

      Author response image 1.

      Author response image 2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to first thank the Editor as well as the two reviewers for their enthusiasm and careful evaluation of our manuscript. We also appreciate their thoughtful and constructive comments and suggestions. They did, however, have concerns regarding experimental design, data analysis, and over-interpretation of our findings. We endeavored to address these concerns through refinement of our framing, inclusion of additional new analyses, and rewriting some parts of our discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that this will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review)

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      Thanks very much again for the evaluation and comments. Please find our revision plans for each comment below.

      The weak points of this paper are that its findings are not sufficiently supporting their arguments, and there are several reasons for this:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the length of bar graphs). Participants in this study associated each face with a specific length of two bars, and the 'navigation' was only guided by the morphing of a bar graph image. Moreover, no social cognition was required to perform the task used to estimate the grid-like activity. For the social decision-making task, which was conducted separately, we do not know whether participants needed to navigate between faces in a social space. Instead, they could recall the bar graphs associated with faces and compute the decision values by comparing the lengths of bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important for making a decision (Equation 1). The expected value is more sensitive to one than the other. This also suggests that the space might not reflect social values but perceptual differences.

      The Reviewer raises an interesting point. We apologize for not addressing this possibility clearly enough in our original manuscript, and we will improve the clarity in our revision. To address this issue, we would like to break it into two sub-questions and answer them separately: 1) Are participants merely memorizing the values associated with each avatar, or do they place the avatars on a two-dimensional map in their internal representation? 2) If so, are the two dimensions of this internal representation social dimensions relating to competence and trust, or sensory dimensions relating to bar height (i.e., social space or sensory space)?

      For the first question, we hope our analysis of the distance effect on the reaction time in the comparison task can address this issue. Specifically, it came from the idea that distance is a measure of similarity between two avatars in the 2D social space. The closer two avatars are, the more similar they are, hence distinguishing them will be harder and result in longer reaction time. If participants are merely memorizing the avatars as six isolated instances without integrating them into a low-dimensional map, then avatars should be equidistant (as if they were lying on the vertices of a 5-simplex), and would not show a distance effect. Therefore, we interpreted the stronger distance effect as a behavioural index of having a better internal map-like representation. This approach is adopted from the work by Park et al. (2020), where they used the distance effect to demonstrate human brains map abstract relationships among entities from piecemeal learning.
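
      The equidistance argument can be made concrete with a short sketch. Six items stored as unrelated instances behave like the vertices of a regular 5-simplex, so all 15 pairwise distances are identical. A distance regressor built from them has zero variance and cannot explain any reaction-time differences. Six items placed on a 2D map instead yield graded distances that can. The 2D coordinates below are hypothetical placeholders, not the avatar positions used in the study.

```python
import itertools
import math
import statistics


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points (15 pairs for 6 points)."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


# Six isolated items: the standard basis of R^6, i.e. a regular 5-simplex.
simplex = [tuple(1.0 if i == j else 0.0 for j in range(6)) for i in range(6)]

# Six avatars on a hypothetical competence x trustworthiness map (illustrative coordinates).
social_map = [(1, 4), (2, 2), (3, 5), (4, 1), (5, 3), (6, 6)]

simplex_d = pairwise_distances(simplex)
map_d = pairwise_distances(social_map)

# Zero variance means a distance regressor cannot predict reaction times.
print("simplex distance variance:", statistics.pvariance(simplex_d))
print("map distance variance:", round(statistics.pvariance(map_d), 3))
```

In practice the map distances would be entered as a regressor for comparison-task reaction times. A reliable negative slope (longer reaction times for closer pairs) is the distance effect, which the simplex arrangement cannot produce.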

      For the second question of ‘social space’ vs. ‘sensory space’, our study adopted the paradigm developed by Constantinescu et al. (2016), who used a similar way to construct a conceptual space and found that such a space can be represented with a grid-like code in the entorhinal and prefrontal cortex. We stayed close to their original design in the hope that our work could provide, to some extent, a close replication of their result, but using non-spatial social concepts instead. This led to a limitation of our study: participants passively traverse the artificial space rather than actively navigating in it to make decisions or inferences. We also did not find evidence as strong as that reported in previous grid-like coding fMRI studies. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. Nevertheless, we don’t think our findings contradict or disprove previous findings in any way. Here we would also like to point to the work by Park et al. (2021). Their task involves making novel inferences in a 2D social hierarchy space, and they found that a grid-like code in the entorhinal cortex and medial prefrontal cortex supports such novel inferences. Hence, we argue that results from these studies and partial evidence from our study collectively support the idea that the entorhinal cortex is important for representing abstract knowledge (spatial and non-spatial).

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued that the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree if the authors provide evidence to support their claims.

      With regard to the common representation of faces, this is a potential limitation of our paradigm, because our current task design did not include a stage of face presentation to properly test this question. With regard to the asymmetry between the two dimensions in determining expected value, we think that the prerequisite for identifying six-fold grid-like coding is an abstract space formed by orthogonal dimensions, i.e., competence and trustworthiness in our task are not correlated. In addition, the scanner task does not require computation of expected value. However, we do think that it is worth investigating whether the extent to which each dimension contributes to decision-making and inference distorts the grid-like representation of the map. Our prediction is that the entorhinal cortex will maintain a representation of the map invariant to this aspect so that it can support inferences in different contexts where different weights may be assigned to different dimensions. This will be an interesting hypothesis for future studies to test. We hope that our revision plans, with the above considerations, address the Reviewer’s comments.

      Reviewer #2 (Public Review)

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits of warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid.

      From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
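
      Steps 2 and 3 of the procedure summarized above can be sketched for a single simulated region. This follows the widely used quadrature-regressor formulation. The signal is regressed on cos 6θ and sin 6θ, the orientation φ is recovered from the two weights, and held-out trials are then compared for directions aligned versus misaligned with φ. The amplitude, noise level, 15° alignment window and half/half estimation/test split are illustrative assumptions, not the parameters of the study.

```python
import numpy as np


def fit_grid_orientation(theta, signal):
    """Estimate grid orientation phi from six-fold quadrature regressors.

    Model: signal ~ b0 + b1*cos(6*theta) + b2*sin(6*theta); phi = atan2(b2, b1) / 6.
    """
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return np.arctan2(beta[2], beta[1]) / 6.0


def aligned_minus_misaligned(theta, signal, phi):
    """Mean signal for directions within 15 degrees of a grid axis minus the rest."""
    # Offset from the nearest grid axis, wrapped to [-30, 30) degrees (60-degree period).
    offset = (theta - phi + np.pi / 6) % (np.pi / 3) - np.pi / 6
    aligned = np.abs(offset) < np.pi / 12
    return signal[aligned].mean() - signal[~aligned].mean()


# Simulated trajectory angles and signal for one hypothetical region.
rng = np.random.default_rng(0)
phi_true = np.deg2rad(20.0)
theta = rng.uniform(0.0, 2 * np.pi, 600)
signal = np.cos(6 * (theta - phi_true)) + rng.normal(0.0, 0.5, theta.size)

# Estimate orientation on one half of the trials, test alignment on the other half.
half = theta.size // 2
phi_hat = fit_grid_orientation(theta[:half], signal[:half])
effect = aligned_minus_misaligned(theta[half:], signal[half:], phi_hat)
print(f"estimated orientation: {np.rad2deg(phi_hat):.1f} deg, aligned - misaligned: {effect:.2f}")
```

Estimating φ and testing alignment on independent trials avoids circularity. Because of the six-fold symmetry, φ is only defined modulo 60°, which is why the recovered value is compared through cos 6(φ̂ − φ).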

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Thank you very much again for your careful evaluation and thoughtful comments. Please find our response to the comments below.

      Weaknesses:

      In various parts of this manuscript, the authors appear to use a variety of terms to refer to the (ostensibly) same neural regions: prefrontal cortex, frontal pole, ventromedial prefrontal cortex (vmPFC), and orbitofrontal cortex (OFC). It would be useful for the authors to use more consistent terminology to avoid confusing readers.

      Thank you for pointing out the inconsistent use of terms; we will improve the consistency of our terminology in the revised manuscript.

      Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      On a conceptual level, it is not entirely clear how this work advances our understanding of gridlike encoding of two-dimensional abstract spaces, or of social cognition. The study design borrows heavily from Constantinescu et al. 2016, which is itself not an inherent weakness, but the Constantinescu et al. study already suggests that grid codes are likely to underlie two-dimensional spaces, no matter how abstract or arbitrary. If there were a hypothesis that there is something unique about how grid codes operate in the social domain, that would help motivate the search for social grid codes specifically, but no such theory is provided. The authors do note that warmth and competence likely have ecological importance as social traits, but other past studies have used slightly different social dimensions without any apparent loss of generality (e.g., Park et al. 2021). There are some (seemingly) exploratory analyses examining how individual difference measures like social anxiety and avoidance might affect the brain and behavior in this study, but a strong theoretical basis for examining these particular measures is lacking.

      We acknowledge that we used dimensions very similar to those in Park et al. (2021). While Park and colleagues (2021) took a more innovative and rigorous approach, we tried to stay close to the original design of Constantinescu et al. (2016) in the hope that our work could provide, to some extent, a close replication of their result. Our data were collected before the 2021 paper was published and, as the comment points out, we did not find evidence as complete and convincing as in these previous fMRI studies of grid-like coding. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. However, we do not think our current findings contradict or disprove previous findings in any way.

      I found it difficult to understand the analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. It is possible that I have misunderstood the authors' logic and/or methodology, but I do not feel comfortable commenting on the correctness or implications of this approach given the information provided in the current version of this manuscript.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. This exploratory analysis aims to examine whether the strength of the grid-like representation of the social value map correlates with behavioral indicators of a map-like representation, and whether it correlates with participants’ social traits. As the behavioral indicator, we used the distance effect in reaction times from the comparison task performed outside the scanner. The closer a pair of avatars are, the more similar they are; distinguishing them is therefore harder, resulting in longer reaction times for the comparison judgement. If participants merely memorized the avatars as six isolated instances without integrating them into a map, all avatars would be effectively equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioral index of a better internal map-like representation.
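      The logic of this exploratory analysis can be sketched roughly as follows. This is a hypothetical illustration with simulated data, assuming one pairwise map distance and one comparison RT per trial and one grid-strength value per participant; all names and effect sizes are invented. Since closer pairs should yield longer RTs, the distance effect is taken here as the negative slope of RT on distance, so that larger values indicate a stronger map-like effect, and it is then correlated (Pearson) with grid strength across participants.

```python
import numpy as np

def distance_effect(distance, rt):
    """Negative slope of RT on pairwise map distance (larger = stronger effect)."""
    slope, _intercept = np.polyfit(distance, rt, 1)
    return -slope

rng = np.random.default_rng(1)
n_participants = 30
grid_strength = rng.normal(0, 1, n_participants)  # hypothetical per-participant grid metric

effects = []
for g in grid_strength:
    distance = rng.uniform(1, 5, 60)  # hypothetical pairwise distances for 60 trials
    # Simulated: participants with a stronger grid signal show a steeper RT-distance slope
    rt = 1.2 - (0.05 + 0.03 * g) * distance + rng.normal(0, 0.05, distance.size)
    effects.append(distance_effect(distance, rt))
effects = np.array(effects)

r = np.corrcoef(grid_strength, effects)[0, 1]
print(f"r(grid strength, distance effect) = {r:.2f}")
```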

      It was puzzling to see passing references to multivariate analyses using representational similarity analysis (RSA) in the main text, given that RSA is only used in analyses presented in the supplementary material.

      We speculated that RSA in the entorhinal ROI might be more sensitive than the whole-brain univariate analysis in identifying a grid-like code, because a previous study of grid-like coding in an olfactory space (Bao et al., 2019) did not identify a grid-like representation with univariate analysis but did identify it with RSA. However, with the RSA approach we failed to find evidence of a grid-like code in the entorhinal ROI aligned to its own putative grid orientation. We reported this result to show that we carried out a relatively thorough investigation of the hypothesis using various approaches, and therefore decided to refer to the RSA approach in the main text as well.
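      For readers less familiar with the RSA variant, one common way to test for a six-fold representational geometry is to compare a neural representational dissimilarity matrix (RDM) with a model RDM in which directions differing by multiples of 60° are similar. The sketch below is hypothetical and uses simulated patterns; it is orientation-free, whereas the supplementary analysis described above aligned the model to each ROI's putative grid orientation, and it is not the code used in the study.

```python
import numpy as np

def upper_triangle(m):
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def spearman(a, b):
    # Rank-transform, then Pearson correlation (assumes no ties, as with continuous values)
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def hexagonal_model_rdm(theta):
    # 0 for direction pairs differing by a multiple of 60°, maximal (2) at 30° offsets
    return 1 - np.cos(6 * (theta[:, None] - theta[None, :]))

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 24)             # 24 hypothetical trial directions
voxel_weights = rng.normal(size=(2, 50))          # 50 hypothetical voxels
patterns = np.column_stack([np.cos(6 * theta), np.sin(6 * theta)]) @ voxel_weights
patterns += rng.normal(0, 0.5, patterns.shape)    # measurement noise
neural_rdm = 1 - np.corrcoef(patterns)            # correlation distance between trial patterns

rho = spearman(upper_triangle(neural_rdm), upper_triangle(hexagonal_model_rdm(theta)))
print(f"Spearman rho between neural and six-fold model RDM: {rho:.2f}")
```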

      Reviewer #3 (Public Review)

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably by Park et al., 2021, Nature Neuroscience.

      Thanks very much again for your careful evaluation and comments. Please find our response to the comments below.

      Below, I raise a few issues and questions on the evidence presented here for a grid-like code as the basis of navigating abstract social space or social knowledge.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      These data do not specifically add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square/circular, two-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. Real-world social spaces (like spaces for navigation and semantic concepts) must be higher-dimensional, or at least more than two-dimensional. It is unclear whether the finding generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, such as in mazes from the Mosers' labs (e.g., Derdikman et al., 2009) or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), mEC cells show distortions and would not pass as grid cells under the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much for the references to papers that we had not considered sufficiently in our discussion. We will endeavour to discuss the topic in more depth in our revision. In summary, we raised this discussion point because various research groups have found gridlike representations in 2D artificial conceptual spaces. We think that the next step towards a stronger claim would be to find representations of more spontaneous, non-spatial maps.

      Data and analysis

      (2) Concerning the negative correlation of distance with activation in the fusiform gyrus and visual cortex: this is a slightly puzzling but potentially interesting finding. However, could this be related to reaction times? The larger the distance, the longer the reaction times, so the original finding might reflect larger activations with smaller distances.

      Thanks very much for the suggestion. However, we did not find a correlation between response time in the choice stage of the scanner task and the negative distance-related activation in the fusiform gyrus (Author response image 1). Moreover, because the morph period is the same in every trial, the negative correlation of distance with activation in the fusiform gyrus could also be interpreted as a positive correlation of morphing speed with activation in the fusiform gyrus. Indeed, stronger negative activation indicates larger activation for smaller distances, but we are uncertain what this indicates about the functional role of the fusiform gyrus in our current task.

      Author response image 1.

      (3) Concerning the correlation of grid-like activity with behavior: is the correlation with reaction time just about how long people took (rather than a task-related neural signal)? The authors have only reported correlations with reaction time. The issue here is that the duration of reaction times also relates to the starting positions of each trial and where participants will navigate to. Considering the speed-accuracy tradeoff, could performance accuracy be negatively correlated with these grid consistency metrics? Or it could be positively correlated, which would suggest the grid signal reflects a good representation of the task.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. The reaction times used to calculate the distance effect come from a task performed outside the scanner. The closer a pair of avatars are, the more similar they are; distinguishing them is therefore harder, resulting in longer reaction times for the comparison judgement. If participants merely memorized the avatars as six isolated instances without integrating them into a map, all avatars would be effectively equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioural index of a better internal map-like representation. This was the motivation behind this analysis.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. https://doi.org/10.1016/j.neuron.2020.06.030

    1. Author Response

      The following is the authors’ response to the original reviews.

      We wish to thank the reviewers for their helpful and insightful comments. Their concerns mainly related to the interpretation of the data, the clarity of our statements, and improvements to our discussion.

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting study. It uses hippocampal neuronal cultures from syntaxin1 knock-out mice as a platform for monitoring changes in synaptic transmission, through electrophysiological recording of postsynaptic currents, upon lentiviral infection with various isoforms, chimeras, and point mutants of syntaxins.

      The authors observe the following:

      (1) Syntaxin2 restores neuronal viability and can partially rescue Ca2+-evoked release in syntaxin1 knock-out neurons, but this release is much slower (cumulative charge transfer differences) and is associated with a clearly smaller RRP than rescue with syntaxin1. In contrast, syntaxin2-mediated rescue leads to a large increase in spontaneous release (Figure 1). The authors convincingly conclude that, compared with syntaxin2, syntaxin1 is optimized for fast phasic release and for clamping spontaneous release.

      (2) Replacing the SNARE domain (or its C-terminal half) of syntaxin1 with the SNARE domain (or its C-terminal half) of syntaxin2 rescues the fast kinetics, but not the amplitude, of Ca2+-evoked release. This is associated with a decrease in the size of the RRP and an increase in spontaneous release. The vesicular release probability (PVR) is slightly increased, which is intriguing because, given the reduced RRP, a slight decrease would be expected instead; this indicates that Ca2+-dependent fusion is simultaneously enhanced by unknown mechanisms, as the authors properly point out. Analogous experiments, in which the SNARE domain of syntaxin1 is inserted into syntaxin2, reveal the existence of differential regulatory elements outside the SNARE domain.

      (3) Different constructs of syntaxin1 and syntaxin2 display different expression levels. Moreover, Munc-18 expression levels vary with the specific syntaxin construct transfected. In any case, the electrophysiological phenotypes cannot be consistently explained by changes in Munc-18.

      (4) Mutations in several residues of the outer surface of the C-terminal half of the syntaxin1 SNARE domain lead to alterations in the RRP and the frequency of spontaneous release, but the changes cannot be attributed to a change in the net surface charge, because the alterations occur even in paired mutations in which electrical neutrality is conserved.

      Comments:

      (1) This is a comment regarding the interpretation of the results. In general, the decrease in RRP size is associated with an increased frequency of spontaneous release due to unclamping. The authors claim that the two phenomena seem to be independent of each other. How can the authors rule out the possibility that the unclamping of spontaneous release leads to a decrease in RRP size?

      The main argument against the reduction of the RRP being caused by the observed increase in mEPSC frequency is based on the kinetics of refilling and depletion. On average, a vesicle fuses spontaneously 500 to 1000 seconds after it becomes primed (the inverse of the spontaneous vesicular release rate for STX1; Figures 1, 2 and 3). The time it takes to refill the RRP after depletion is on the order of 3 seconds (Rosenmund and Stevens, 1996). Refilling of the RRP is therefore more than 100 times faster than its spontaneous consumption. Even if spontaneous release increased 5-fold, this would lead to less than 5% steady-state depletion of the RRP.
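      This order-of-magnitude argument can be checked with a simple two-state model, which is an assumption of this sketch rather than a model used in the study: each RRP site is either empty or occupied by a primed vesicle, refills with time constant tau_refill (about 3 s), and loses its vesicle by spontaneous fusion with time constant tau_spont (500 to 1000 s); evoked release is ignored. At steady state, the empty fraction is k_spont / (k_refill + k_spont), where k = 1/tau.

```python
def steady_state_depletion(tau_refill_s, tau_spont_s):
    """Fraction of RRP sites empty at steady state in a two-state model:
    empty -> primed at rate 1/tau_refill, primed -> empty (spontaneous fusion) at rate 1/tau_spont."""
    k_refill, k_spont = 1 / tau_refill_s, 1 / tau_spont_s
    return k_spont / (k_refill + k_spont)

TAU_REFILL = 3.0  # s, RRP refilling time constant (Rosenmund and Stevens, 1996)

for tau_spont in (500.0, 1000.0):
    baseline = steady_state_depletion(TAU_REFILL, tau_spont)
    fivefold = steady_state_depletion(TAU_REFILL, tau_spont / 5)  # 5-fold faster spontaneous fusion
    print(f"tau_spont={tau_spont:.0f} s: baseline {baseline:.1%}, 5-fold increase {fivefold:.1%}")
```

      With these numbers, the worst case (tau_spont = 500 s, shortened 5-fold to 100 s) gives about 3% steady-state depletion, consistent with the estimate of less than 5%.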

      (2) The authors have analyzed the kinetics of mEPSCs and found differences (Fig2-Supp. Fig1; Fig2-Supp. Fig1). It would be interesting and pertinent to discuss these data in the context of potential differences in fusion pore kinetics involving syntaxin1, syntaxin2 and their SNARE domains. The figure would also be improved by including averaged mEPSC traces.

      We thank the reviewer for the idea. On closer examination of the mEPSC rise and decay times, we noticed a minor slowing of the mEPSC rise time, from 0.443 ms (SEM 0.0067) for STX1A to 0.535 ms (SEM 0.0151) for STX1A-2(SNARE) or 0.507 ms (SEM 0.01251) for STX1A-2(Cter), while the mEPSC half-widths did not change significantly. The measured change may be related to the detection algorithm, because mEPSC detection at elevated frequencies becomes more difficult due to increased overlap of events; we therefore prefer to refrain from making any mechanistic claims.

      Minor comments:

      (1) Fig. 2J; Fig. 3J. It is difficult to distinguish between the different colors, and a legend within the graph would be very helpful.

      (2) Fig3 H. Please change the color of the box plot for Stx1 A to improve the contrast with the individual data points.

      (3) Page 6. Line 225. "Figure 2D and E" should be corrected to "Figure 2C and D"

      (1) Colors were changed for clearer visualization. (2) Unfortunately, changing the color did not improve the contrast with the individual plots. However, the numerical data is all included in the data sheets of the corresponding figure. (3) The mistake was corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 135-136: Are the numbers cited in the text means and SEMs? Please indicate.

      Line 139 and Figure 1G: The difference between purple and blue was very hard to see on my hard copy.

      Line 152: Reference to Figure 1L should probably be 1K.

      Line 183: Reference to Figure 2C should probably be Figure 2F.

      Line 225: Reference to Figure 2D and 2E should probably be 2C and 2D.

      Line 239: Reference to Figure 3I should probably be 3H.

      All typos were addressed and colors were changed for better visualization.

      Line 210-211: Sentence ("One of the benefits..") is hard to understand.

      Thank you for noticing this; we agree that the sentence did not add any important or new information, so it was deleted. Additionally, the message of the sentence was already clearly stated in lines 209-211.

      Figure 4E-H lacks data for STX2 that would allow the figure to be arranged like Figure 5.

      Given that STX1 is the endogenous syntaxin in hippocampal neurons, we use it as a control for all the analyses of the STX2 and STX2-chimera experimental groups; it is therefore included in Figures 3 and 5.

      It appears that the authors do not present or discuss the Western Blot in Fig. 4D. Are the quantitative results of the Western Blot consistent with or different from the quantification of the immunostainings (Fig. 4B-C)? A similar question for Figure 5D, which also seems not to be presented.

      In terms of quantification, we relied mainly on the ICC experiments because they also test for putative impairments in transport to the presynaptic compartment. Our WB data are overall consistent with these results but were not used to quantify expression of our syntaxin chimeras and mutants in the STX1-null hippocampal neuron model.

      Figure 6F-G: The normalization of spontaneous vesicular release rates is not clear, because the vesicular release rates already contain a normalization (mEPSC rate divided by RRP size). Is a further normalization of the STX1A condition informative? The authors should consider presenting the release rates themselves. In any case, the normalization should be presented/explained, at least in the legends.

      The reviewer is in principle correct. Because of the large number of experimental groups, we had to perform recordings from multiple cultures in which not all experimental groups were present, while WT STX1 was present as a consistent control. To reduce culture-to-culture variability, we additionally normalized to the WT control group. However, we also included the raw numerical values in the data source sheets (normalized and absolute), which produce a similar overall outcome.
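      A minimal sketch of this per-culture normalization, assuming each recording is stored as a (culture, group, value) record and that a culture is only usable if it contains the WT STX1A control; the function name and the numbers in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def normalize_to_control(records, control="STX1A"):
    """Divide each value by the mean of the control group recorded in the same culture.

    records: list of (culture_id, group, value) tuples.
    Records from cultures without any control recordings are dropped."""
    control_values = defaultdict(list)
    for culture, group, value in records:
        if group == control:
            control_values[culture].append(value)
    control_mean = {c: mean(v) for c, v in control_values.items()}
    return [
        (culture, group, value / control_mean[culture])
        for culture, group, value in records
        if culture in control_mean
    ]

# Hypothetical spontaneous release rates from two cultures
records = [
    ("culture1", "STX1A", 2.0), ("culture1", "STX1A", 4.0), ("culture1", "STX1A-2(Cter)", 9.0),
    ("culture2", "STX1A", 10.0), ("culture2", "STX2", 5.0),
]
normalized = normalize_to_control(records)
for row in normalized:
    print(row)
```

      After normalization the control group averages 1.0 within every culture, so groups recorded in different cultures can be pooled on a common scale.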

      References to Figure 7 subpanels (A, B, and C) are missing.

      Thank you for the comment. We have integrated all panels into one for better representation and understanding since they are representative of one another.

      Lines 330-339 and Figure 7 in Discussion: the authors discuss that adding the non-cognate STX2 SNARE-domain to syntaxin-1 might destabilize the primed state and decrease the fusion energy barrier (as indicated in Figure 7C). What is the evidence that the decrease in RRP size is not caused solely by the depletion of the pool due to the increased spontaneous fusion?

      Please see our response to comment (1) of Reviewer 1.

      Statistics: Missing is the number of observations (n) for all data. Even if all data points are displayed, this should be stated.

      N numbers are included in the data sheets attached to each figure.

      The statement (at the start of the Discussion) that the SNARE domain of STX1 'plays a minimal role in the regulation for Ca2+-evoked release' is somewhat puzzling, since without the SNARE domain in STX1 there would be no Ca2+-evoked release. I guess these statements (similar statements are found elsewhere) stem from the interesting finding that STX2 leads to slower release kinetics than STX1, and that this is not (entirely) due to differences in the SNARE domain. I would suggest rephrasing the finding in terms of release kinetics. Also, the last sentence of the Abstract is not clear.

      Thank you for pointing this out. We agree that our experiments showed a strong impact of the syntaxin isoform exchange on release kinetics and overall release output. A similar comment came from Reviewer #3, so we have addressed both comments together.

      Our confusing statement resulted from the order of the presented results and our summarizing remarks for each section. It reflected our finding that mutating residues in the C-terminal part of the STX1 SNARE motif affected only spontaneous release and RRP size, but not release efficacy. We now state (pg. 6, lines 231-233) that “the results obtained from the Ca2+-evoked release between STX1 and STX2 support major regulatory differences of the domains outside of the SNARE domain between isoforms”.

      We have changed the abstract (pg. 2, lines 55-56).

      We have changed the introduction (pg. 3, lines 102-105) for better contextualization.

      We have changed the start of the Discussion (pg. 9, lines 250-252) for better contextualization.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, Salazar-Lázaro et al. presented interesting data showing that the C-terminal half of the Syx1 SNARE domain is responsible for clamping spontaneous release, stabilizing the RRP, and also Ca2+-evoked release. The authors systematically used a chimeric approach to replace the SNARE domain of Syx1 with that of its paralogue Syx2 and analyzed neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are partly reasonable. One obvious drawback is that they did not explore the underlying mechanism. I think it would be easy for the authors to carry out some simple assays to verify their hypothesis for the mechanism, instead of just discussing it in the Discussion section. In all, I appreciate the data presented in the manuscript. If the authors could provide more data on the mechanisms, this would be important research in the field. Some critical comments are listed below:

      We thank the reviewer for his/her comments and suggestions.

      Major comments:

      (1) On pg. 3, lines 102-104, the authors stated that 'We found that the C-terminal half of the SNARE domain of STX1.. ..while it is minimally involved in the regulation of Ca2+-evoked release.' But on pg. 5, lines 174-176, they wrote that 'Replacement of the full-SNARE domain (STX1A-2(SNARE)) or the C-terminal half (STX1A-2(Cter)) of the SNARE domain of STX1A with the same domain from STX2 resulted in a reduction in the EPSC amplitude (Figure 2B).' and on pg. 5-6, lines 197-199, they wrote that 'Taken together our results suggest that the C-terminal half of the SNARE domain of STX1A is involved in the regulation of the efficacy of Ca2+-evoked release, the formation of the RRP and in the clamping of spontaneous release.' It puzzles me what the authors are really trying to express about the relationship between the C-terminal half of the SNARE domain and Ca2+-evoked release (i.e., is it minimally involved, or does it participate significantly in the process?). Please clarify and reorganize the text.

      Please see our reply to the last comment of reviewer 2.

      (2) Figure 1-figure supplement 1, the authors should analyze Syx1/VGlut1 level additionally. And, if possible, compare the difference between Syx1/VGlut1 and Syx2/VGlut1.

      The levels of STX1/VGlut1 and STX2/VGlut1 were analyzed in detail in Figures 4 and 5.

      A direct comparison between the expression levels of these two proteins is not possible, since the antibodies have different affinities for their target proteins, which can introduce biases. While this could be overcome by adding a FLAG tag to the syntaxin proteins, we have not used this approach in this publication. In addition, we inferred sufficient and comparable expression of both syntaxins from their ability to rescue some of the syntaxin1 loss-of-function phenotypes.

      (3) Figure 2D only analyzes the EPSC half-width; could the authors also analyze the rise/decay time? Also, does Figure 3-figure supplement 1 refer to the kinetic parameters of Syx2-1A in Figure 3? This is very confusing.

      We have changed the text accordingly and each parameter is referenced to its corresponding figure for clarity. As for the decay and rise time of STX1 and STX1-chimeras, they are in Figure 2-figure supplement 1A and B.

      (4) On pg. 4, lines 151-152, the sentence 'Finally, no change was observed in the paired-pulse ratio (PPR) between STX1A and STX2 groups (Figure 1L).' is not followed by any explanation of or comment on this observation in the text.

      The small EPSC amplitudes and altered kinetics of the STX2 constructs (Figure 1 and Figure 3) made paired-pulse experiments more difficult to quantify. We therefore preferred not to overinterpret these measurements. The finding that the paired-pulse data were not significantly different fits with the vesicular release probability measurements, which showed no major changes. We made our statement on this basis.

      (5) On pg. 6, lines 235-236, the authors wrote that 'Additionally, we found that only STX2-1A(SNARE) and STX2-1A(Cter) could rescue the RRP to around double of what we measured from STX2 and STX2-1A(Nter) (figure 3F)'. However, in Figure 3F, the authors indicated 'n.s.' (p>0.05) for the differences between STX2 and STX2-1A(SNARE)/STX2-1A(Cter). It is perplexing how the authors interpret their data. A p-value threshold should not be used arbitrarily as a criterion of difference. A simpler approach would be to indicate the exact p-value for each comparison (in the figure legends or in tables).

      We apologize for any confusion and hope the modification makes our interpretation clearer. The calculated p-values are included in the attached data source tables, which we hope will clarify our comparative analysis. We have changed the text on pg. 7, lines 238-241; we are cautious not to overinterpret these results and rely more on the data from the STX1A chimeras, which show significant changes in the RRP.

      (6) I noticed that the authors prefer using 'xx% increase/decrease' or 'xx-fold increase/decrease' to describe their between-group data. I doubt whether these interpretations are appropriate. First, most of the individual data sets do not appear to follow a Gaussian distribution; indeed, the authors used non-parametric tests to compare the differences. Second, the authors did not explicitly indicate how the % or fold change was calculated, e.g., from the mean or the median. I think it is a bad choice to use the median to calculate fold changes; meanwhile, the mean would also be biased, given that the data are not Gaussian-distributed. The authors should be cautious in interpreting their data.

      We thank the reviewer for pointing out the inaccuracy of our descriptions and have specified in the Materials and Methods the parameter used to calculate the percentage and fold increase/decrease, namely the mean. Our intention is simply to state the amount of change in a parameter based on the observed change in its mean value. We agree with the reviewer that interpreting this could be problematic if we speculate about possible mechanisms. Further tests would be needed to determine whether similar increases/decreases in a parameter are due to disturbance of the same mechanism or of different ones. For example, we discussed whether the regulation of SYT1 might or might not be the mechanism affected in some of the chimeras that show an increased spontaneous release rate, because the release rate observed in some of them is massively higher than that seen in SYT1-KO neurons (Bouazza-Arostegui et al., 2022). It is tempting to speculate, based on these differences, that other mechanisms are involved. For this reason, we have discussed a range of possible mechanisms that could be affected when we manipulate the SNARE domain of STX1.
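      To make the reviewer's point concrete, here is a hypothetical sketch (invented numbers) of how mean-based and median-based fold changes can diverge for a skewed sample; according to the response above, the manuscript reports mean-based values.

```python
import numpy as np

def change_from_means(control, treatment):
    """Percent change and fold change computed from group means."""
    m_control, m_treatment = np.mean(control), np.mean(treatment)
    return 100 * (m_treatment - m_control) / m_control, m_treatment / m_control

# Hypothetical skewed control sample: one large value pulls the mean above the median
control = [1.0, 1.0, 1.0, 1.0, 6.0]
treatment = [2.0, 2.0, 2.0, 2.0, 2.0]

percent, fold = change_from_means(control, treatment)
median_fold = np.median(treatment) / np.median(control)
print(f"mean-based: {percent:.0f}% change, {fold:.1f}-fold; median-based: {median_fold:.1f}-fold")
```

      The same data look unchanged by the mean and doubled by the median, which is why stating the statistic used matters when reporting percentage or fold changes for non-Gaussian data.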

      (7) The authors routinely analyzed the levels of Munc18-1 in neuronal lysates by WB and Munc18-1/VGlut1 by immunofluorescence in various Syx1 mutants. However, in my view, these assays are somewhat indirect. According to the atomic structures (PDB entries: 3C98 and 7UDB), the SNARE domain of Syx1 clearly participates in binding to Munc18-1. Meanwhile, Han et al. reported that the K46E mutation (located in domain 1 of Munc18-1) strongly impairs Syx1 expression, Syx1 interaction, vesicle docking and secretion (Han et al., 2011, PMID: 21900502). Intriguingly, residue K46 of Munc18-1, which is close to D231/R232 of Syx1, may make electrostatic contacts with D231 and R232 of Syx1. This raises the possibility that Syx1 D231/R232 mutants and some Syx1-2 chimeras lost their normal function through defective binding to Munc18-1. To better understand the underlying mechanism, the authors may need to carry out in vivo and/or in vitro binding analyses between the syntaxin mutants/chimeras and Munc18-1. They also need to discuss this issue further.

      We are grateful for the identification of a previously overlooked aspect of our investigation: the interplay between Munc18-1 and STX1. In response, we have added a discussion of this matter on pg. 11, lines 419-431.

      We also appreciate the thoughtful suggestion of additional experiments to further explore the molecular relationship between Munc18-1 and STX1. We agree that co-immunoprecipitation experiments (using an antibody against either Munc18-1 or STX1 and STX2) would offer greater insight into whether the binding of these proteins is affected in the isoform or the mutants. Notably, we performed immunoprecipitation experiments using neuronal lysates of the corresponding groups and STX1A and STX2 antibodies for the pull-downs. However, we were unable to co-IP Munc18-1 in these experiments. Changing the experimental conditions did not yield better results, so these experiments remain inconclusive for the moment. For this reason, we included this as an open question and a potential hypothesis for the molecular mechanism. However, Shi et al. (2021) performed co-IP assays in HeLa cells using wild-type Munc18-1 and a mutant form that affects binding to the C-terminal half of the STX SNARE domain, together with wild-type STX1 and STX mutants targeting some of our residues of interest, and showed a decrease in the levels of Munc18-1 pulled down. We have made sure to mention the conclusion of this important publication in our discussion.

      (8) The third possible mechanism proposed by the authors (i.e., interaction with Syt1) seems more reasonable. However, the authors' discussion of it is insufficient. For instance, plenty of literature indicates that Syt1 may participate in synaptic vesicle priming by stabilizing partially or fully assembled SNARE complexes (Li et al., 2017, PMID: 28860966; Bacaj et al., 2015, PMID: 26437117; Mohrmann et al., 2013, PMID: 24005294; Wang et al., 2011; PMID: 22184197; Liu et al., 2009, PMID: 19515907); complexins are also SNARE-binding modules that regulate synaptic exocytosis. Lack of complexins can lead to unclamping of spontaneous synaptic vesicle fusion, while at the same time severely impairing Ca2+-triggered release (Maximov et al., 2009, PMID: 19164751). Moreover, different domains of complexin may accomplish different steps of SV fusion; earlier research indicated that the C-terminal sequence of complexin is selectively required for clamping of spontaneous fusion and for priming, but not for Ca2+-triggered release (Kaeser-Woo et al., 2012, PMID: 22357870). Likewise, if possible, the authors may need to carry out in vivo and/or in vitro binding analyses to confirm their hypothesis.

      We explored the involvement of complexin only to a limited extent, primarily because our methodological focus was on understanding the molecular mechanisms related to the sequence differences between STX1 and STX2. Our laboratory has studied the role of complexin extensively, and we have certainly considered its possible involvement. However, since the sites identified on syntaxin are either conserved between STX1 and STX2 or not close to the central or accessory helical domains of complexin, we did not perform experiments to test putative interactions, and we refrained from discussing complexin in this paper.

      (9) Lastly, from a point of view different from the hypothesis raised by the authors, I wonder whether the defects of the Syx2 and Syx1 chimeras are caused by the SNARE complex itself. Changing the outward-facing (i.e., solvent-accessible) residues of the SNARE complex may affect its stability, assembly kinetics, and energetics (Wang and Ma, 2022, PMID: 35810329; Zorman et al., 2014, PMID: 25180101), especially in the C-terminal half. Is this another possible mechanism through which the C-terminus of Syx1 might contribute to SV priming and clamping of spontaneous release? The authors should at least discuss this point.

      Thank you for this suggestion. We indeed assumed that, since the hydrophobic layers of the SNARE domains that form the hydrophobic pocket are largely conserved between STX2 and STX1, the intrinsic stability of the SNARE complex is largely unchanged. Additionally, Li et al. (2022; PMID: 35810329) examined the stability of the alpha-helical structure of the SNARE domain of SNAP25. Although they found no changes in the stability and formation of the alpha-helix when mutating outward-facing residues for methodological purposes (bimane-tryptophan quenching), their study did not specifically examine the effect of mutating outer-surface residues on alpha-helix stability.

      Zorman et al. (2014; PMID: 25180101), as noted by the reviewer, observed that differences in the sequence of the SNARE domain (using SNARE proteins from different trafficking systems, e.g. neuronal, GLUT4, yeast…) correlated with changes in stepwise SNARE complex assembly. However, they also did not selectively mutate the outer solvent-accessible residues, which precludes firm conclusions about the contribution of these residues to the kinetics and energetics of assembly and to the intrinsic stability of the SNARE complex.

      At the reviewer's request, we have added the following paragraph discussing an additional mechanism:

      “As a final remark, it is possible that the changes in the spontaneous release rate and in priming stability stem from reduced stability of the SNARE complex itself, through putative interactions between outer-surface residues. Studies of SNARE complex assembly kinetics in which solvent-accessible residues in the C-terminal half of the SNARE domain of SYB2 were mutated have shown reduced stability of SNARE complex assembly, which correlated with impaired fusion (Jiao et al., 2018). However, results for STX1 mutations of outward-facing residues were inconclusive, and these mutations were always accompanied by hydrophobic-layer mutations (Jiao et al., 2018), which affect the assembly kinetics and energetics of the SNARE complex (Ma et al., 2015). Single-molecule optical-tweezer studies have focused on the impact of regulatory molecules such as Munc18-1 (Ma et al., 2015; Jiao et al., 2018) and complexin (Hao et al., 2023) on assembly stability, or on the intrinsic stability of the hydrophobic layers in the stepwise assembly of the SNARE complex (Gao et al., 2012; Ma et al., 2015; Zhang et al., 2017). Although the conserved hydrophobic layers in the SNARE domains of STX1A and STX2 (Figure 1) suggest unchanged zippering and intrinsic stability of the complex, further studies addressing the contribution of surface residues to the stability of the alpha-helical structure of the SNARE domain of STX1 (Li et al., 2022) or to the stability of the SNARE complex should be conducted.”

      Minor comments:

      (1) On pg. 6, line 236, 'figure 3F', the initial 'f' should be uppercase.

      (3) On pg.11, line 396, the section title 'The interaction of the C-terminus of de SNARE domain of STX1A with Munc18-1 in the stabilization of the primed pool of vesicles.' The word 'de' is confusing, please check.

      (4) On pg. 12, line 446, in the section title, should 'though' be 'through'?

      We have made these corrections. Thank you.

      (2) On pg. 7, line 239, '..had an increased PVR (Figure 3G), no change in the release rate (Figure 3I)', should Figure 3I be Figure 3H? And on line 240, 'and an increase in short-term depression during 10Hz train stimulation (Figure 3I)', should Figure 3I be Figure 3J? If so, Figure 3I is not cited in the text and lacks adequate interpretation. Please check.

      We apologize for not referencing this specific subpanel of the figure and have added the reference to the text. Our interpretation of these data relates to the mechanisms that govern the efficacy of the Ca2+-evoked response and its dependence on the integrity of the entire SNARE domain. We refer the reviewer to the modifications made to the discussion of the regulation of the Ca2+-evoked response in response to previous reviewer comment #1 and a similar comment from reviewer #2 (as stated previously).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer 1 (Public Review):

      Summary:

      The authors set out to clarify the molecular mechanism of endocytosis (re-uptake) of synaptic vesicle (SV) membrane in the presynaptic terminal following release. They have examined the role of presynaptic actin, and of the actin regulatory proteins diaphanous-related formins (mDia1/3), and Rho and Rac GTPases in controlling the endocytosis. They successfully show that presynaptic membrane-associated actin is required for normal SV endocytosis in the presynaptic terminal and that the rate of endocytosis is increased by activation of mDia1/3. They show that RhoA activity and Rac1 activity act in a partially redundant and synergistic fashion together with mDia1/3 to regulate the rate of SV endocytosis. The work adds substantially to our understanding of the molecular mechanisms of SV endocytosis in the presynaptic terminal.

      Strengths:

      The authors use state-of-the-art optical recording of presynaptic endocytosis in primary hippocampal neurons, combined with well-executed genetic and pharmacological perturbations, to document the effects of altered actin polymerization on the rate of SV endocytosis. They show that removal of the short amino-terminal portion of mDia1 that associates with the membrane interrupts the association of mDia1 with membrane actin in the presynaptic terminal. They then use a wide variety of controlled perturbations, including genetic modification of the amount of mDia1/3 by knock-down and knockout, combined with inhibition of RhoA and Rac1 activity by pharmacological agents, to document the quantitative importance of each agent and their synergistic relationship in the regulation of endocytosis.

      The analysis is augmented by ultrastructural analyses that demonstrate the quantitative changes in numbers of synaptic vesicles and in uncoated membrane invaginations that are predicted by the optical recordings.

      The manuscript is well-written and the data are clearly explained. Statistical analysis of the data is strengthened by the very large number of data points analyzed for each experiment.

      Weaknesses:

      There are no major weaknesses. The optical images as first presented are small and it is recommended that the authors provide larger, higher-resolution images.

      Response: We thank the referee for these highly positive remarks. In response, we now provide larger, high-resolution images as requested.

      Reviewer 2 (Public Review):

      Summary:

      This manuscript expands on previous work from the Haucke group which demonstrated the role of formins in synaptic vesicle endocytosis. The techniques used to address the research question are state-of-the-art. As stated above there is a significant advance in knowledge, with particular respect to Rho/Rac signalling.

      Strengths:

      The major strength of the work was to reveal new information regarding the control of both presynaptic actin dynamics and synaptic vesicle endocytosis via Rho/Rac cascades. In addition, there was further mechanistic insight regarding the specific function of mDia1/3. The methods used were state-of-the-art.

      Weaknesses:

      There are a number of instances where the conclusions drawn are not supported by the submitted data, or further work is required to confirm these conclusions.

      Response: We thank the referee for his/her thorough reading of the manuscript and the thoughtful comments and questions. We have conducted additional experiments and made textual changes to our manuscript to address these points and to further strengthen the conclusions, as detailed in our response to the recommendations for authors.

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Most of the figures contain images that are too small to be easily interpreted because the resolution is degraded when they are enlarged in the PDF file. The authors should redesign the figures so that the letters marking each panel are smaller, and the size of each data panel is much larger (at least twice as large with increased resolution). There is, at present, a great deal of white space in most of the figures that should be reduced to make room for larger, higher-resolution images. Larger fonts should be used for annotations of the images so that they are easier to read. The data appears to be very high quality, but it is presented at a size and resolution that don't do it justice.

      Response: We thank the referee for his/her helpful comments. In response, we have carefully rearranged all figures and now provide larger, high-resolution images.

      Reviewer 2 (Recommendations For The Authors):

      Major points

      (1) Figure 1 - While there is a rationale for employing a cocktail of drugs to interfere with actin dynamics, it would be highly informative to determine the effect of these modulators in isolation. This is important since, in their previous publication (Soykan et al Neuron 2017 93:854), the authors demonstrated that latrunculin had no effect, while jasplakinolide accelerated endocytosis. The effect of the cocktail on endocytosis could therefore originate purely from Y-27632 and ROCK inhibition, rather than from destabilisation/stabilisation of actin. It will be key to dissect this by examining the effect on endocytosis of both 1) a cocktail of latrunculin/jasplakinolide and 2) Y-27632 alone.

      Response: We thank the referee for highlighting this interesting point. We have now experimentally addressed the effect of latrunculin (L), jasplakinolide (J) and the ROCK inhibitor Y-27632 (Y), either alone or in combination, on the kinetics of synaptic vesicle (SV) endocytosis (new Fig. 1-Supplement 1C,D). We show that application of the ROCK inhibitor Y-27632 or of the combination of latrunculin (L) and jasplakinolide (J) has no effect on Syph-pH endocytosis. Combined use of jasplakinolide (J) and the ROCK inhibitor Y-27632 (Y) has a small effect. In contrast, a mix of all three inhibitors (JYL) potently impairs endocytosis kinetics at hippocampal synapses. These data demonstrate that actin dynamics are required for SV endocytosis, while ROCK inhibition alone does not appear to impair endocytosis kinetics. We note that our data are in line with a study by Ann Saal et al (2020), who reported a lack of effect of ROCK inhibition on the kinetics of Synaptotagmin1-CypHer retrieval.

      (2) Figure 1 - There are clear effects on the retrieval of pHluorin reporters and also endogenous vGAT in the presence of disruptors of actin function. However, there was no assessment of the impact of these interventions on either neurotransmitter release or SV fusion (with the exception of one condition with one stimulus train (Fig S1D), and the effect of Rac modulation in Fig S6F). As cited by the authors, previous studies using knockout of beta- or gamma-actin have shown a profound effect on these parameters in hippocampal neurons, which has the potential to impact the speed and extent of compensatory endocytosis. The authors will already have these data from the use of the two reporters (pHluorin and GAT-cypHer), and it is important to include them to allow interpretation of the effect on endocytosis observed.

      Response: We agree with the referee that this is an important point, which we have tackled experimentally using vGAT-CypHer and synapto-pHluorin responses as measures. In the new Fig. 1-Supplement 1, Fig. 5-Supplement 1, and Fig. 6-Supplement 1 of our revised manuscript, we show that SV exocytosis is largely unaffected by any of the applied manipulations of actin function.

      Specifically, we have added surface-normalized data as a surrogate measure for exocytosis for the following:

      • JLY treatment monitored by Syph-pH (Figure 1-Supplement 1A) and vGAT-CypHer (Figure 1-Supplement 1B),

      • shCTR/shmDia1 (transfected) assayed via Syph-pH (Figure 1-Supplement 1G),

      • shCTR/shmDia1/shmDia1+3 assayed via vGLUT1-pH (40AP: Figure 1-Supplement 1J; 80AP: Figure 1-Supplement 1L),

      • shCTR/shmDia1+3 (transduced) assayed by vGAT-CypHer (Figure 1-Supplement 1M),

      • IMM treatment monitored by vGLUT1-pH (Figure 1-Supplement 1O),

      • RhoA/B WT/DN overexpression monitored by Syph-pH (Figure 5-Supplement 1B),

      • shCTR/shRhoA+B (transfected) monitored via Syph-pH (Figure 5-Supplement 1D),

      • shCTR/shmDia1+3 +/- EHT 1864 (Rac Inhibitor) assayed by vGAT-CypHer (Figure 6-Supplement 1D),

      • shCTR/shmDia1+3 +/- Rac1-CA/DN assayed by Syph-pH (Figure 6-Supplement 1F).

      The lack of effect of these manipulations on exocytic SV fusion is thus distinct from the effects of complete abrogation of actin expression in beta- or gamma-actin knockout studies reported by the LingGang Wu laboratory (Neuron 2016) as the referee also noted.

      (3) Figure 3H, 3K, 4C, 4F - It is unclear how the values on the Y-axis were calculated. Regardless, to confirm that there is a specific increase in presynaptic mDia1/actin, the equivalent values for Homer/mDia1 should be presented (with Bassoon/Homer as a negative control). Without this, it is difficult to argue for a specific enrichment of mDia1/actin at the presynapse. The CRISPR experiments help with this interpretation (Fig 4G-I); however, inclusion of the Homer/mDia1 STED data would strengthen it greatly.

      Response: We apologize if the description was unclear. We essentially followed the same type of analysis as recently described by Bolz et al (2023). In brief, presynaptic levels of the proteins of interest were quantified as follows. The presynaptic area was defined by the normalized distribution curve of Bassoon, i.e. the area between 151.37 and -37.84 nm (purple shading), with the cutoff set where the Bassoon and Homer1 distributions overlap (-37.84 nm), as shown in Figure 3-Supplement 1H (pasted below). The individual synaptic line profiles, e.g. of mDia1, were integrated to yield presynaptic levels (between 151.37 and -37.84 nm; purple in the graph) vs. postsynaptic levels (from -56.76 to -245.97 nm; green shaded area).

      Author response image 1. New Figure 3-Supplement 1H-J.
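      To make the integration step above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the analysis code used for the figures. It assumes each synaptic line profile is sampled as (position in nm, intensity) pairs, with presynaptic positions positive, and it uses a simple trapezoidal rule; the helper names are hypothetical. Only the window boundaries are taken from the text.

```python
from typing import List, Sequence, Tuple

# Window boundaries in nm, taken from the text. The presynaptic window is
# cut off at the Bassoon/Homer1 overlap (-37.84 nm).
PRE_WINDOW = (-37.84, 151.37)
POST_WINDOW = (-245.97, -56.76)


def integrate_window(positions: Sequence[float], intensities: Sequence[float],
                     lo: float, hi: float) -> float:
    """Trapezoidal integral of a line profile over samples with lo <= x <= hi.

    Hypothetical helper: assumes positions are sorted in ascending order.
    Samples outside the window are dropped (no interpolation at the edges).
    """
    pts: List[Tuple[float, float]] = [
        (x, y) for x, y in zip(positions, intensities) if lo <= x <= hi
    ]
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        total += 0.5 * (y0 + y1) * (x1 - x0)
    return total


def pre_post_levels(positions: Sequence[float],
                    intensities: Sequence[float]) -> Tuple[float, float]:
    """Return (presynaptic, postsynaptic) integrated levels for one profile."""
    pre = integrate_window(positions, intensities, *PRE_WINDOW)
    post = integrate_window(positions, intensities, *POST_WINDOW)
    return pre, post


if __name__ == "__main__":
    # Toy profile sampled every 20 nm with a constant intensity of 1.0.
    xs = [float(x) for x in range(-260, 181, 20)]
    ys = [1.0] * len(xs)
    print(pre_post_levels(xs, ys))  # -> (160.0, 180.0)
```

      Because samples outside each window are simply dropped, the integrated area depends slightly on the sampling density. One possible refinement would be to interpolate each profile at the exact window boundaries.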

      Based on this analysis, postsynaptic mDia1 levels were also elevated upon Dynasore treatment (new Figure 3-Supplement 1I). In spite of this, and consistent with the fact that the majority of mDia1 is localized at the presynapse, we found that postsynaptic F-actin levels were unchanged in mDia1/3-depleted neurons (p = 0.0966; one-sample t-test) (new Figure 4-Supplement 1E,F).

      Author response image 2. New Figure 4-Supplement 1E,F.

      Moreover, we also conducted further analysis with respect to possible effects of Dynasore on synaptic architecture in general. Neither presynaptic Bassoon nor postsynaptic Homer1 levels were significantly altered by Dynasore treatment (new Figure 3–Supplement 1J).

      (4) Figure 4J - The rescue of the pHluorin response by jasplakinolide is difficult to interpret when considering previous work from the same authors. In their 2017 publication (Soykan et al Neuron 2017 93:854), they revealed that the drug accelerated the pHluorin response, whereas now they demonstrate no effect in the control condition. If the drug does accelerate endocytosis, then it may be working via a different mechanism to restore endocytosis in mDia1/3 knockdown neurons.

      Response: The referee is correct. The very mild acceleration of endocytosis in the presence of jasplakinolide can be observed using synaptophysin-pHluorin as a reporter under moderate medium-frequency stimulation at 10Hz for 5 s (i.e. 50 APs). In the present dataset, using a different pHluorin reporter (i.e. vGLUT1-pHluorin) that tends to yield faster endocytic responses (as noted before by the Ryan lab) and a high-frequency stimulus (20Hz), we fail to observe a significant effect. While a distinct mechanism cannot be excluded, we would be reluctant to conclude that these differences indicate distinct mechanisms of jasplakinolide action. Alternatively, actin may be of particular importance under conditions of high-frequency stimulation.

      In this regard, the conclusions from the pHluorin experiment would be greatly strengthened by demonstrating that jasplakinolide corrects the reduction of presynaptic actin in mDia1/3 knockdown synapses observed in figures 4E-I.

      Response: As demonstrated in Figure 4-Supplement 1G, and in support of a common mechanism of action, we find that application of jasplakinolide rescues the reduced presynaptic actin levels in mDia1/3-depleted neurons. The respective data for presynaptic actin (normalized to shCTR + DMSO set to 100) are: shCTR + DMSO = 100 ± 6.3; shmDia1+3 + DMSO = 47.7 ± 4.3; shCTR + Jasp = 150.6 ± 11.9; shmDia1+3 + Jasp = 94.3 ± 11.5. These data are now also quoted in the revised manuscript text.

      Minor points

      (1) There is no rationale provided regarding why different stimulation protocols are sometimes used in the pHluorin/cypHer experiments. In most cases it is 200 APs (40 Hz), however, in some cases, it is 40 APs or 80 APs. Can the authors explain why they used these different protocols?

      Response: The referee noted this correctly. This in part reflects the history of the project, in which initial datasets were acquired using 200 AP trains and pHluorin reporters. To probe whether the phenotypic effects induced by actin perturbations were robust across different stimulation paradigms and optical reporters, additional experiments were conducted using either 40 or 80 AP trains, as well as vGLUT1 or endogenous vGAT monitored by pH-sensitive cypHer-labeled antibodies. We hope the referee agrees that these additional data add to the general importance of our study.

      (2) Figure 2 - The reduction in SV density in mDia1/3 knockdown neurons correlates with the results in Figures 1 and 7. However, a functional consequence of this reduction (change in size of RRP or neurotransmitter release, as stated above) would have increased the impact of these experiments.

      Response: We agree with the referee and will address this interesting possibility using electrophysiological recordings in future studies.

      (3) It appears the experimental n in Figure 2 is profiles, rather than experiments. This should be clarified, especially since there is no reference to how many times the experiments in Fig2E-G were performed.

      Response: This point has been clarified in the revised figure legend.

      (4) Figure 6 - The authors state that inhibition of Rac function either via a dominant negative mutant or an inhibitor increases the inhibition of endocytosis via knockdown of mDia1/3. However, both interventions inhibit endocytosis themselves in the control condition. It would be informative to see the full statistical analysis of this data since there does not appear to be a significant additive effect when comparing Rac inhibition with the additional knockdown of mDia1/3.

      Response: In our revised manuscript, we now provide the full statistical analysis in the revised Source Data Table for Figures 6G,H. We observe that Rac1-DN expression indeed further aggravates phenotypes elicited by depletion of mDia1+3, but not vice versa. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      (5) Figure 7 - The increase in endosomes in mDia1/3 knockdown neurons is consistent with previous studies examining pharmacological inhibition of formins (Soykan et al Neuron 2017 93:854). However, it is noted that these structures were absent in the images shown in Figure 2. Similar to the previous point in figure 6, a full reporting of the significance of different conditions is important here, since it appears that the only difference between EHT1864 and its co-incubation with mDia1/3 knockdown neurons is in the number of ELVs (Fig 7H).

      Response: Similar to the example EM images shown in Figure 7, enlarged endocytic structures are also observed in shmDia1+3 depleted synapses shown in Figure 2. However, ELVs and membrane invaginations were not color-coded as the focus in figure 2 is on the reduction of the SV pool. To better illustrate this, we have chosen a more representative example of this phenotype in revised Figure 2.

      Moreover, we now provide the full statistical analysis of EM phenotypes in the revised Source Data Table for Figure 7. We find that Rac1 inhibition indeed significantly aggravates the effects of mDia1+3 loss with respect to the accumulation of membrane invaginations, while the effect on ELVs remains insignificant. However, accumulation of ELVs in the presence of the Rac1 inhibitor EHT1864 is further aggravated upon depletion of mDia1+3. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      We speculate that Rac1 may thus predominantly act at the plasma membrane, whereas mDia1/3 may serve additional functions in SV reformation at the level of ELVs. Clearly, further studies would be needed to test this idea in the future.