  1. Nov 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study examined the associations of a healthy lifestyle with comprehensive and organ-specific biological ages defined using common blood biomarkers and body measures. Its large sample size, longitudinal design, and robust statistical analysis provide solid support for the findings, which will be of interest to epidemiologists and clinicians.

      Thank you very much for your thoughtful review of our manuscript. Your valuable comments have greatly helped us improve our manuscript. We have carefully considered all the comments and suggestions made by the reviewers and have revised them to address each point. Below, we provide detailed responses to each of the reviewers' comments. Please note that the line numbers mentioned in the following responses correspond to the line numbers in the clean version of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examined the associations of a healthy lifestyle with comprehensive and organ-specific biological ages. It emphasized the importance of lifestyle factors in biological ages, which were defined using common blood biomarkers and body measures.

      Strengths:

      The data were from a large cohort study and defined comprehensive and six-specified biological ages.

      Weaknesses:

      (1) Since only 8.5% of participants from the CMEC (China Multi-Ethnic Cohort Study) were included in the study, has any selection bias occurred?

      Thank you for your valuable question. We understand the concern regarding potential selection bias, given that only 8.5% of participants were included in the study. The baseline survey of the China Multi-Ethnic Cohort Study (CMEC) employed a rigorous multi-stage stratified cluster sampling method, and the repeat survey reevaluated approximately 10% of baseline participants through community-based cluster random sampling. Therefore, the sample of the repeat survey is representative. The second reason for the loss of sample size was the availability of biomarkers for BA calculation. We have compared the characteristics of the overall population with those of the participants included in and excluded from this study. Most characteristics were similar, but participants included in this study fared better on some health-related variables; one potential reason is that healthier individuals were more likely to complete the follow-up survey. In conclusion, we believe that the impact of selection bias is limited.

      Author response table 1.

      Baseline characteristics of participants included and not included in the study

      BA, biological age; BMI, body mass index; CVD, cardiovascular disease; HLI, healthy lifestyle indicator.

      1 Data are presented as median (25th, 75th percentile) for continuous variables and count (percentage) for categorical variables.

      2 For HLI, "healthy" corresponds to a score of 4-5.

      3 Information on each validated BA has been reported. BA acceleration is the difference between each BA and CA in the same survey.

      (2) The authors should specify the efficiency of FFQ. How can FFQ genuinely reflect the actual intake? Moreover, how was the aMED calculated?

      Thank you for the comments and questions. We appreciate the opportunity to clarify these aspects of our study. For the first question, we evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls at the baseline survey. Intraclass correlation coefficients (ICC) for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice. More details are provided in our previous study (Lancet Reg Health West Pac, 2021). We have added the corresponding content in both the main text and the supplementary materials.

      Methods, Page 8, lines 145-146: “The FFQ's reproducibility and validity were evaluated by conducting repeated FFQs and 24-hour dietary recalls.”

      Supplementary methods, Dietary assessment: “We evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls. Intraclass correlation coefficients for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice.”

      For the second question, we apologize for any confusion. To avoid taking up too much space in the main text, we decided not to include the detailed aMED calculation (as described in Circulation, 2009) there and instead placed it in the supplementary materials:

      “Our calculated aMED score incorporates eight components: vegetables, legumes, fruits, whole grains, fish, the ratio of monounsaturated fatty acids (MUFA) to saturated fatty acids (SFA), red and processed meats, and alcohol. Each component's consumption was divided into sex-specific quintiles. Scores ranging from 1 to 5 were assigned based on quintile rankings to each component, except for red and processed meats and alcohol, for which the scoring was inverted. The alcohol criteria for the aMED was defined as moderate consumption. Since the healthy lifestyle index (HLI) already contained a drinking component, we removed the drinking item in the aMED, which had a score range of 7-35 with a higher score reflecting better adherence to the overall Mediterranean dietary pattern. We defined individuals with aMED scores ≥ population median as healthy diets.”

      Reference:

      (1) Xiao X, Qin Z, Lv X, Dai Y, Ciren Z, Yangla Y, et al. Dietary patterns and cardiometabolic risks in diverse less-developed ethnic minority regions: results from the China Multi-Ethnic Cohort (CMEC) Study. Lancet Reg Health West Pac. 2021;15:100252. doi: 10.1016/j.lanwpc.2021.100252.

      (2) Fung TT, Rexrode KM, Mantzoros CS, Manson JE, Willett WC, Hu FB. Mediterranean diet and incidence of and mortality from coronary heart disease and stroke in women. Circulation. 2009;119(8):1093-100. doi: 10.1161/circulationaha.108.816736.
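
      The aMED scoring quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the rank-based quintile rule, the component labels, and the toy intake values are all hypothetical. Only the seven-component structure (alcohol removed), the 1-5 quintile scoring, the inverted scoring for red and processed meats, the 7-35 range, and the median cut-off come from the text. Quintiles are computed here within a single group; the study used sex-specific quintiles.

```python
from statistics import median

# Seven components kept after dropping alcohol (labels are illustrative).
COMPONENTS = ["vegetables", "legumes", "fruits", "whole_grains",
              "fish", "mufa_sfa_ratio", "red_processed_meat"]
INVERTED = {"red_processed_meat"}  # higher intake gets a lower score


def quintile_scores(values):
    """Score each value 1-5 by its quintile rank within `values`.

    Assumption: quintiles are assigned by rank position and ties are
    broken by input order; the source does not specify a tie rule.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // n + 1
    return scores


def amed_scores(people):
    """Sum the seven component scores for each person (one sex group)."""
    totals = [0] * len(people)
    for comp in COMPONENTS:
        scores = quintile_scores([p[comp] for p in people])
        for i, s in enumerate(scores):
            totals[i] += (6 - s) if comp in INVERTED else s
    return totals


def healthy_diet(totals):
    """Flag scores at or above the population median as a healthy diet."""
    cut = median(totals)
    return [t >= cut for t in totals]


# Toy data: person k reports intake k for every component.
people = [{c: float(k) for c in COMPONENTS} for k in range(10)]
totals = amed_scores(people)
flags = healthy_diet(totals)
```

      Because the red and processed meat score is inverted, a person in the top quintile of every component scores 31 rather than 35 in this toy data; the maximum of 35 requires the lowest quintile of meat intake.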

      (3) HLI (range) and HLI (category) should be clearly defined.

      Thank you for the comment. We have added the definition of HLI (range) and HLI (category) in the methods section:

      Methods P9 lines 165-170: “The HLI was calculated by directly adding up the five lifestyle scores, ranging from 0-5, with a higher score representing an overall healthier lifestyle, denoted as HLI (range) in the following text. We then transformed HLI into a dichotomous variable in this study, denoted as HLI (category), where a score of 4-5 for HLI was considered a healthy lifestyle, and a score of 0-3 was considered an unfavorable lifestyle that could be improved.”
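
      The two HLI definitions can be written out as a short sketch. The function names are hypothetical, and the five factor scores are passed as a plain list of 0/1 values because the individual scoring rules are not restated in this response.

```python
def hli_range(factor_scores):
    """Return HLI (range): the sum of five binary lifestyle scores (0-5)."""
    if len(factor_scores) != 5 or any(s not in (0, 1) for s in factor_scores):
        raise ValueError("expected five lifestyle scores, each 0 or 1")
    return sum(factor_scores)


def hli_category(hli):
    """Return HLI (category): 4-5 is healthy, 0-3 is unfavorable."""
    if not 0 <= hli <= 5:
        raise ValueError("HLI must be between 0 and 5")
    return "healthy" if hli >= 4 else "unfavorable"


# Example: a participant meeting four of the five recommendations.
example_hli = hli_range([1, 1, 1, 0, 1])
example_category = hli_category(example_hli)
```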

      (4) The comprehensive rationale and each specific BA construction should be clearly defined and discussed. For example, can cardiopulmonary BA be reflected only by using cardiopulmonary status? I do not think so.

      Thank you for the opportunity to clarify. We constructed the comprehensive BA based on all the available biochemical data from the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and further constructed organ-specific BAs based on these selected biomarkers. The KDM algorithm does not specify biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA based on available biomarkers. We included 15 biomarkers in this study, which could be considered comprehensive and extensive compared to previous research (J Transl Med. 2023; J Am Heart Assoc. 2024; Nat Cardiovasc Res. 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily based on their relevance to the structure and function of each organ system, according to the classification in previous studies (Nat Med, 2023; Cell Rep, 2022). Since the biomarkers we used came from clinical-lab data sets, they were categorized based on the clinical interpretation of blood chemistry tests, following the methods outlined in the two referenced papers (Nat Med, 2023; Cell Rep, 2022). We only used biomarkers directly related to each specific system to minimize overlap between the indicators used for different BAs, thereby preserving the distinctiveness of organ-specific BAs. We acknowledge the limitations of this approach: a few biomarkers may not fully capture the complete aging process of a system, and certain indicators may be missing due to data constraints. However, the multi-organ BAs we constructed are cost-effective, easy to implement, and have been validated, making them valuable despite the limitations.

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (5) The lifestyle index is defined based on an equal-weight approach, but this does not reflect reality and cannot fully answer the research questions it raises.

      Thank you very much for your valuable suggestion. We used equal weight healthy lifestyle index (HLI) partly to facilitate comparisons with other studies. The equal-weight approach to construct the HLI is commonly used in current research (Bmj, 2021; Diabetes Care. 2022; Arch Gerontol Geriatr. 2022). The equal-weight HLI can demonstrate the average benefit of adopting each additional healthy lifestyle and avoid assumptions about the relative importance of different behaviors, which may vary depending on the population. To further clarify the importance of each lifestyle factor, we conducted quantile G-computation analysis, which can reflect the weight differences between lifestyle factors (PLoS Med, 2020; Clin Epigenetics, 2022).

      Reference:

      (1) Zhang YB, Chen C, Pan XF, Guo J, Li Y, Franco OH, Liu G, Pan A. Associations of healthy lifestyle and socioeconomic status with mortality and incident cardiovascular disease: two prospective cohort studies. Bmj. 2021;373:n604. doi: 10.1136/bmj.n604.

      (2) Han H, Cao Y, Feng C, Zheng Y, Dhana K, Zhu S, Shang C, Yuan C, Zong G. Association of a Healthy Lifestyle With All-Cause and Cause-Specific Mortality Among Individuals With Type 2 Diabetes: A Prospective Study in UK Biobank. Diabetes Care. 2022;45(2):319-29. doi: 10.2337/dc21-1512.

      (3) Jin S, Li C, Cao X, Chen C, Ye Z, Liu Z. Association of lifestyle with mortality and the mediating role of aging among older adults in China. Arch Gerontol Geriatr. 2022;98:104559. doi: 10.1016/j.archger.2021.104559.

      (4) Chudasama YV, Khunti K, Gillies CL, Dhalwani NN, Davies MJ, Yates T, Zaccardi F. Healthy lifestyle and life expectancy in people with multimorbidity in the UK Biobank: A longitudinal cohort study. PLoS Med. 2020;17(9):e1003332. doi: 10.1371/journal.pmed.1003332.

      (5) Kim K, Zheng Y, Joyce BT, Jiang H, Greenland P, Jacobs DR, Jr., et al. Relative contributions of six lifestyle- and health-related exposures to epigenetic aging: the Coronary Artery Risk Development in Young Adults (CARDIA) Study. Clin Epigenetics. 2022;14(1):85. doi: 10.1186/s13148-022-01304-9.

      Reviewer #2 (Public Review):

      This interesting study focuses on the association between lifestyle factors and comprehensive and organ-specific biological aging in a multi-ethnic cohort from Southwest China. It stands out for its large sample size, longitudinal design, and robust statistical analysis.

      Some issues deserve clarification to enhance this paper:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, a more detailed description of the multi-organ biological ages should be provided to help understand the distribution and characteristics of BAs.

      We thank you for raising this point. As explained in our response to the fourth question from the first reviewer, we constructed the comprehensive BA based on all the available biochemical data from the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and further constructed organ-specific BAs based on these selected biomarkers. The KDM algorithm does not specify biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA based on available biomarkers. We included 15 biomarkers in this study, which could be considered comprehensive and extensive compared to previous research (J Transl Med. 2023; J Am Heart Assoc. 2024; Nat Cardiovasc Res. 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily based on their relevance to the structure and function of each organ system, according to the classification in previous studies (Nat Med, 2023; Cell Rep, 2022). Since the biomarkers we used came from clinical-lab data sets, they were categorized based on the clinical interpretation of blood chemistry tests (Nat Med, 2023). We only used biomarkers directly related to each specific system to minimize overlap between the indicators used for different BAs, thereby preserving the distinctiveness of organ-specific BAs.

      We have added a descriptive table for the comprehensive and organ systems BAs in the supplementary materials to provide a more detailed understanding of the distribution and characteristics of BAs:

      Author response table 2.

      Description of BA and BA acceleration1

      BA, biological age

      1 Data are presented as mean (standard deviation).

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      Thank you for raising this concern. We categorized each lifestyle factor into a binary variable based on relevant guidelines and studies, which recommend assigning a score of 1 if the guideline or study recommendations are met (Bmj, 2021; J Am Heart Assoc, 2023). While dichotomization may lead to some loss of information, it allows for a clearer interpretation and comparison of adherence to ideal healthy lifestyle behaviors. Another advantage of this treatment is that it allows for easy comparison with other studies. We categorized the HLI score into a dichotomous variable to enhance the practical relevance of the results (J Gerontol A Biol Sci Med Sci, 2021). Additionally, we conducted analyses using the continuous HLI score to ensure that our findings were robust, and the results were consistent with those obtained using the dichotomous HLI.

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      Thank you for your valuable suggestion. We acknowledge that the use of self-reported lifestyle data in our study may introduce recall bias, potentially affecting the accuracy of the information collected. We have added the following statement to the limitations section of our manuscript:

      Discussion, Page 22, lines 463-464: “Fifth, assessment of lifestyle factors was based on self-reported data collected through questionnaires, which may be subject to recall bias.”

      (4) It should be clarified whether the adjusted CA is the baseline value of CA. Additionally, why did the authors choose models with additional adjustments for time-invariant variables as their primary analysis? This approach does not align with standard FEM analysis (Lines 261-263).

      Thank you for the opportunity to clarify. We have changed the sentence to “baseline CA”. For the second question, in a standard fixed effects model (FEM), only time-varying variables are typically included. However, to enhance the flexibility of our models and account for potential variations in the association of time-invariant variables with CA, as has been commonly done in previous studies, we additionally adjusted for time-invariant variables and the baseline value of CA (BMC Med Res Methodol, 2024; Am J Clin Nutr, 2020). Moreover, sensitivity analyses using the standard FEM were conducted in this study, and robust results were obtained.

      Reference:

      (1) Tang D, Hu Y, Zhang N, Xiao X, Zhao X. Change analysis for intermediate disease markers in nutritional epidemiology: a causal inference perspective. BMC Med Res Methodol. 2024;24(1):49. doi: 10.1186/s12874-024-02167-9.

      (2) Trichia E, Luben R, Khaw KT, Wareham NJ, Imamura F, Forouhi NG. The associations of longitudinal changes in consumption of total and types of dairy products and markers of metabolic risk and adiposity: findings from the European Investigation into Cancer and Nutrition (EPIC)-Norfolk study, United Kingdom. Am J Clin Nutr. 2020;111(5):1018-26. doi: 10.1093/ajcn/nqz335.
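
      The reasoning behind the standard FEM in the response to comment (4) can be illustrated with a small sketch: with two survey waves, differencing each participant's measurements removes anything that is constant within a person, which is why time-invariant variables normally drop out. The data below are synthetic and noise-free, the variable names are hypothetical, and the sketch covers only the standard FEM, not the authors' primary model with additional adjustment for time-invariant variables and baseline CA.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Synthetic two-wave panel (all values are made up for illustration).
alpha = [5.0, -2.0, 1.0, 3.0]   # unobserved, person-specific constants
beta_true = -0.5                # effect of the lifestyle exposure
wave_shift = 0.2                # common change between the two waves
x_wave1 = [0, 1, 1, 0]          # exposure at baseline
x_wave2 = [1, 1, 0, 0]          # exposure at follow-up
y_wave1 = [a + beta_true * x for a, x in zip(alpha, x_wave1)]
y_wave2 = [a + beta_true * x + wave_shift for a, x in zip(alpha, x_wave2)]

# Differencing within each person cancels alpha; the intercept of the
# differenced regression absorbs the common wave shift.
dx = [b - a for a, b in zip(x_wave1, x_wave2)]
dy = [b - a for a, b in zip(y_wave1, y_wave2)]
beta_fd = ols_slope(dx, dy)

# Pooling both waves without removing alpha gives a confounded slope.
beta_pooled = ols_slope(x_wave1 + x_wave2, y_wave1 + y_wave2)
```

      In this toy panel the differenced slope recovers the simulated effect, while the pooled slope does not, because the person-specific constants are correlated with the exposure.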

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. These omissions should be explained.

      Thanks for the questions. The QGC obtains causal relationships and estimates weights for each component, which has been widely used in epidemiological research. More details about QGC can be found in the supplementary methods. The reason some results are not displayed is that we assumed all healthy lifestyle changes would have a protective effect on BA acceleration. However, the effect size of some lifestyle factors did not align with this assumption and lacked statistical significance. Because positive and negative weights were calculated separately in QGC, with all positive weights summing to 1 and all negative weights summing to 1, these factors would have had large positive weights. To avoid potential misunderstandings, we chose not to include these results in the figures. We have added explanations to the figure legends where applicable:

      “The blue bars represent results that are statistically significant in the FEM analysis, while the gray bars represent results in the FEM analysis that were not found to be statistically significant and positive weights were not shown.”
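
      The weight behaviour described in this response can be sketched as follows. The sketch assumes the usual convention for linear models, in which each weight is a coefficient's share of the summed coefficients with the same sign; the function name, factor labels, and coefficient values are hypothetical and are not results from the study.

```python
def qgc_weights(coefs):
    """Split coefficients by sign and normalise each group to sum to 1.

    Returns a dict mapping name -> (direction, weight).
    """
    pos_total = sum(b for b in coefs.values() if b > 0)
    neg_total = sum(-b for b in coefs.values() if b < 0)
    weights = {}
    for name, b in coefs.items():
        if b > 0:
            weights[name] = ("positive", b / pos_total)
        elif b < 0:
            weights[name] = ("negative", -b / neg_total)
        else:
            weights[name] = ("none", 0.0)
    return weights


# Hypothetical coefficients on BA acceleration: two protective (negative)
# factors and one small coefficient in the unexpected direction.
example_coefs = {"factor_a": -0.40, "factor_b": -0.10, "factor_c": 0.02}
example_weights = qgc_weights(example_coefs)
```

      Here factor_c has the smallest coefficient in absolute terms, yet it receives the entire positive weight because it is the only positive coefficient. This is the kind of large, potentially misleading weight the authors chose not to display.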

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      To enhance this paper, some issues deserve clarification:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, please provide a more detailed description of the multi-organ biological ages to help understand the distribution and characteristics of BAs.

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      (4) Lines 261-263: Please clarify if the adjusted CA is the baseline value of CA. Additionally, why did you choose models with additional adjustments for time-invariant variables as your primary analysis? This approach does not align with standard FEM analysis.

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. Please explain these omissions.

      The above five issues overlap with those raised by Reviewer #2 (Public Review). Please refer to the responses provided earlier.

      Minor revision:

      Line 50: The expression "which factors" should be changed to "which lifestyle factor."

      Thank you for the suggestion. As suggested, we have used “which lifestyle factor” instead.

      Lines 91-92: "Aging exhibits variations across and with individuals" appears to be a clerical error. According to the context, it should be "Aging exhibits variations across and within individuals."

      We thank the reviewer for the correction. We have updated the text to read:

      “Aging exhibits variations across and within individuals.”

      Line 154: The authors mentioned "Considering previous studies" but lacked references. Please add the appropriate citations.

      Thank you for pointing this out. We apologize for the oversight. We have now added the appropriate citations to support the statement "Considering previous studies" in the revised manuscript.

      Lines 170-171: "regular exercise ("12 times/week", "3-5 times/week," or "daily or almost every day")"; the first item in parentheses should be "1-2 times/week"? Please verify and correct if necessary. Additionally, check the entire text carefully to avoid confusion caused by clerical errors.

      Thank you for your careful review. We have changed the sentence to "1-2 times/week." We have thoroughly checked the entire manuscript to ensure that no other clerical errors remain.

      Clarifications for Table 1:

      i. The expression "HLI=0" is difficult to understand. Please provide a more straightforward explanation or rephrase it.

      Thank you for your feedback. We have removed the confusing expression and provided a clearer explanation in the table legend for better understanding:

      “For HLI (category), "healthy" corresponds to a score of 4-5, while "unfavorable" corresponds to a score of 0-3.”

      ii. The baseline age is presented as an integer, but the follow-up age is not. Please clarify this discrepancy.

      Thank you for pointing out this discrepancy. We calculated the precise chronological age based on participants' survey dates and birth dates for the biological age calculations. Initially, the table presented age as integers, but we have now updated it to show the precise ages.

    1. eLife Assessment

      This fundamental, clearly written, and timely manuscript links the timing of ART with the kinetics of total and intact proviral HIV DNA. The conclusions are interesting and novel, and the importance of the work is high because the focus is on African women and clade C virus, both of which are understudied in the HIV reservoir field. The strength of the evidence is compelling. Overall, this work will be of very high interest to scientists and clinicians in the HIV cure/persistence fields.

    2. Reviewer #1 (Public review):

      The authors sought to determine the impact of early antiretroviral treatment on the size, composition, and decay of the HIV latent reservoir. This reservoir represents the source of viral rebound upon treatment interruption and therefore constitutes the greatest challenge to achieving an HIV cure. A particular strength of this study is that it reports on reservoir characteristics in African women, a significantly understudied population, of whom some have initiated treatment within days of acute HIV diagnosis. With the use of highly sensitive and current technologies, including digital droplet PCR and near full-length genome next-generation sequencing, the authors generated a valuable dataset for investigation of proviral dynamics in women initiating early treatment compared to those initiating treatment in chronic infection. The authors confirm previous reports that early antiretroviral treatment restricts reservoir size, but further show that this restriction extends to defective viral genomes, where late treatment initiation was associated with a greater frequency of defective genomes. Furthermore, an additional strength of this study is the longitudinal comparison of viral dynamics post-treatment, wherein early treatment was shown to be associated with a more rapid rate of decay in proviral genomes, regardless of intactness, over a period of one year post-treatment. While it is indicated that intact genomes were not detected after one year following early treatment initiation, sampling depth is noted as a limitation of the study by the authors, and caution should thus be taken with interpretation where sequence numbers are low. Defective genomes are more abundant than intact genomes and are therefore more likely to be sampled. Early treatment was also associated with reduced proviral diversity and fewer instances of polymorphisms associated with cytotoxic T-lymphocyte immune selection. This is expected given that rapid evolution and extensive immune selection are synonymous with HIV infection in the absence of treatment, yet points to an additional benefit of early treatment in the context of immune therapies to restrict the reservoir.

      This is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C. The data and findings from this study therefore represent a much-needed resource in furthering our understanding of HIV persistence and informing broadly impactful cure strategies. The analysis on clonal expansion of proviral genomes may be limited by higher sequence homogeneity in hyperacute infection i.e., cells with different proviral integration sites may have a higher likelihood of containing identical genomes compared to chronic infection.

      Overall, these data demonstrate the distinct benefits of early treatment initiation at reducing the barrier to a functional cure for HIV, not only by restricting viral abundance and diversity but also potentially through the preservation of immune function and limiting immune escape. It therefore provides clues to curative strategies even in settings where early diagnosis and treatment may be unlikely.

    3. Reviewer #2 (Public review):

      HIV infection is characterized by viral integration into permissive host cells - an event that occurs very early in viral-host encounter. This constitutes the HIV proviral reservoir and is a feature of HIV infection that provides the greatest challenge for eradicating HIV-1 infection once an individual is infected.

      This study looks at how starting HIV treatment very early after infection, which substantially reduces the peak viral load detectable (compared to untreated infection), affects the amount and characteristics of the viral reservoir. The authors studied 35 women in South Africa who were at high risk of getting HIV. Some of these women started HIV treatment very soon after getting infected, while others started later. This study is well-designed and has as its focus a very well characterized cohort. Comparison groups are appropriately selected to address proviral DNA characterization and dynamics in the context of acute and chronic treated HIV-1. The amount of HIV and various characteristics of the genetic makeup of the virus (intact/defective proviral genome) was evaluated over one year of treatment. Methods employed for proviral DNA characterization are state-of-the-art and provide in-depth insights into the reservoir in peripheral blood.

      While starting treatment early didn't reduce the amount of HIV DNA at the outset, it did lead to a gradual decrease in total HIV DNA quantity over time. In contrast, those who started treatment later didn't see much change in this parameter. Starting treatment early led to a faster decrease in intact provirus (a measure of replication-competence), compared to starting treatment later. Additionally, early treatment reduced genetic diversity of the viral DNA and resulted in fewer immune escape variants within intact genomes. This suggests that collectively having a smaller intact replication-competent reservoir, less viral variability, and less opportunity for virus to evade the immune system - are all features that are likely to facilitate more effective clearance of viral reservoir, especially when combined with other intervention strategies.

      Major strengths of the study include the cohort of very early treated persons with HIV and the depth of study. These are important findings, particularly as the study was conducted in HIV-1 subtype C infected women (more cure studies have focussed on men and with subtype B infection)- and in populations most affected by HIV and in need of HIV cure interventions. This is highly relevant because it cannot be assumed that any interventions employed for reducing/clearing the HIV reservoir would perform similarly in men and women or across different populations. Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections).

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1:

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are from men living with HIV-1 subtype B, and the studies are not from the hyperacute infection phase; therefore, a direct head-to-head comparison with the FRESH study is difficult. However, we can cite/highlight and contrast our study with a few examples from acute infection studies, as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic-treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute-treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV-reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided and therefore a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute-treated (within 100 days) and chronic-treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic-treated group, they showed that in chronic-treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute-treated participants, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by the differences in timing of infection, specifically in the acute-treated groups, where FRESH participants initiate ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir may play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Heiner et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute-treated and 3 chronic-treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses in participants who initiated therapy during acute/early infection at 6% (94% defective) and during chronic infection at 3% (97% defective). In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute-treated participants, and 13% intact and 87% defective in chronic infection. These differences could be attributed to the timing of treatment initiation; in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.
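
      The comparisons above mix decay rates quoted per year, per month, and in log copies. A minimal sketch of how such figures can be put on a common scale is given below. It assumes a constant fractional (single-exponential) loss over the whole interval, which is an assumption made here purely for illustration and is not a claim from either study; the function names are hypothetical.

```python
# Illustrative unit conversions for decay rates.
# Assumption (not from the source): the same fractional loss applies in every month.

def annual_to_monthly(annual_fraction):
    """Monthly fractional loss that compounds to the given annual fractional loss."""
    return 1 - (1 - annual_fraction) ** (1 / 12)


def monthly_to_annual(monthly_fraction):
    """Annual fractional loss if the monthly fractional loss were sustained for 12 months."""
    return 1 - (1 - monthly_fraction) ** 12


def log10_drop_to_fraction(log10_drop):
    """Fractional loss corresponding to a drop expressed in log10 copies."""
    return 1 - 10 ** (-log10_drop)


if __name__ == "__main__":
    # Rates quoted in the response above, converted under the stated assumption.
    print(f"15.7% per year  ~ {annual_to_monthly(0.157):.2%} per month")
    print(f"0.31 log/month  ~ {log10_drop_to_fraction(0.31):.1%} per month")
```

      Because the FRESH rates are reported only for the first 6 months of treatment, extrapolating them to a full year with `monthly_to_annual` would be an arithmetic exercise rather than an estimate.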

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size.

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).
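
      For readers unfamiliar with the measure, an area-under-the-curve summary of viral burden can be sketched with the trapezoidal rule, as below. This is not the authors' analysis code: the function name, the use of log10-transformed viral loads, and the detection limit of 20 copies/ml are all assumptions made for illustration.

```python
import math


def viral_load_auc(days, copies_per_ml, detection_limit=20.0):
    """Trapezoidal area under the log10 viral load curve (units: log10 copies/ml x days).

    Values below the (assumed) detection limit are set to the limit before
    the log transform, so undetectable measurements do not produce log10(0).
    """
    if len(days) != len(copies_per_ml) or len(days) < 2:
        raise ValueError("need at least two time points with matching viral loads")
    logs = [math.log10(max(v, detection_limit)) for v in copies_per_ml]
    auc = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid: interval width times mean of the two heights.
        auc += width * (logs[i] + logs[i - 1]) / 2
    return auc


if __name__ == "__main__":
    # Hypothetical participant: viral load measured on four visits.
    print(viral_load_auc([0, 7, 14, 28], [50_000, 5_000, 200, 10]))
```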

      Reviewer #2:

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on co-morbidities and coinfections is not known.

      Reviewer #3:

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for The Authors:

      Reviewer #1:

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350). A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1-3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted, however we choose to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2:

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system as for the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list - so will change all the numbering throughout.

      This has been corrected.

      Reviewer #3:

      (1) To allow comparison to past work. I suggest changing decay using % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to get a good estimate of the half-life and therefore report decay as percentage per month for the first 6 months.
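
      For orientation only, the relationship between a percentage-per-month decay and a half-life can be written down directly if one assumes a single exponential phase. The sketch below is that arithmetic, not a fitted estimate, and the function name is hypothetical; it does not address the authors' point that too few data points are available to estimate a half-life reliably.

```python
import math


def half_life_months(monthly_fraction_lost):
    """Half-life (in months) implied by a constant fractional loss per month.

    Assumes single-exponential decay: N(t) = N0 * (1 - r) ** t, so the
    half-life is ln(2) / -ln(1 - r).
    """
    if not 0 < monthly_fraction_lost < 1:
        raise ValueError("monthly_fraction_lost must be between 0 and 1 (exclusive)")
    return math.log(2) / -math.log(1 - monthly_fraction_lost)


if __name__ == "__main__":
    # 25% and 3% per month are the chronic-treated rates quoted in the response.
    for rate in (0.25, 0.03):
        print(f"{rate:.0%} per month -> half-life of about {half_life_months(rate):.1f} months")
```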

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317: There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism limiting the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding total HIV DNA being part of the reservoir, as defective viruses do not contribute to viremia; however, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion that are associated with comorbidities and adverse clinical outcomes in people living with HIV. We have explained in the text that total HIV-DNA does not distinguish between replication-competent and -defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.
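
      The reviewer's point about under-sampling producing false singletons can be illustrated with a toy simulation. Everything in it is hypothetical: the clone sizes, the sample size, and the function name are chosen for illustration and are not taken from the FRESH data. In the toy reservoir every infected cell belongs to a clone, yet a small sample still reports mostly singletons.

```python
import random
from collections import Counter


def observed_singleton_fraction(clone_sizes, n_sampled, seed=0):
    """Fraction of sampled sequences that appear only once in the sample.

    clone_sizes: number of cells in each clone of a toy reservoir.
    n_sampled:   number of cells sequenced (sampled without replacement).
    """
    rng = random.Random(seed)
    # One entry per cell, labelled with the clone it belongs to.
    population = [clone_id for clone_id, size in enumerate(clone_sizes) for _ in range(size)]
    sample = rng.sample(population, n_sampled)
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n_sampled


if __name__ == "__main__":
    # Toy reservoir: 1,000 clones of 100 cells each, so no cell is a true singleton.
    toy_reservoir = [100] * 1000
    for n in (50, 500, 5000):
        frac = observed_singleton_fraction(toy_reservoir, n)
        print(f"{n} cells sampled -> {frac:.0%} appear as singletons")
```

      As the sample grows, the apparent singleton fraction falls towards the true value of zero, which is why shallow sampling tends to understate clonality.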

    1. eLife Assessment

      This important study shows that Toxoplasma gondii uses paracrine mechanisms, in addition to cell-intrinsic methods, to evade the host immune system, with MYR1 playing a key role in transporting effector molecules into host cells. The authors present convincing evidence that in vivo, MYR1-deficient parasites can be rescued by wild-type parasites, revealing a limitation in pooled CRISPR screens, where such paracrine effects may obscure the identification of key parasite pathways involved in immune evasion.

    2. Reviewer #1 (Public review):

      Previous studies have highlighted some of these paracrine activities of Toxoplasma - and Rasogi et al (mBio, 2020) used a single cell sequencing approach of cells infected in vitro with the WT or MYR KO parasites - and one of their conclusions was that MYR-1 dependent paracrine activities counteract ROP-dependent processes. Similarly, Chen et al (JEM 2020) highlighted that a particular rhoptry protein (ROP16) could be injected into uninfected macrophages and move them to an anti-inflammatory state that might benefit the parasite.

      There are caveats around immunity and as yet no insight into how this works. In Fig 2 there is a marked defect in the ability of the parasites to expand at day 2 and day 5. Together, these data sets suggest that this paracrine effect mediated by MYR-1 works early - well before the development of adaptive responses.

      Comments on revisions:

      The authors have provided their perspective on the original review. Some previous comments revolved around whether some of the early changes were masked by pooling data sets; the authors have reiterated that the difference is not statistically significant. It would have been nice to have seen this addressed with experiments that were appropriately powered. But it's their call.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript by Torelli et al., the authors propose that the major function of MYR1 and MYR1-dependent secreted proteins is to contribute to parasite survival in a paracrine manner rather than to protect parasites from cell-autonomous immune response. The authors conclude that these paracrine effects rescue ∆MYR1 or knockouts of MYR1-dependent effectors within pooled in vivo CRISPR screens.

      Strengths:

      The authors raised a more general concern that pooled CRISPR screens (not only in Toxoplasma but also other microbes or cancers) would miss important genes by "paracrine masking effect". Although there is no doubt that pooled CRISPR screens (especially in vivo CRISPR screens) are powerful techniques, I think this topic could be of interest to those fields and researchers.

      Weaknesses:

      In this version, the reviewer is not entirely convinced of the 'paracrine masking effect' because the in vivo experiments should include appropriate controls (see major point 2) in the first submission.

      After the revision, although no experiments were added, this reviewer considered that the points have been sufficiently discussed and commented on.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and thoughtful comments on our manuscript. 

      We realised a preliminary version of Figure 2 was initially submitted, which we are now replacing with a new version. The differences between the two figures are: 1) the schematic in Figure 2a was replaced with a new one in line with that of Figure 3a; 2) in Figure 2c, details about the statistical analysis were removed from the legend and one datapoint that was erroneously removed at day 5 for the ΔMYR1-Luc condition was included. Regardless, these changes do not affect the results and the conclusions initially drawn.

      Public Reviews:

      Reviewer #1 (Public review): 

      Previous studies have highlighted some of these paracrine activities of Toxoplasma - and Rasogi et al (mBio, 2020) used a single cell sequencing approach of cells infected in vitro with the WT or MYR KO parasites - and one of their conclusions was that MYR-1 dependent paracrine activities counteract ROP-dependent processes.

      Similarly, Chen et al (JEM 2020) highlighted that a particular rhoptry protein (ROP16) could be injected into uninfected macrophages and move them to an anti-inflammatory state that might benefit the parasite. 

      We are aware of both these studies, where the injection of rhoptry proteins into cells that the parasite does not invade alters the host transcriptional profile establishing a permissive environment. However, here we propose a different paracrine effect that goes beyond the injected/uninfected cell. Specifically, we propose that one or more MYR1-dependent effectors alter the cytokine secretion profile of infected cells, which leads to overall changes in the immune response such as cell types recruited to the site of infection, or the activation state. 

      There are caveats around immunity and as yet no insight into how this works. In Figure 2 there is a marked defect in the ability of the parasites to expand at day 2 and day 5. Together, these data sets suggest that this paracrine effect mediated by MYR-1 works early - well before the development of adaptive responses. 

      Yes, we also hypothesise an early effect based on the data. Growth continues until day 5 at least, and then plateaus towards day 7, which makes us believe that the effect takes place within the first 5 days. We agree with the reviewer that the MYR1-mediated rescue acts before the involvement of the adaptive immune response, which is supported by our results obtained in Rag2-/- mice shown in Figure 3e. 

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript by Torelli et al., the authors propose that the major function of MYR1 and MYR1-dependent secreted proteins is to contribute to parasite survival in a paracrine manner rather than to protect parasites from cell-autonomous immune response. The authors conclude that these paracrine effects rescue ∆MYR1 or knockouts of MYR1-dependent effectors within pooled in vivo CRISPR screens. 

      Strengths: 

      The authors raised a more general concern that pooled CRISPR screens (not only in Toxoplasma but also other microbes or cancers) would miss important genes by "paracrine masking effect". Although there is no doubt that pooled CRISPR screens (especially in vivo CRISPR screens) are powerful techniques, I think this topic could be of interest to those fields and researchers. 

      Weaknesses: 

      In this version, the reviewer is not entirely convinced of the 'paracrine masking effect' because the in vivo experiments should include appropriate controls (see major point 2). 

      (1) It is convincing that co-infection of WT and ∆MYR1 parasites could rescue the growth of ∆MYR1 in mice shown by in vivo luciferase imaging. Also, this is consistent with ∆MYR1 parasites showing no in vivo fitness defect in the in vivo CRISPR screens conducted by several groups. Meanwhile, it has been reported previously and shown in this manuscript that ∆MYR1 parasites have an in vitro growth defect; however, ∆MYR1 parasites show no in vitro fitness defect in the in vitro pooled CRISPR screen. The authors show that the competition defect of ∆MYR1 parasites cannot be rescued by co-infection with WT parasites in Figure 1c, which might indicate that no paracrine rescue occurred in an in vitro environment. The authors seem not to mention these discrepancies between in vitro CRISPR screens and in vitro competition assays. Why do ∆MYR1 parasites possess neutral in vitro fitness scores in in vitro CRISPR screens? Could the authors describe a reasonable hypothesis?

      The reviewer raises a very interesting point, which at this stage, we cannot fully explain. A technical explanation could be that the relatively small growth defect detected for clean KOs is not well represented in the CRISPR screens due to the variability of guides, where smaller differences in growth are not reliably captured and are hidden within the noise of the assays. Another technical explanation may be median-centering: if the majority of KOs in the pool have a small growth defect, median-centering would push these towards zero. We have observed and reported this phenomenon in Young et al., 2019 for libraries containing a larger fraction of genes with a negative fitness score. In the library used here, which focuses on secreted proteins, we have not observed a strong trend to negative fitness scores, but cannot exclude smaller shifts. Because we have no solid basis to favour any of the above-mentioned explanations, we have decided not to speculate too much on this in the manuscript. However, we wanted to show all the data as the difference between these results may not be technical, but biological, which could inform future studies or results by us and others.
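
      The median-centering argument can be made concrete with a toy example. The gene names and fitness scores below are invented for illustration and the function name is hypothetical; the sketch only shows the general effect, namely that when most knockouts in a pool share a small defect, subtracting the pool median reports that defect as zero.

```python
from statistics import median


def median_center(scores):
    """Subtract the pool median from every fitness score."""
    pool_median = median(scores.values())
    return {gene: score - pool_median for gene, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical raw log2 fold changes: three mutants share a mild defect,
    # one is truly neutral, and one has a strong defect.
    raw = {"geneA": -0.3, "geneB": -0.3, "geneC": -0.3, "geneD": 0.0, "geneE": -2.0}
    for gene, score in median_center(raw).items():
        print(f"{gene}: raw {raw[gene]:+.1f} -> centered {score:+.1f}")
```

      After centering, the mildly defective mutants sit at zero and the truly neutral mutant appears to have a small advantage, while the strongly defective mutant remains clearly negative.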

      (2) The authors developed a mixed infection assay with an inoculum containing a 20:80 ratio of ΔMYR1-Luc parasites with either WT parasites or ΔMYR1 mutants not expressing luciferase, showing that the in vivo growth defect of ∆MYR1 parasites is rescued by the presence of WT parasites. Since this experiment lacks appropriate controls, interpretation could be difficult. Is this phenomenon specific to MYR1? If a co-inoculum of ∆GRA12-Luc with either WT parasites or GRA12 parasites not expressing luciferase is included, the data could be appropriately interpreted. 

      We are not quite sure what appropriate controls the reviewer refers to. We show here in Figures 3c and 3f that increasing parasite load by co-infecting mice with ∆MYR1 parasites is not sufficient to rescue ∆MYR1-Luc parasite growth. Co-infection with WT parasites, however, does result in increased ∆MYR1-Luc parasitaemia at day 7 p.i., indicating that MYR1 competence is required for the in vivo trans-rescue we describe. As ∆GRA12 parasites have a very strong cell-autonomous restriction in vitro and severe growth defect in vivo (Torelli et al., BioRxiv), these parasites would be rapidly depleted, which is also observed in all CRISPR screens from various laboratories. Therefore we do not think that co-infection with GRA12-deficient parasites would be an informative experiment here. We do speculate that mutant parasites for other proteins required for export (i.e. MYR 2, 3, 4, ROP17) could also be trans-rescued in addition to mutants for other MYR-dependent proteins such as GRA24 and GRA28, which remodel cytokine secretion and could individually, or synergistically, affect host cell immunity. Dissecting which Toxoplasma factor/s and host cytokine signalling pathways drive this trans-rescue effect is highly interesting, but beyond the scope of this manuscript. Here, we focused on the basic concept that an individual mutant can be rescued in trans in vivo, which we think is of importance beyond the field of Toxoplasma research. 

      (3) In the Discussion part, the authors argue that the rescue phenotype of mixed infection is not due to co-infection of host cells (lines 307-310). This data is important to support the authors' paracrine hypothesis and should be shown in the main figure.

      We understand the reviewer’s concern for rescue by co-infection of the same cell, but we largely exclude this hypothesis as Toxoplasma cell-autonomous effectors, such as GRA12 and ROP18, would also be rescued if that were to happen on a larger scale. We previously performed an in vivo experiment to assess co-infection rates of peritoneal exudate cells (PECs) by imaging using infection doses comparable to those used in the trans-rescue experiments. The total infection rate of PECs was 2.3%, so the overall number of infected cells per image was low, and not suitable for publication purposes. We tried to capture more cells using FACS analysis; however, PECs are highly autofluorescent in the yellow/green channels, which prevented us from drawing adequate conclusions using our GFP and mCherry strains. Because we see no rescue of GRA12 or ROP18 in CRISPR screens, and the overall in vivo co-infection rates were very low as observed by imaging, we did not think that generating strains expressing different fluorochromes compatible with standard FACS analysis, and then performing more in vivo experiments, was the best use of resources at the time.

      (4) In the Discussion part, the authors assume that the rescue phenotype is the result of multiple MYR1-dependent effectors. I admit that this hypothesis could be possible since a recently published paper described the concerted action of numerous MYR1-dependent or independent effectors contributing to the hypermigration of infected cells (Ten Hoeve et al., mBio, 2024). I think this paragraph would be kind of overstated since the authors did not test any of the candidate effectors. Since the authors possess ∆IST parasites, they can test whether IST is involved in the "paracrine masking effect" or not to support their claim. 

      MYR1 deletion impairs the export of multiple Toxoplasma effectors into the host cell, including GRA16, GRA24, GRA28, HCE1/TEEGR etc., many of which can influence cytokine levels. As such, we speculate that it is a combination of multiple effector proteins that is responsible for the trans-rescue. As stated above, which parasite effectors, host cell types and cytokines are involved in the phenotype we describe is part of ongoing and future studies. Here, we wanted to focus on the key message that, in in vivo CRISPR screens, paracrine rescue of individual mutants can occur. While we will test IST mutants, it is probably not the top candidate as it only prevents upregulation of ISGs after exposure to IFN-γ, but probably has no role in already stimulated cells. As we still observe strong rescue past day 3, when IFN-γ levels are already elevated (Nishiyama 2020 Parasitol Int), IST probably plays no dominant role.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 1 - it's not obvious what concentration of IFN-gamma is being used in these assays (sorry if this is stated somewhere else). 

      All in vitro experiments were performed with 100 U/ml IFN-γ, as stated in the Material & Methods section; however, we have added this information to the legend of Figure 1.

      (2) Figure 3 This reviewer wonders if earlier differences are buried in the data sets. In Figure 3b it looks like there are early differences but this is lost in the collated data analysis in 3c. An early difference is quite apparent in Figure 2. 

      We agree with the reviewer that a difference is visible at day 3 and 5 in Figure 3b, however differences between experimental groups became statistically significant only at day 7 in Figure 3c (N = 4 biological replicates). We cannot compare results between Figure 3c and Figure 2c as the latter reports 100% WT or ΔMYR1 infections and not 20:80 mixes.

      (3) The authors conclude from their in vitro studies that MYR-1 is not required for in vitro growth in IFN-g activated macrophages. Given that the WT parasites still rescue MYR KO parasites in RAG mice it does imply that this paracrine effect would impact early innate responses. Since RAG mice do have a strong ILC/NK cell response that leads to the local production of IFN-g it would seem like a reasonable candidate. Do the authors know if the MYR KO have improved growth in the absence of IFN-g in vivo? This could be done using KO mice or with IFN-g neutralization. 

      MYR1 displayed a neutral score in CRISPR screens in IFN-γ KO mice (Tachibana et al Cell Reports 2023), suggesting that lack of IFN-γ does not specifically improve MYR1 mutant growth compared to other mutants in a pool. We believe that the rescue is rather driven by other cytokines that have been shown to be altered in a MYR1-dependent manner (i.e., CCL2, IL-6, IL-12). But as laid out before, this is the subject of future studies.

      This is a submission that might benefit from a graphical model of how the authors view this system working. 

      We agree with the reviewer and we added a graphical model to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      The authors previously published a study that combines CRISPR screens in Toxoplasma and host transcriptome by scRNA-seq (Butterworth et al., Cell Host Microbe 2023). I think the authors possess transcriptome of ∆MYR1-infected HFFs. Although I understand this screen is conducted in in-vitro culture and human fibroblasts, are there any differentially expressed genes or pathways that could explain the paracrine rescue phenomenon described in this manuscript?

      We thank the reviewer for this insightful comment, which is, however, hard to address. Thousands of host cell genes within multiple pathways are affected by MYR1 deletion (Naor et al. mBio 2018; Butterworth et al. Cell Host Microbe 2023). Therefore, the PerturbSeq dataset does not help pinpoint specific immune mechanisms of rescue, and any inference drawn from it would be speculative without experiments to back it up. However, we added a sentence in line 350 of the discussion to highlight known MYR1-related effects on immune-related pathways. “Individual MYR-related effectors that may be responsible for the paracrine rescue have not been investigated here and we hypothesise that the phenotype is likely the concerted result of multiple effectors that affect cytokine secretion. For example, previous studies showed that both GRA18 and GRA28 can induce release of CCL22 from infected cells (He 2018 eLife; Rudzki 2021 mBio), while GRA16 and HCE1/TEEGR impair NF-kB signalling and the potential release of pro-inflammatory cytokines such as IL-6, IL-1β and TNF (Seo 2020 Int J Mol Sci; Braun 2019 Nat Microbiol). Regardless of the effector(s), our results highlight an important novel function of MYR1-dependent effectors by establishing a supportive environment in trans for Toxoplasma growth within the peritoneum.”

    1. eLife Assessment

      This study presents a valuable finding on a potential signaling pathway responsible for the direct effects of nicotine on intestinal stem cell growth and tumorigenesis. The evidence supporting the claims of the authors is solid. This research will be of interest to medical biologists specializing in intestinal tumors.

    2. Reviewer #1 (Public review):

      In their manuscript, Isotani et al used in vivo and ex vivo models to show that nicotine could promote stemness and tumorigenicity in a murine model. The authors further provided data supporting that the effects of nicotine on stem cell proliferation and tumor initiation were mediated by the Hippo-YAP/TAZ and Notch signaling pathways.

      The major strength of this study is the use of a set of tools, including Lgr5 reporter mice (Lgr5-EGFP-IRES-CreERT2 mice), stem cell-specific Apc knockout mice (Lgr5CreER Apcfl/fl mice), organoids derived from these mice and chemical compounds (agonists and antagonists), to demonstrate that nicotine affects stem cells rather than Paneth cells, leading to increased intestinal stemness and tumorigenicity. However, all models are restricted to mice, and the study lacks analysis of human samples or human intestinal organoids to prove the human relevance of these findings.

      Overall, the presented results support their conclusions. A previous study reported that nicotine acts through the α2β4 nAChR to enhance Wnt production by Paneth cells, which subsequently affects ISCs. In contrast, this manuscript demonstrated that nicotine directly promotes ISCs through α7-nAChR, independent of Paneth cells. Therefore, this manuscript offers novel insights into the mechanism of nicotine's effects on the mouse intestine.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Isotani et al characterizes the hyperproliferation of intestinal stem cells (ISCs) induced by nicotine treatment in vivo. Employing a range of small molecule inhibitors, the authors systematically investigated potential receptors and downstream pathways associated with nicotine-induced phenotypes through in vitro organoid experiments. Notably, the study specifically highlights a signaling cascade involving α7-nAChR/PKC/YAP/TAZ/Notch as a key driver of nicotine-induced stem cell hyperproliferation. Utilizing a Lgr5CreER Apcfl/fl mouse model, the authors extend their findings to propose a potential role of nicotine in stem cell tumorigenesis. The study posits that Notch signaling is essential during this process.

      Strengths and Weaknesses:

      One noteworthy research highlight in this study is the indication, as shown in Figure 2 and S2, that the trophic effect of nicotine on ISC expansion is independent of Paneth cells. In the Discussion section, the authors propose that this independence may be attributed to distinct expression patterns of nAChRs in different cell types. To further substantiate these findings, the authors provided qPCR analysis of nAChRs in ISCs and Paneth cells from isolated whole small intestine, indicating that α7-nAChR uniquely responds to nicotine treatment among various nAChRs. The authors further strengthen the clinical relevance of the study by exploring a human scRNA-seq dataset, in which α7-nAChR is indeed also expressed in human ISCs and Paneth cells.

      As shown in the same result section, the effect of nicotine on ISC organoid formation appears to be independent of CHIR99021, a Wnt activator. In the Lgr5CreER Apcfl/fl mouse model, it is known that APC loss results in a constitutive stabilization of β-catenin, thus the hyperproliferation of ISCs by nicotine treatment in this mouse model is likely beyond Wnt activation. The authors have included such discussion.

      In Figure 4, the authors investigate ISC organoid formation with a pan-PKC inhibitor, revealing that PKC inhibition blocks nicotine-induced ISC expansion. It's noteworthy that PKC inhibitors have historically been used successfully to isolate and maintain stem cells by promoting self-renewal. Therefore, it is surprising to observe no effect, or even a reversed effect, on ISCs in this context. The authors have now included an additional PKC inhibitor, Sotrastaurin, to confirm the role of PKC in nicotine-induced ISC expansion.

      Overall, the manuscript has provided sufficient experimental evidence to address my concerns and also significantly enhanced its quality.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths and weaknesses:

      Although the revised manuscript has significantly improved in the quality of pictures, there still seems to be a discrepancy in Figure 2A: the quantification suggests that NIC (1 μM) treatment increased the number of colonies from 300 to around 450 (1.5-fold), whereas the representative picture shows a difference of 3 versus 12 living organoids (4-fold).

      As the reviewer points out, the selected picture was not a representative image of the “control” group in Figure 2A. We have replaced it with a new representative image in this revised version.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      A minor point to be corrected:

      Please consider removing "In consistent with this notion", which is repetitive with "Similarly".

      " NIC is supposed to activate Wnt signaling via Hippo-YAP/TAZ and Notch signaling. In consistent with this notion. Similarly, the expression of target proteins (Sox9, TCF4 and, C-myc)..."

      We corrected it according to the reviewer’s suggestion.

    1. eLife Assessment

      This valuable study highlights how the diversity of the malaria parasite population diminishes following the initiation of effective control interventions but quickly rebounds as control wanes. The data presented is convincing and the work shows how genetic studies could be used to monitor changes in disease transmission.

    2. Reviewer #2 (Public review):

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. This adds to a growing literature that demonstrates the relevance of asymptomatic reservoirs.

      Strengths:

      Overall, I found these results clear, convincing, and well-presented. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods, particularly in regions with high diversity/transmission. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric.

      Weaknesses:

      While I understand the conceptual importance of distinguishing among parasite prevalence, mean MOI, and absolute parasite number, I am not fully convinced by this manuscript's implementation of "census population size". The authors reference the population genetic literature, but within the context of that field, "census population size" refers to the total population size (which, if not formally counted, can be extrapolated) as opposed to "effective population" size, which accounts for a multitude of demographic factors. There is often interesting biology to be gleaned from the magnitude of difference between N and Ne. In this manuscript, however, "census population size" is used to describe the number of distinct parasites detected within a sample, not a population. As a result, the counts do not have an immediate population genetic interpretation and cannot be directly compared to Ne. This doesn't negate their usefulness but does complicate the use of a standard population genetic term. In contrast, I think that sample parasite count will be most useful in an epidemiological context, where the total number of sampled parasites can be contrasted with other metrics to help us better understand how parasites are divided across hosts, space and time. However, for this use, I find it problematic that the metric does not appear to correct for variations in participant number. For instance, in this study, participant numbers especially varied across time for 1-5 year-olds (N=356, 216, 405, and 354 in 2012, 2014, 2015, and 2017 respectively). This sample size variability is accounted for with other metrics like mean MOI. In sum, while the manuscript opens up an interesting discussion, I'm left with an incomplete understanding of the robustness and interpretability of the new proposed metric.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions.

      Strengths:

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population.

      Weaknesses:

      None

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Tiedje et al. investigated the transient impact of indoor residual spraying (IRS) followed by seasonal malaria chemoprevention (SMC) on the Plasmodium falciparum parasite population in a high transmission setting. The parasite population was characterized by sequencing the highly variable DBLα tag as a proxy for var genes, a method known as varcoding. Varcoding presents a unique opportunity due to the extraordinary diversity observed as well as the extremely low overlap of repertoires between parasite strains. The authors also present a new Bayesian approach to estimating individual multiplicity of infection (MOI) from the measured DBLα repertoire, addressing some of the potential shortcomings of the approach that have been previously discussed. The authors also present a new epidemiological endpoint, the so-called "census population size", to evaluate the impact of interventions. This study provides a nice example of how varcoding technology can be leveraged, as well as the importance of using diverse genetic markers for characterizing populations, especially in the context of high transmission. The data are robust and clearly show the transient impact of IRS in a high transmission setting; however, some aspects of the analysis are confusing.

      (1) Approaching MOI estimation with a Bayesian framework is a well-received addition to the varcoding methodology that helps to address the uncertainty associated with not knowing the true repertoire size. It's unfortunate that while the authors clearly explored the ability to estimate the population MOI distribution, they opted to use only MAP estimates. Embracing the Bayesian methodology fully would have been interesting, as the posterior distribution of population MOI could have been better explored. 

      We thank the reviewer for appreciating the extension of varcoding we present here. We believe the comment on maximum a posteriori (MAP) refers to the way we obtained population-level MOI from the individual MOI estimates. We would like to note that reliance on MAP was only one of two approaches we described, although we then presented only MAP. Having calculated both, we did not observe major differences between the two for this data set. Nonetheless, we revised the manuscript to include the result based on the mixture distribution, which considers all the individual MOI distributions, in Figure supplement 6.
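
      To make the distinction between the two population-level summaries concrete, the following is a minimal sketch, not the authors' implementation. It assumes each host's result is available as a discrete posterior over MOI values, and it reads "mixture distribution" as the equal-weight average of those individual posteriors. The function names and the three example posteriors are hypothetical.

```python
from collections import Counter


def pooled_map_distribution(posteriors):
    """Population-level MOI distribution built from each host's MAP estimate."""
    # MAP estimate per host: the MOI value with the highest posterior probability.
    maps = [max(p, key=p.get) for p in posteriors]
    counts = Counter(maps)
    n = len(posteriors)
    return {moi: counts[moi] / n for moi in sorted(counts)}


def mixture_distribution(posteriors):
    """Equal-weight mixture of the individual posterior distributions (assumed reading)."""
    n = len(posteriors)
    mix = {}
    for p in posteriors:
        for moi, prob in p.items():
            mix[moi] = mix.get(moi, 0.0) + prob / n
    return dict(sorted(mix.items()))


# Hypothetical posteriors P(MOI = k | data) for three hosts (illustrative only).
posteriors = [
    {1: 0.7, 2: 0.3},
    {1: 0.2, 2: 0.5, 3: 0.3},
    {2: 0.4, 3: 0.6},
]

pooled = pooled_map_distribution(posteriors)
mixture = mixture_distribution(posteriors)
```

      The pooled MAP summary discards each host's uncertainty, whereas the mixture carries it through to the population level; when individual posteriors are sharply peaked, the two summaries should be close.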

      (2) The "census population size" endpoint has unclear utility. It is defined as the sum of MOI across measured samples, making it sensitive to the total number of samples collected and genotyped. This means that the values are not comparable outside of this study, and are only roughly comparable between strata in the context of prevalence where we understand that approximately the same number of samples were collected. In contrast, mean MOI would be insensitive to differences in sample size, why was this not explored? It's also unclear in what way this is a "census". While the sample size is certainly large, it is nowhere near a complete enumeration of the parasite population in question, as evidenced by the extremely low level of pairwise type sharing in the observed data. 

      We consider the quantity a census in that it is a total enumeration or count of infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires. But our focus here is on a measure of population size itself. The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see for example Bedford et al. for the seasonal influenza virus and for the measles virus (Bedford et al., 2011)), and it is also clear in the ecological literature for non-pathogen populations (Palstra and Fraser, 2012).

      We completely agree that our quantity depends on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission, which persists across the IRS intervention. Of course, one would like to be able to use this quantity across studies that differ in sampling depth, and the reviewer makes an insightful and useful suggestion. It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (we just need to divide or multiply by sample size, respectively) (Table supplement 7). We can go further, as with mean MOI we can presumably extrapolate to the full sample size of the host population, or to the population size of another sample in another location. What is needed for this purpose is a mean MOI that is stable relative to sample size. We can show that mean MOI is indeed stable in this way in our study, by subsampling our original sample to different depths (Figure supplement 8 in the revised manuscript). We now include a discussion of this point in the revision, which allows an extrapolation of the census population size to the whole population of hosts in the local area.
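
      The relationship between census population size, mean MOI, and sampling depth described here can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the MOI values are hypothetical, the function names are invented for the example, and the extrapolation assumes that mean MOI is stable with sampling depth and that the number of infected hosts in the target population is known.

```python
import random


def census_population_size(moi_values):
    """Sum of MOI across the sampled, infected hosts."""
    return sum(moi_values)


def mean_moi(moi_values):
    """Census population size divided by the number of sampled isolates."""
    return sum(moi_values) / len(moi_values)


def subsample_mean_moi(moi_values, depth, n_reps=200, seed=0):
    """Mean MOI in random subsamples of a given depth (without replacement)."""
    rng = random.Random(seed)
    return [mean_moi(rng.sample(moi_values, depth)) for _ in range(n_reps)]


def extrapolate_census(moi_values, n_infected_hosts):
    """Scale mean MOI to a hypothetical number of infected hosts."""
    return mean_moi(moi_values) * n_infected_hosts


# Hypothetical per-isolate MOI estimates (illustrative only).
moi = [1, 1, 2, 3, 1, 4, 2, 1, 2, 3]

census = census_population_size(moi)       # sum of MOI in the sample
average = mean_moi(moi)                    # census / sample size
subsampled = subsample_mean_moi(moi, depth=5)
scaled = extrapolate_census(moi, n_infected_hosts=500)
```

      If the subsampled means stay close to the full-sample mean across depths, the per-host quantity can reasonably be carried over to other sample sizes; if they drift, the extrapolation should not be trusted.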

      We have also clarified the time denominator: Given the typical duration of infection, we expect our population size to be representative of a per-generation measure.

      (3) The extraordinary diversity of DBLα presents challenges to analyzing the data. The authors explore the variability in repertoire richness and frequency over the course of the study, noting that richness rapidly declined following IRS and later rebounded, while the frequency of rare types increased, and then later declined back to baseline levels. The authors attribute this to fundamental changes in population structure. While there may have been some changes to the population, the observed differences in richness as well as frequency before and after IRS may also be compatible with simply sampling fewer cases, and thus fewer DBLα sequences. The shift back to frequency and richness that is similar to pre-IRS also coincides with a similar total number of samples collected. The authors explore this to some degree with their survival analysis, demonstrating that a substantial number of rare sequences did not persist between timepoints and that rarer sequences had a higher probability of dropping out. This might also be explained by the extreme stochasticity of the highly diverse DBLα, especially for rare sequences that are observed only once, rather than any fundamental shifts in the population structure.

      We thank the reviewer for raising this question, which led us to consider whether the change in the number of DBLα types over the course of the study (and intervention) follows simply from sampling fewer P. falciparum cases. We interpreted this question as basically meaning that one can predict the former from the latter in a simple way, and that, therefore, tracking the changes in DBLα type diversity would be unnecessary. A simple map would be, for example, a linear relationship (a given proportion of DBLα types lost given genomes lost), and even more trivially, a linear loss with a slope of one (same proportion). Note, however, that for such expectations, one needs to rely on some knowledge of strain structure and gene composition. In particular, we would need to assume a complete lack of overlap and no gene repeats in a given genome. We have previously shown that immune selection leads to selection for minimum overlap and distinct genes in repertoires at high transmission (see, for example, He et al., 2018, for theoretical and empirical evidence of both patterns). Also, since the size of the gene pool is very large, even random repertoires would lead to limited overlap (even though the empirical overlap is even smaller than that expected at random (Day et al., 2017)). Despite these considerations, we cannot a priori assume a pattern of complete non-overlap and distinct genes, and ignore plausible complexities introduced by the gene frequency distribution.

      To examine this insightful question, we simulated the loss of a given proportion of genomes from baseline in 2012 and examined the resulting loss of DBLα types. Specifically, we accumulated the loss of infections in individuals until it reached a given proportion (we can do this on the basis of the estimated individual MOI values). We repeated this procedure 500 times for each proportion, as the random selection of the individual infections to be removed introduces some variation. Figure 2 below shows that the relationship is nonlinear, and that one quantity is not a simple proportion of the other. For example, the loss of half the genomes does not result in the loss of half the DBLα types.
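
      A simulation of this kind can be sketched as below. This is one simplified reading of the procedure, not the authors' code: it assumes each isolate is represented by its estimated MOI and its set of DBLα types, removes whole isolates in random order until the cumulative MOI removed reaches the target proportion of genomes, and then counts the types no longer observed. The function name and the toy isolates are hypothetical.

```python
import random


def simulate_type_loss(isolates, proportion, n_reps=500, seed=0):
    """Fraction of DBLα types lost per replicate after removing a proportion of genomes.

    isolates: list of (moi, set_of_types) tuples, one per isolate.
    """
    rng = random.Random(seed)
    total_genomes = sum(moi for moi, _ in isolates)
    all_types = set().union(*(types for _, types in isolates))
    target = proportion * total_genomes
    losses = []
    for _ in range(n_reps):
        order = isolates[:]
        rng.shuffle(order)
        removed = 0
        idx = 0
        # Remove isolates in random order until enough genomes are gone.
        while idx < len(order) and removed < target:
            removed += order[idx][0]
            idx += 1
        remaining = set()
        for _, types in order[idx:]:
            remaining |= types
        losses.append(1 - len(remaining) / len(all_types))
    return losses


# Toy data (hypothetical): 5 common types shared by all isolates, 3 rare types each.
isolates = []
for i in range(40):
    common = {f"common{j}" for j in range(5)}
    rare = {f"rare{i}_{j}" for j in range(3)}
    isolates.append((1 + i % 3, common | rare))

loss_at_half = simulate_type_loss(isolates, proportion=0.5)
```

      In this toy setting the common types persist as long as any isolate remains, so the types lost are initially all rare ones, which is the qualitative behavior described in the response.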

      Author response image 1.

      Non-linear relationship between the loss of DBLα types and the loss of a given proportion of genomes. The graph shows that the removal of parasite genomes from the population through intervention does not lead to the loss of the same proportion of DBLα types, as the initial removal of genomes mostly involves the loss of rare DBLα types, whereas common DBLα types persist until a high proportion of genomes are lost. The survey data (pink dots) used for this subsampling analysis were sampled at the end of the wet/high transmission season in Oct 2012 in Bongo District in northern Ghana. We used the Bayesian formulation of the varcoding method proposed in this work to calculate the multiplicity of infection of each isolate and then obtain the total number of genomes. The randomized surveys (black dots) were obtained with the “curveball algorithm” (Strona et al., 2014), which preserves isolate lengths and the type frequency distribution.

      We also investigated whether the resulting pattern changed significantly if we randomized the composition of the isolates.  We performed such randomization with the “curveball algorithm” (Strona et al., 2014). This algorithm randomizes the presence-absence matrix with rows corresponding to the isolates and columns, to the different DBLα types; importantly, it preserves the DBLα type frequency and the length of isolates. We generated 500 randomizations and repeated the simulated loss of genomes as above. The data presented in Figure 2 above show that the pattern is similar to that obtained for the empirical data presented in this study in Ghana. We interpret this to mean that the number of genes is so large, that the reduced overlap relative to random due to immune selection (see (Day et al., 2017)) does not play a key role in this specific pattern. 
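
      For readers unfamiliar with the curveball algorithm, a minimal sketch of its pair-trading step is given below. It follows the general description in Strona et al. (2014) rather than any code used in the study: each isolate is stored as a set of DBLα types, two isolates are picked at random, and the types unique to either one are reshuffled between them so that each isolate keeps its original length and each type keeps its original frequency. The function name, the number of trades, and the toy isolates are illustrative assumptions.

```python
import random
from collections import Counter


def curveball(rows, n_trades, seed=0):
    """Randomize a presence-absence matrix, preserving row sizes and column totals.

    rows: list of sets, one per isolate, holding the types present in it.
    """
    rng = random.Random(seed)
    rows = [set(r) for r in rows]
    for _ in range(n_trades):
        a, b = rng.sample(range(len(rows)), 2)
        shared = rows[a] & rows[b]
        only_a = rows[a] - shared
        only_b = rows[b] - shared
        # Pool the types unique to either row and deal them back out at random.
        pool = sorted(only_a | only_b)
        rng.shuffle(pool)
        k = len(only_a)
        rows[a] = shared | set(pool[:k])
        rows[b] = shared | set(pool[k:])
    return rows


# Toy isolates (hypothetical type labels).
original = [
    {"t1", "t2", "t3"},
    {"t2", "t4"},
    {"t1", "t5", "t6"},
    {"t3", "t4", "t6"},
]
randomized = curveball(original, n_trades=1000)

# Both marginals are unchanged by construction.
lengths_kept = [len(r) for r in randomized] == [len(r) for r in original]
freqs_kept = Counter(t for r in randomized for t in r) == Counter(
    t for r in original for t in r
)
```

      Because shared types never move and pooled types always end up in exactly one of the two rows, the isolate lengths and the type frequencies are invariant, which is the property the response relies on.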

      Reviewer #2 (Public Review):  

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. Overall, I found these results clear, convincing, and well-presented. They add to a growing literature that demonstrates the relevance of asymptomatic reservoirs. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric. However, I am not fully convinced the current implementation will be applied meaningfully across additional studies.

      (1) I find the term "census population size" problematic as the groups being analyzed (hosts grouped by age at a single time point) do not delineate distinct parasite populations. Separate parasite lineages are not moving through time within these host bins. Rather, there is a single parasite population that is stochastically divided across hosts at each time point. I find this distinction important for interpreting the results and remaining mindful that the 2,000 samples at each time point comprise a subsample of the true population. Instead of "census population size", I suggest simplifying it to "census count" or "parasite lineage count". It would be fascinating to use the obtained results to model absolute parasite numbers at the whole population level (taking into account, for instance, the age structure of the population), and I do hope this group takes that on at some point even if it remains outside the scope of this paper. Such work could enable calculations of absolute, rather than relative, fitness and help us further understand parasite distributions across hosts.

      Lineages moving exclusively through a given type of host or “patch” are not a necessary requirement for enumerating the total number of infections in such a subset. It is true that what we have is a single parasite population, but we are enumerating, for the season, its respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings where one has multiple habitat patches, with individuals able to move across patches.

      Remaining mindful that the count is relative to sample size is an important point. Please see our response to comment (2) of reviewer 1, also for the choice of terminology. We prefer not to adopt “census count” as a census in our mind is a count, and we are not clear on the concept of lineage for these highly recombinant parasites.  Also, census population size has been adopted already in the literature for both pathogens and non-pathogens, to make a distinction with the notion of effective population size in population genetics (see our response to reviewer 1) and is consistent with our usage as outlined in the introduction. 

      Thank you for the comment on an absolute number which would extrapolate to the whole host population.  Please see again our response to comment (2) of reviewer 1, on how we can use mean MOI for this purpose once the sampling is sufficient for this quantity to become constant/stable with sampling effort.

      (2) I'm uncertain how to contextualize the diversity results without taking into account the total number of samples analyzed in each group. Because of this, I would like a further explanation as to why the authors consider absolute parasite count more relevant than the combined MOI distribution itself (which would have sample count as a denominator). It seems to me that the "per host" component is needed to compare across age groups and time points, let alone different studies.

      Again, thank you for the insightful comment. We provide this number as a separate quantity and not a distribution, although it is clearly related to the mean MOI of such distribution. It gives a tangible sense for the actual infection count (different from prevalence) from the perspective of the parasite population in the ecological sense. The “per host” notion which enables an extrapolation to any host population size for the purpose of a complete count, or for comparison with another study site, has been discussed in the above responses for reviewer 1 and now in the revision of the discussion.

      (3) Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLα repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator? 

      This is a very good point and one we now discuss further in our revision. There is no predefined upper bound one can present a priori. Intuitively, the approach to estimate MOI would appear to break down as overlap moves away from extremely low values, and therefore for locations with low transmission intensity. Interestingly, we have observed that this is not the case in our paper by Labbé et al. (Labbé et al., 2023), where we used model simulations in a gradient of three transmission intensities, from high to low values. The original varcoding method performed well across the gradient. This robustness may arise from a nonlinear and fast transition from low to high overlap that is accompanied by MOI changing rapidly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This matter clearly needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of overlap.

      Smaller comments:

      - Figure 1 provides confidence intervals for the prevalence estimates, but these aren't carried through on the other plots (and Figure 5 has lost CIs for both metrics). The relationship between prevalence and diversity is one of the interesting points in this paper, and it would be helpful to have CIs for both metrics when they are directly compared. 

      Based on the reviewer’s advice we have revised both Figure 4 and Figure 5, to include the missing uncertainty intervals. The specific approach for each quantity is described in the corresponding caption.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions. 

      Strengths: 

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age-stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population. 

      Census population size is complementary to parasite prevalence: the former gives a measure of the “parasite population size”, and the latter describes the “proportion of infected hosts”. The reason we see similar trends for the “genetic information” (i.e., census population size) and “age-specific parasite prevalence” is that we identify all samples for varcoding based on microscopy (i.e., all microscopy-positive P. falciparum isolates). But what is more relevant here is the relative percentage change in parasite prevalence and census population size following the IRS intervention. To make this point clearer in the revised manuscript, we have updated Figure 4 and included additional panels plotting this percentage change from the 2012 baseline, for both census population size and prevalence (Figure 4EF). Overall, we see a greater percentage change in 2014 (and 2015), relative to the 2012 baseline, for census parasite population size vs. parasite prevalence (Figure 4EF) as a consequence of the significant changes in the distributions of MOI following the IRS intervention (Figure 3). As discussed in the Results, following the deployment of IRS in 2014, census population size decreased by 72.5% relative to the 2012 baseline survey (pre-IRS), whereas parasite prevalence decreased by only 54.5%.

      With respect to the reviewer’s comment on “practicalities and cost”, varcoding has been used to successfully amplify P. falciparum DNA collected as DBS that have been stored for more than 5 years, from both clinical and lower-density asymptomatic infections, without the additional step and added cost of sWGA ($8 to $32 USD per isolate; for costing estimates see (LaVerriere et al., 2022; Tessema et al., 2020)), which is currently required by other molecular surveillance methods (Jacob et al., 2021; LaVerriere et al., 2022; Oyola et al., 2016). Varcoding involves a single PCR per isolate using degenerate primers, where a large number of isolates can be multiplexed into a single pool for amplicon sequencing. Thus, the overall costs for incorporating molecular surveillance with varcoding are mainly driven by the number of PCRs/clean-ups, the number of samples indexed per sequencing run, and the NGS technology used (discussed in more detail in our publication Ghansah et al. (Ghansah et al., 2023)). Previous work has shown that varcoding can be used both locally and globally for molecular surveillance, without the need to be customized or updated; thus, it can be fairly easily deployed in malaria-endemic regions (Chen et al., 2011; Day et al., 2017; Rougeron et al., 2017; Ruybal-Pesántez et al., 2022, 2021; Tonkin-Hill et al., 2021).

      Weaknesses: 

      Overall the manuscript is well-written and generally comprehensively explained. Some terms could be clarified to help the reader and I had some issues with a section of the methods and some of the more definitive statements given the evidence supporting them. 

      Thank you for the overall positive assessment. We are unable to address the “issues with a section of the methods” and “some of the more definitive statements given the evidence supporting them” without an explicit indication of which methods and statements the reviewer is referring to. We hope that our answers to the detailed comments and questions of reviewers 1 and 2 address any methodological concerns (i.e., in the Materials and Methods and Results). On the issue of “definitive statements”, we are unable to respond without further information.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 273: there is a reference to a figure which supports the empirical distribution of repertoire given MOI = 1, but the figure does not appear to exist.

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al., 2023). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this to our attention.

      Line 299: while this likely makes little difference, an insignificant result from a Kolmogorov-Smirnov test doesn't tell you if the distributions are the same, it only means there is not enough evidence to determine they are different (i.e. fail to reject the null). Also, what does the "mean MOI difference" column in supplementary table 3 mean? 

      The mean MOI difference is the difference in the mean value between the pairwise comparison of the true population-level MOI distribution, that of the population-level MOI estimates from either pooling the maximum a posteriori (MAP) estimates per individual host or the mixture distribution, or that of the population-level MOI estimates from different prior choices. This is now clarified as requested in the Table supplements 3 - 6. 

      Figure 4: how are the confidence intervals for the estimated number of var repertoires calculated? Also should include horizontal error bars for prevalence measures.

      The confidence intervals were calculated based on a bootstrap approach. We re-sampled 10,000 replicates from the original population-level MOI distribution with replacement. Each resampled replicate is the same size as the original sample. We then derive the 95% CI based on the distribution of the mean MOI of those resampled replicates. This is now clarified as requested in the Figure 4 caption (as well as Table supplement 7 footnotes). In addition, we have also updated Figure 4AB and have included the 95% CI for all measures for clarity. 
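
The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a percentile interval (the response does not state how the interval is read off the bootstrap distribution), and the function name, seed, and example MOI values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_replicates=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each replicate resamples `values` with replacement and has the same
    size as the original sample. The percentile method is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_replicates)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_replicates)]
    upper = means[int((1.0 - alpha) * n_replicates) - 1]
    return lower, upper


# Hypothetical per-host MOI estimates (illustrative values only).
moi_estimates = [1, 2, 2, 3, 1, 4, 5, 2, 3, 1]
ci_lower, ci_upper = bootstrap_mean_ci(moi_estimates)
```

A fixed seed is used only to make the sketch reproducible; with 10,000 replicates the interval end points are stable to roughly the second decimal place for a sample of this size.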

      Reviewer #2 (Recommendations For The Authors): 

      -  I would like to see a plot like Supplemental Figure 8 for the upsA DBLα repertoire size. 

      The upsA repertoire size for each survey and by age group has now been provided as requested in Figure supplement 5AB. 

      -  Supplemental Table 2 is cut off in the pdf. 

      We have now resolved this issue so that the Table supplement 2 is no longer cut off.  

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript terms the phrase "census population size". To me, the census is all about the number of individuals, not necessarily their diversity. I appreciate that there is no simple term for this, and I imagine the authors have considered many alternatives, but could it be clearer to say the "genetic census population size"? For example, I found the short title not particularly descriptive "Impact of IRS and SMC on census population size", which certainly didn't make me think of parasite diversity.

      Please see our response to comment (2) of reviewer 1. We prefer not to add “genetic” to the phrase as the distinction from effective population size from population genetics is important, and the quantity we are after is an ecological one. 

      The authors do not currently say much about the potential biases in the genetic data and how this might influence results. It seems likely that because (i) patients with sub-microscopic parasitaemia were not sampled and (ii) because a moderate number of (likely low density) samples failed to generate genetic data, that the observed MOI is an overestimate. I'd be interested to hear the authors' thoughts about how this could be overcome or taken into account in the future. 

      We thank the reviewer for this comment and agree that this is an interesting area for further consideration. However, based on research from the Day Lab that is currently under review (Tan et al. 2024, under review), the estimated MOI using the Bayesian approach is likely not an “overestimate” but rather an “underestimate”. In this research by Tan et al. (2024), isolate MOI was estimated and compared using different initial whole blood volumes (e.g., 1, 10, 50, 100 μL) for the gDNA extraction. Using varcoding and comparing these different volumes, it was found that MOI was significantly “underestimated” when small blood volumes were used for the gDNA extraction, i.e., there was a ~3-fold increase in median MOI between 1 μL and 100 μL of blood. Ultimately, these findings will allow us to make computational corrections so that more accurate estimates of MOI can be obtained from the DBS in the future.

      The authors do not make much of LLIN use and for me, this can explain some of the trends. The first survey was conducted soon after a mass distribution whereas the last was done at least a year after (when fewer people would have been using the nets which are older and less effective). We have also seen a rise in pyrethroid resistance in the mosquito populations of the area which could further diminish the LLIN activity. This difference in LLIN efficacy between the first and last survey could explain similar prevalence, yet lower diversity (in Figures 4B/5). However, it also might mean that statements such as Line 478 "This is indicative of a loss of immunity during IRS which may relate to the observed loss of var richness, especially the many rare types" need to be tapered as the higher prevalence observed in this age group could be caused by lower LLIN efficacy at the time of the last survey, not loss of immunity (though both could be true).  

      We thank the reviewer for this question and agree that (i) LLIN usage and (ii) pyrethroid resistance are important factors to consider. 

      (i) Over the course of this study, self-reported LLIN usage the previous night remained high across all age groups in each of the surveys (≥ 83.5%); in fact, more participants reported sleeping under an LLIN in 2017 (96.8%), following the discontinuation of IRS, than in the 2012 baseline survey (89.1%). This increase in LLIN usage in 2017 is likely a result of several factors, including a rebound in the local vector population making LLINs necessary again and increased community education and/or awareness of the importance of using LLINs, among others. Information on the LLINs distributed (i.e., PermaNet 2.0, Olyset, or DawaPlus 2.0) and participant-reported usage the previous night has now been included in the Materials and Methods as requested by the reviewer.

      (ii) As to the reviewer’s question on increased pyrethroid resistance in Ghana over the study period, research undertaken by our entomology collaborators (Noguchi Memorial Institute for Medical Research: Profs. S. Dadzie and M. Appawu; and Navrongo Health Research Centre: Dr. V. Asoala) has shown that pyrethroid resistance is a major problem across the country, including the Upper East Region. Preliminary studies from Bongo District (2013 - 2015) were undertaken to monitor for mutations in the voltage-gated sodium channel gene that have been associated with knockdown resistance to pyrethroids and DDT in West Africa (kdr-w). Through this analysis, the homozygote resistance kdr-w allele (RR) was found in 90% of An. gambiae s.s. samples tested from Bongo, providing evidence of high pyrethroid resistance in Bongo District dating back to 2013, i.e., prior to the IRS intervention (S. Dadzie, M. Appawu, personal communication). Although we do not have data in Bongo District on kdr-w from 2017 (i.e., post-IRS), we can hypothesize that pyrethroid resistance likely did not decline in the area, given the widespread deployment and use of LLINs.

      Thus, given that (i) self-reported LLIN usage remained high in all surveys (≥ 83.5%) and (ii) there was evidence of high pyrethroid resistance in 2013 (i.e., kdr-w (RR) ~90%), the rebound in prevalence observed for the older age groups (i.e., adolescents and adults) in 2017 is best explained by a loss of immunity.

      I must confess I got a little lost with some of the Bayesian model section methods and the figure supplements. Line 272 reads "The measurement error is simply the repertoire size distribution, that is, the distribution of the number of non-upsA DBLα types sequenced given MOI = 1, which is empirically available (Figure supplement 3)." This does not appear correct as this figure is measuring kl divergence. If this is not a mistake in graph ordering please consider explaining the rationale for why this graph is being used to justify your point. 

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al., 2023). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this matter to our attention. We hope that the inclusion of this figure, as well as a more detailed description of the Bayesian approach, helps to make this section of the Materials and Methods clearer for the reader.

      I was somewhat surprised that the choice of prior for estimating the MOI distribution at the population level did not make much difference. To me, the negative binomial distribution makes much more sense. I was left wondering, as you are only measuring MOI in positive individuals, whether you used zero truncated Poisson and zero truncated negative binomial distributions, and if not, whether this was a cause of a lack of difference between uniform and other priors. 

      Thank you for the relevant question. We have indeed considered different priors and the robustness of our estimates to this choice, and have now better described this in the text. We focused on individuals who had a confirmed microscopic asymptomatic P. falciparum infection for our MOI estimation, as median P. falciparum densities were overall low in this population during each survey (i.e., median ≤ 520 parasites/µL, see Table supplement 1). Thus, we used either a uniform prior excluding zero or a zero-truncated negative binomial distribution when exploring the impact of priors on the final population-level MOI distribution. A uniform prior and a zero-truncated negative binomial distribution with parameters within the range typical of high-transmission endemic regions (higher mean MOI with tails around higher MOI values) produce similar MOI estimates at both the individual and population level. However, when setting the parameter range of the zero-truncated negative binomial to that of low-transmission endemic regions, where the empirical MOI distribution centers around monoclonal infections with the majority of MOI = 1 or 2 (mean MOI ≈ 1.5, no tail around higher MOI values), the final population-level MOI distribution does deviate more from that assuming the aforementioned prior and parameter choices. In short, the final individual- and population-level MOI estimates are not sensitive to the specifics of the prior MOI distribution as long as this distribution captures the tail around higher MOI values with above-zero probability.
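
The two families of priors discussed above can be compared with a short sketch. Everything specific in it is an assumption made for illustration: the mean/dispersion parameterization, the parameter values, the maximum MOI of 20, and the function names are hypothetical and are not taken from the manuscript. The sketch only shows how much prior mass each choice places on the tail of higher MOI values.

```python
import math


def uniform_prior(max_moi):
    """Uniform prior over MOI = 1..max_moi (zero excluded)."""
    return {k: 1.0 / max_moi for k in range(1, max_moi + 1)}


def zt_negative_binomial_prior(mean, dispersion, max_moi):
    """Zero-truncated negative binomial prior over MOI = 1..max_moi.

    `mean` and `dispersion` (the size parameter r) describe the untruncated
    distribution; renormalizing over 1..max_moi removes the mass at zero
    (and the small mass above max_moi).
    """
    r = dispersion
    p = r / (r + mean)

    def log_pmf(k):
        return (
            math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p)
        )

    weights = {k: math.exp(log_pmf(k)) for k in range(1, max_moi + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def tail_mass(prior, threshold):
    """Prior probability that MOI is at least `threshold`."""
    return sum(prob for k, prob in prior.items() if k >= threshold)


MAX_MOI = 20  # hypothetical upper bound on MOI
flat = uniform_prior(MAX_MOI)
high_transmission = zt_negative_binomial_prior(mean=4.0, dispersion=2.0, max_moi=MAX_MOI)
low_transmission = zt_negative_binomial_prior(mean=0.8, dispersion=2.0, max_moi=MAX_MOI)

# Mass placed on MOI >= 5 under each prior.
tails = {
    "uniform": tail_mass(flat, 5),
    "high_transmission": tail_mass(high_transmission, 5),
    "low_transmission": tail_mass(low_transmission, 5),
}
```

Under these illustrative parameters, the low-transmission prior leaves very little mass on MOI of 5 or more, which is the situation in which the response reports that the population-level estimates deviate more.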

      The high MOI in children <5yrs in 2017 (immediately after SMC) is very interesting. Any thoughts on how/why? 

      This result indicates that although the prevalence of asymptomatic P. falciparum infections remained significantly lower for the younger children targeted by SMC in 2017 compared to 2012, they still carried multiclonal infections, as the reviewer has pointed out (Figure 3B). Importantly, this upward shift in the MOI distributions (and median MOI) was observed in all age groups in 2017, not just the younger children, and provides evidence that transmission intensity in Bongo had rebounded in 2017, 32 months after the discontinuation of IRS. This increase in MOI for younger children may seem surprising at first glance, but it likely shows the limitations of SMC in clearing and/or suppressing the establishment of newly acquired infections, particularly at the end of the transmission season following the final cycle of SMC (i.e., end of September 2017 in Bongo District; NMEP/GHS, personal communication), when the post-treatment prophylactic effects of SMC would have waned (Chotsiri et al., 2022).

      Line 521 in the penultimate paragraph says "we have analysed only low density...." should this not be "moderate" density, as low density infections might not be detected? The density range itself is not reported in the manuscript so could be added. 

      In Table supplement 1 we have provided the median, including the inter-quartile range, across each survey by age group. For the revision we have now provided the density min-max range, as requested by the reviewer. Finally, we have revised the statement in the discussion so that it now reads “….we have analysed low- to moderate-density, chronic asymptomatic infections (see Table supplement 1)……”.   

      Data availability - From the text the full breakdown of the epidemiological survey does not appear to be available, just a summary of defined age bounds in the SI. Provision of these data (with associated covariates such as parasite density and host characteristics linked to genetic samples) would facilitate more in-depth secondary analyses. 

      To address this question, we have updated the “Data availability statement” section with the following statement: “All data associated with this study are available in the main text, the Supporting Information, or upon reasonable request for research purposes to the corresponding author, Prof. Karen Day (karen.day@unimelb.edu.au).”  

      REFERENCES

      Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11. doi:10.1186/1471-2148-11-220

      Chen DS, Barry AE, Leliwa-Sytek A, Smith T-AA, Peterson I, Brown SM, Migot-Nabias F, Deloron P, Kortok MM, Marsh K, Daily JP, Ndiaye D, Sarr O, Mboup S, Day KP. 2011. A molecular epidemiological study of var gene diversity to characterize the reservoir of Plasmodium falciparum in humans in Africa. PLoS One 6:e16629. doi:10.1371/journal.pone.0016629

      Chotsiri P, White NJ, Tarning J. 2022. Pharmacokinetic considerations in seasonal malaria chemoprevention. Trends Parasitol. doi:10.1016/j.pt.2022.05.003

      Day KP, Artzy-Randrup Y, Tiedje KE, Rougeron V, Chen DS, Rask TS, Rorick MM, Migot-Nabias F, Deloron P, Luty AJF, Pascual M. 2017. Evidence of Strain Structure in Plasmodium falciparum Var Gene Repertoires in Children from Gabon, West Africa. PNAS 114:E4103–E4111. doi:10.1073/pnas.1613018114

      Ghansah A, Tiedje KE, Argyropoulos DC, Onwona CO, Deed SL, Labbé F, Oduro AR, Koram KA, Pascual M, Day KP. 2023. Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Frontiers in Parasitology 2:1067966. doi:10.3389/fpara.2023.1067966

      He Q, Pilosof S, Tiedje KE, Ruybal-Pesántez S, Artzy-Randrup Y, Baskerville EB, Day KP, Pascual M. 2018. Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum. Nat Commun 9:1817. doi:10.1038/s41467-018-04219-3

      Jacob CG, Thuy-nhien N, Mayxay M, Maude RJ, Quang HH, Hongvanthong B, Park N, Goodwin S, Ringwald P, Chindavongsa K, Newton P, Ashley E. 2021. Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. Elife 10:1–22.

      Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:10.1101/2022.06.27.497801

      LaVerriere E, Schwabl P, Carrasquilla M, Taylor AR, Johnson ZM, Shieh M, Panchal R, Straub TJ, Kuzma R, Watson S, Buckee CO, Andrade CM, Portugal S, Crompton PD, Traore B, Rayner JC, Corredor V, James K, Cox H, Early AM, MacInnis BL, Neafsey DE. 2022. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study. Mol Ecol Resour 2285–2303. doi:10.1111/1755-0998.13622

      Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, Jacob CG, Ogo TD, Rockett K, Newbold CI, Berriman M, Kwiatkowski DP. 2016. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J 15:1–12. doi:10.1186/s12936-016-1641-7

      Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: A compendium and appraisal. Ecol Evol 2:2357–2365. doi:10.1002/ece3.329

      Rougeron V, Tiedje KE, Chen DS, Rask TS, Gamboa D, Maestre A, Musset L, Legrand E, Noya O, Yalcindag E, Renaud F, Prugnolle F, Day KP. 2017. Evolutionary structure of Plasmodium falciparum major variant surface antigen genes in South America : Implications for epidemic transmission and surveillance. Ecol Evol 7:9376–9390. doi:10.1002/ece3.3425

      Ruybal-Pesántez S, Sáenz FE, Deed S, Johnson EK, Larremore DB, Vera-Arias CA, Tiedje KE, Day KP. 2021. Clinical malaria incidence following an outbreak in Ecuador was predominantly associated with Plasmodium falciparum with recombinant variant antigen gene repertoires. medRxiv.

      Ruybal-Pesántez S, Tiedje KE, Pilosof S, Tonkin-Hill G, He Q, Rask TS, Amenga-Etego L, Oduro AR, Koram KA, Pascual M, Day KP. 2022. Age-specific patterns of DBLa var diversity can explain why residents of high malaria transmission areas remain susceptible to Plasmodium falciparum blood stage infection throughout life. Int J Parasitol 20:721–731.

      Strona G, Nappo D, Boccacci F, Fagorini S, San-Miguel-Ayanz J. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat Commun 5. doi:10.1038/ncomms5114

      Tessema SK, Hathaway NJ, Teyssier NB, Murphy M, Chen A, Aydemir O, Duarte EM, Simone W, Colborn J, Saute F, Crawford E, Aide P, Bailey JA, Greenhouse B. 2020. Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. Journal of Infectious Diseases 225:1227–1237.

      Tonkin-Hill G, Ruybal-Pesántez S, Tiedje KE, Rougeron V, Duffy MF, Zakeri S, Pumpaibool T, Harnyuttanakorn P, Branch OH, Ruiz-Mesía L, Rask TS, Prugnolle F, Papenfuss AT, Chan Y, Day KP. 2021. Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents. PLoS Genet 7:e1009269. doi:10.1371/journal.pgen.1009269

    1. eLife Assessment

      This valuable study characterizes the molecular signatures and function of a type of enteric neuron (IPAN) in the mouse colon, identifying molecular markers (Cdh6 and Cdh8) for these cells. A battery of solid experimental findings suggest data from other species are likely translatable to mice, bridging the abundant literature from humans and other mammals into this experimentally tractable animal model, but the data establishing the role of Cdh6 in synapses among IPANs and in cell-cell contacts with non-neuronal cells is incomplete. This work will be of interest to scientists studying the motor control of the colon and more generally the enteric neuromuscular system.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Gomez-Frittelli and colleagues characterize the expression of cadherin6 (and -8) in colonic IPANs of mice. Moreover, they found that these cdh6-expressing IPANs are capable of initiating colonic motor complexes in the distal colon, but not proximal and midcolon. They support their claim by morphological, electrophysiological, optogenetic, and pharmacological experiments.

      Strengths:

      The work is very impressive and involves several genetic models and state-of-the-art physiological setups including respective controls. It is a very well-written manuscript that truly contributes to our understanding of GI-motility and its anatomical and physiological basis. The authors were able to convincingly answer their research questions with a wide range of methods without overselling their results.

      Weaknesses:

      The authors put quite some emphasis on stating that cdh6 is a synaptic protein (in the title and throughout the text), which interacts in a homophilic fashion. They deduce that cdh6 might be involved in IPAN-IPAN synapses (line 247ff.). However, Cdh6 does not only interact in synapses and is expressed by non-neuronal cells as well (see e.g., expression in the proximal tubuli of the kidney). Moreover, cdh6 does not only build homodimers, but also heterodimers with Cdh9 as well as Cdh7, -10, and -14 (see e.g., Shimoyama et al. 2000, DOI: 10.1042/0264-6021:3490159). It would therefore be interesting to assess the expression pattern of cdh6-proteins using immunostainings in combination with synaptic markers to substantiate the authors' claim or at least add the possibility of cell-cell-interactions other than synapses to the discussion. Additionally, an immunostaining of cdh6 would confirm if the expression of tdTomato in smooth muscle cells of the cdh6-creERT model is valid or a leaky expression (false positive).

    3. Reviewer #2 (Public review):

      Summary:

      Intrinsic primary afferent neurons are an interesting population of enteric neurons that transduce stimuli from the mucosa, initiate reflexive neurocircuitry involved in motor and secretory functions, and modulate gut immune responses. The morphology, neurochemical coding, and electrophysiological properties of these cells have been relatively well described in a long literature dating back to the late 1800's but questions remain regarding their roles in enteric neurocircuitry, potential subsets with unique functions, and contributions to disease. Here, the authors provide RNAscope, immunolabeling, electrophysiological, and organ function data characterizing IPANs in mice and suggest that Cdh6 is an additional marker of these cells.

      Strengths:

      This paper would likely be of interest to a focused enteric neuroscience audience and increase information regarding the properties of IPANs in mice. These data are useful and suggest that prior data from studies of IPANs in other species are likely translatable to mice.

      Weaknesses:

      The advance presented here beyond what is already known is minimal. Some of the core conclusions are overstated and there are multiple other major issues that limit enthusiasm. Key control experiments are lacking and data do not specifically address the properties of the proposed Cdh6+ population.

      Major weaknesses:

      (1) The novelty of this study is relatively low. The main point of novelty suggests an additional marker of IPANs (Cdh6) that would add to the known list of markers for these cells. How useful this would be is unclear. Other main findings basically confirm that IPANs in mice display the same classical characteristics that have been known for many years from studies in guinea pigs, rats, mice and humans.

      (2) Some of the main conclusions of this study are overstated and claims of priority are made that are not true. For example, the authors state in lines 27-28 of the abstract that their findings provide the "first demonstration of selective activation of a single neurochemical and functional class of enteric neurons". This is certainly not true since Gould et al (AJP-GIL 2019) expressed ChR2 in nitrergic enteric neurons and showed that activating those cells disrupted CMC activity. In fact, prior work by the authors themselves (Hibberd et al Gastro 2018) showed that activating calretinin neurons with ChR2 evoked motor responses. Work by other groups has used chemogenetics and optogenetics to show the effects of activating multiple other classes of neurons in the gut.

      (3) Critical controls are needed to support the optogenetic experiments. Control experiments are needed to show that ChR2 expression a) does not change the baseline properties of the neurons, b) that stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons, and c) that stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions focused on here.

      (4) The electrophysiological characterization of mouse IPANs is useful but this is a basic characterization of any IPAN and really says nothing specifically about Cdh6+ neurons. The electrophysiological characterization was also only done in a small fraction of colonic IPANs, and it is not clear if these represent cell properties in the distal colon or proximal colon, and whether these properties might be extrapolated to IPANs in the different regions. Similarly, blocking IH with ZD7288 affects all IPANs and does not add specific information regarding the role of the proposed Cdh6+ subtype.

      (5) Why SMP IPANs were not included in the analysis of Cdh6 expression is a little puzzling. IPANs are present in the SMP of the small intestine and colon, and it would be useful to know if this proposed marker is also present in these cells.

      (6) The emphasis on IH being a rhythmicity indicator seems a bit premature. There is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS.

      (7) As the authors point out in the introduction and discuss later on, Type II Cadherins such as Cdh6 bind homophilically to the same cadherin at both pre- and post-synapse. The apparent enrichment of Cdh6 in IPANs would suggest extensive expression in synaptic terminals, which would in turn suggest extensive IPAN-IPAN connections unless other subtypes of neurons express this protein. Such synaptic connections are not typical of IPANs and raise the question of whether or not IPANs actually express the functional protein and, if so, what its role might be. Not having this information limits the usefulness of this as a proposed marker.

      (8) Experiments shown in Figures 6J and K use a tethered pellet to drive motor responses. By definition, these are not CMCs as stated by the authors.

      (9) The data from the optogenetic experiments are difficult to understand. How would stimulating IPANs in the distal colon generate retrograde CMCs and stimulating IPANs in the proximal colon do nothing? Additional characterization of the Cdh6+ population of cells is needed to understand the mechanisms underlying these effects.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Gomez-Frittelli and colleagues characterize the expression of cadherin6 (and -8) in colonic IPANs of mice. Moreover, they found that these cdh6-expressing IPANs are capable of initiating colonic motor complexes in the distal colon, but not proximal and midcolon. They support their claim by morphological, electrophysiological, optogenetic, and pharmacological experiments.

      Strengths:

      The work is very impressive and involves several genetic models and state-of-the-art physiological setups including respective controls. It is a very well-written manuscript that truly contributes to our understanding of GI-motility and its anatomical and physiological basis. The authors were able to convincingly answer their research questions with a wide range of methods without overselling their results.

      We greatly appreciate the reviewer’s time, careful reading and support of our study.

      Weaknesses:

      The authors put quite some emphasis on stating that cdh6 is a synaptic protein (in the title and throughout the text), which interacts in a homophilic fashion. They deduce that cdh6 might be involved in IPAN-IPAN synapses (line 247ff.). However, Cdh6 does not only interact in synapses and is expressed by non-neuronal cells as well (see e.g., expression in the proximal tubuli of the kidney). Moreover, cdh6 does not only build homodimers, but also heterodimers with Cdh9 as well as Cdh7, -10, and -14 (see e.g., Shimoyama et al. 2000, DOI: 10.1042/0264-6021:3490159). It would therefore be interesting to assess the expression pattern of cdh6-proteins using immunostainings in combination with synaptic markers to substantiate the authors' claim or at least add the possibility of cell-cell-interactions other than synapses to the discussion. Additionally, an immunostaining of cdh6 would confirm if the expression of tdTomato in smooth muscle cells of the cdh6-creERT model is valid or a leaky expression (false positive).

      We agree with the reviewer that Cdh6 could be mediating some other cell-cell interaction besides synapses between IPANs, and we will include more on this in the discussion. Cdh6 primarily forms homodimers but, as the reviewer points out, is also known to form heterodimers with some other cadherins. We performed RNAscope in the colonic myenteric plexus with Cdh7 and found no expression (data not shown). Cdh10 is suggested to have very low expression (Drokhlyansky et al., 2020), possibly in putative secretomotor vasodilator neurons, and Cdh14 has not been assayed in any RNAseq screens. We attempted to visualize Cdh6 protein via antibody staining (Duan et al., 2018), but our efforts did not yield sufficient signal or resolution to identify synapses in the ENS, which remain broadly challenging to assay. Similarly, neither immunostaining with the Cdh6 antibody nor RNAscope was able to confirm Cdh6 expression in tdT-expressing muscle cells. We will address these caveats in the discussion section.

      (1) E. Drokhlyansky, C. S. Smillie, N. V. Wittenberghe, M. Ericsson, G. K. Griffin, G. Eraslan, D. Dionne, M. S. Cuoco, M. N. Goder-Reiser, T. Sharova, O. Kuksenko, A. J. Aguirre, G. M. Boland, D. Graham, O. Rozenblatt-Rosen, R. J. Xavier, A. Regev, The Human and Mouse Enteric Nervous System at Single-Cell Resolution. Cell 182, 1606-1622.e23 (2020).

      (2) X. Duan, A. Krishnaswamy, M. A. Laboulaye, J. Liu, Y.-R. Peng, M. Yamagata, K. Toma, J. R. Sanes, Cadherin Combinations Recruit Dendrites of Distinct Retinal Neurons to a Shared Interneuronal Scaffold. Neuron 99, 1145-1154.e6 (2018).

      Reviewer #2 (Public review):

      Summary:

      Intrinsic primary afferent neurons are an interesting population of enteric neurons that transduce stimuli from the mucosa, initiate reflexive neurocircuitry involved in motor and secretory functions, and modulate gut immune responses. The morphology, neurochemical coding, and electrophysiological properties of these cells have been relatively well described in a long literature dating back to the late 1800's but questions remain regarding their roles in enteric neurocircuitry, potential subsets with unique functions, and contributions to disease. Here, the authors provide RNAscope, immunolabeling, electrophysiological, and organ function data characterizing IPANs in mice and suggest that Cdh6 is an additional marker of these cells.

      Strengths:

      This paper would likely be of interest to a focused enteric neuroscience audience and increase information regarding the properties of IPANs in mice. These data are useful and suggest that prior data from studies of IPANs in other species are likely translatable to mice.

      We appreciate the reviewer’s support of our study and insightful critiques for its improvement.

      Weaknesses:

      The advance presented here beyond what is already known is minimal. Some of the core conclusions are overstated and there are multiple other major issues that limit enthusiasm. Key control experiments are lacking and data do not specifically address the properties of the proposed Cdh6+ population.

      Major weaknesses:

      (1) The novelty of this study is relatively low. The main point of novelty suggests an additional marker of IPANs (Cdh6) that would add to the known list of markers for these cells. How useful this would be is unclear. Other main findings basically confirm that IPANs in mice display the same classical characteristics that have been known for many years from studies in guinea pigs, rats, mice and humans.

      We appreciate the already existing markers for IPANs in the ENS and the existing literature characterizing these neurons. The primary intent of this study was to use these well established characteristics of IPANs in both mice and other species to characterize Cdh6-expressing neurons in the mouse myenteric plexus and confirm their classification as IPANs.

      (2) Some of the main conclusions of this study are overstated and claims of priority are made that are not true. For example, the authors state in lines 27-28 of the abstract that their findings provide the "first demonstration of selective activation of a single neurochemical and functional class of enteric neurons". This is certainly not true since Gould et al (AJP-GIL 2019) expressed ChR2 in nitrergic enteric neurons and showed that activating those cells disrupted CMC activity. In fact, prior work by the authors themselves (Hibberd et al., Gastro 2018) showed that activating calretinin neurons with ChR2 evoked motor responses. Work by other groups has used chemogenetics and optogenetics to show the effects of activating multiple other classes of neurons in the gut.

      We believe our phrasing in this sentence was misleading. Whilst single neurochemical classes of enteric neurons have been manipulated to alter gut functions, none of these instances to date represents manipulation of a single functional class of enteric neurons. In the given examples, NOS and calretinin are each expressed to varying degrees across putative motor neurons, interneurons and IPANs. In contrast, Cdh6 is restricted to IPANs and therefore this study is the first optogenetic investigation of enteric neurons from a single putative functional class. We will alter this segment in the revised manuscript to emphasize this point and differentiate this study from previous ones.

      (3) Critical controls are needed to support the optogenetic experiments. Control experiments are needed to show that ChR2 expression a) does not change the baseline properties of the neurons, b) that stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons, and c) that stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions focused on here.

      We completely agree that controls are essential. However, our paper is not the first to express ChR2 in enteric neurons. Authors of our paper have shown in Hibberd et al. 2018 that expression of ChR2 in a heterogeneous population of myenteric neurons did not change network properties of the myenteric plexus. This was demonstrated by the lack of change in control CMC characteristics in mice expressing ChR2 under basal conditions (without blue light exposure). Regarding question (b), whether stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons, we show the restricted expression of ChR2 in IPANs and that motor responses (to blue light) are blocked by selective nerve conduction blockade.

      Regarding question (c), whether stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions, we would not expect each region of the gut to behave comparably. This is because the different gut regions (i.e. proximal, mid, distal) are very different anatomically, as is the anatomy of the myenteric plexus and myenteric ganglia between each region, including the density of IPANs within each ganglion, in addition to the presence of different patterns of electrical and mechanical activity [Spencer et al., 2020]. Hence, it is difficult to expect that stimulation of ChR2 should induce similar physiological responses across regions. The motor output we record in our study (CMCs) is a unified motor program that involves the temporal coordination of hundreds of thousands of enteric neurons and a complex neural circuit that we have previously characterized [Spencer et al., 2018]. However, no study until now has been able to selectively stimulate a single functional class of enteric neurons (with light) and so avoid indiscriminate activation of other classes of neurons.

      (1) T. J. Hibberd, J. Feng, J. Luo, P. Yang, V. K. Samineni, R. W. Gereau, N. Kelley, H. Hu, N. J. Spencer, Optogenetic Induction of Colonic Motility in Mice. Gastroenterology 155, 514-528.e6 (2018).

      (2) N. J. Spencer, L. Travis, L. Wiklendt, T. J. Hibberd, M. Costa, P. Dinning, H. Hu, Diversity of neurogenic smooth muscle electrical rhythmicity in mouse proximal colon. American Journal of Physiology-Gastrointestinal and Liver Physiology 318, G244–G253 (2020).

      (3) N. J. Spencer, T. J. Hibberd, L. Travis, L. Wiklendt, M. Costa, H. Hu, S. J. Brookes, D. A. Wattchow, P. G. Dinning, D. J. Keating, J. Sorensen, Identification of a Rhythmic Firing Pattern in the Enteric Nervous System That Generates Rhythmic Electrical Activity in Smooth Muscle. J. Neurosci. 38, 5507–5522 (2018).

      (4) The electrophysiological characterization of mouse IPANs is useful but this is a basic characterization of any IPAN and really says nothing specifically about Cdh6+ neurons. The electrophysiological characterization was also only done in a small fraction of colonic IPANs, and it is not clear if these represent cell properties in the distal colon or proximal colon, and whether these properties might be extrapolated to IPANs in the different regions. Similarly, blocking IH with ZD7288 affects all IPANs and does not add specific information regarding the role of the proposed Cdh6+ subtype.

      Our electrophysiological characterization was guided to be within a subset of Cdh6+ neurons by Hb9:GFP expression. As in the prior comment (1) above, we used these experiments to confirm classification of Cdh6+ (Hb9:GFP+) neurons in the distal colon as IPANs. We will clarify that these experiments were performed in the distal colon and agree that we cannot extrapolate that these properties are also representative of IPANs in the proximal colon. We apologize that this was confusing. Finally, we agree with the reviewer that ZD7288 affects all IPANs in the ENS and will clarify this in the text.

      (5) Why SMP IPANs were not included in the analysis of Cdh6 expression is a little puzzling. IPANs are present in the SMP of the small intestine and colon, and it would be useful to know if this proposed marker is also present in these cells.

      We agree with the reviewer. In addition to characterizing Cdh6 in the myenteric plexus, it would be interesting to query if sensory neurons located within the SMP also express Cdh6. Our preliminary data (n=2) show ~6-12% tdT/Hu neurons in Cdh6-tdT ileum and colon (data not shown). We will add a sentence to the discussion.

      (6) The emphasis on IH being a rhythmicity indicator seems a bit premature. There is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS.

      Regarding the statement that there is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS, we agree with the reviewer that rhythm generation by IH and IT in the ENS has not been explicitly confirmed. We are confident the reviewer agrees that an absence of evidence is not evidence of absence, and the presence of IH has been well described in enteric neurons. We will modify the text in the results to indicate more clearly that IH and IT are known to participate in rhythm generation in thalamocortical circuits, though their roles in the ENS remain unknown. Our discussion of the potential role of IH or IT in rhythm generation or oscillatory firing of the ENS is confined to speculation in the discussion section of the text.

      (7) As the authors point out in the introduction and discuss later on, Type II Cadherins such as Cdh6 bind homophilically to the same cadherin at both pre- and post-synapse. The apparent enrichment of Cdh6 in IPANs would suggest extensive expression in synaptic terminals that would also suggest extensive IPAN-IPAN connections unless other subtypes of neurons express this protein. Such synaptic connections are not typical of IPANs and raise the question of whether or not IPANs actually express the functional protein and if so, what might be its role. Not having this information limits the usefulness of this as a proposed marker.

      We agree with the reviewer that the proposed IPAN-IPAN connection is novel although it has been proposed before (Kunze et al., 1993). As detailed in our response to Reviewer #1, we attempted to confirm Cdh6 protein expression, but were unsuccessful, due to insufficient signal and resolution. We therefore discuss potential IPAN interconnectivity in the discussion, in the context of contrasting literature.

      (1) W. A. A. Kunze, J. B. Furness, J. C. Bornstein, Simultaneous intracellular recordings from enteric neurons reveal that myenteric AH neurons transmit via slow excitatory postsynaptic potentials. Neuroscience 55, 685–694 (1993).

      (8) Experiments shown in Figures 6J and K use a tethered pellet to drive motor responses. By definition, these are not CMCs as stated by the authors.

      The reviewer makes a valid criticism as to the terminology, since tethered pellet experiments do not record propagation. We believe the periodic bouts of propulsive force on the pellet are triggered by the same activity underlying the CMC. In our experience, these activities have similar periodicity and force, and identical pharmacological properties. Consistent with this, we also tested full colons (n = 2) set up for typical CMC recordings by multiple force transducers, finding that CMCs were abolished by ZD7288, similar to fixed pellet recordings (data not shown).

      (9) The data from the optogenetic experiments are difficult to understand. How would stimulating IPANs in the distal colon generate retrograde CMCs and stimulating IPANs in the proximal colon do nothing? Additional characterization of the Cdh6+ population of cells is needed to understand the mechanisms underlying these effects.

      We agree that the different optogenetic responses in the proximal and distal colon are challenging to interpret, but perhaps not surprising in the wider context. It is possible that the different optogenetic responses in this study reflect not only regional differences in the Cdh6+ neuronal populations, but also differences in neural circuits within these gut regions. A study some time ago by the authors showed that electrical stimulation of the proximal mouse colon was unable to evoke a retrograde (aborally) propagating CMC (Spencer and Bywater, 2002), but stimulation of the distal colon readily did so. We concluded that at the oral lesion site there is a preferential bias of descending inhibitory nerve projections, since the ascending excitatory pathways have been cut off. In contrast, stimulation of the distal colon was readily able to activate an ascending excitatory neural pathway, and hence induce the complex CMC circuits required to generate an orally propagating CMC. Indeed, other recent studies have added to a growing body of evidence for significant differences in the behaviors and neural circuits of the two regions (Li et al., 2019, Costa et al., 2021a, Costa et al., 2021b, Nestor-Kalinoski et al., 2022). We will expand this discussion.

      (1) N. J. Spencer, R. A. Bywater, Enteric nerve stimulation evokes a premature colonic migrating motor complex in mouse. Neurogastroenterology & Motility 14, 657–665 (2002).

      (2) Li Z, Hao MM, Van den Haute C, Baekelandt V, Boesmans W, Vanden Berghe P (2019) Regional complexity in enteric neuron wiring reflects diversity of motility patterns in the mouse large intestine. Elife 8.

      (3) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Dinning PG, Brookes SJ, Spencer NJ (2021a) Motor patterns in the proximal and distal mouse colon which underlie formation and propulsion of feces. Neurogastroenterol Motil e14098.

      (4) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Smolilo DJ, Dinning PG, Brookes SJ, Spencer NJ (2021b) Characterization of alternating neurogenic motor patterns in mouse colon. Neurogastroenterol Motil 33:e14047.

      (5) Nestor-Kalinoski A, Smith-Edwards KM, Meerschaert K, Margiotta JF, Rajwa B, Davis BM, Howard MJ (2022) Unique Neural Circuit Connectivity of Mouse Proximal, Middle, and Distal Colon Defines Regional Colonic Motor Patterns. Cell Mol Gastroenterol Hepatol 13:309-337.e303.

    1. eLife Assessment

      This important study reports on a basis for neurabin-mediated specification of substrate choice by protein phosphatase-1. The data from the comprehensive approach using structural, biochemical, and computational methods are compelling, but the role of the crucial tryptophan residue in the recognition motif can be further tested to strengthen the main argument. This paper is broadly relevant to those investigating various cellular signaling cascades that entail phosphorylation as the main mechanism.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Treisman and colleagues address the question of how protein phosphatase 1 (PP1) regulatory subunits (or PP1-interacting proteins (PIPs)) confer specificity on the PP1 catalytic subunit, which by itself possesses little substrate specificity. In prior work the authors showed that the PIP Phactrs confers specificity by remodelling a hydrophobic groove immediately adjacent to the PP1 catalytic site through residues within the RVxF-ΦΦ-R-W string of Phactrs. Specifically, the residues proximal to and including the 'W' of the RVxF-ΦΦ-R-W string remodel the hydrophobic groove. Other residues of the RVxF-ΦΦ-R-W string (i.e. the RVxF-ΦΦ-R) are not involved in this remodelling.

      The authors suggest that the RVxF-ΦΦ-R-W string is a conserved feature of many PIPs including PNUTS, Neurabin/spinophilin and R15A. However, from a sequence and structural perspective, only the RVxF-ΦΦ-R is conserved. The W is not conserved in most and in the R15A structure (PDB:7NZM) the Trp side chain points away from the hydrophobic channel - this could be a questionable interpretation due to model-building into the low-resolution cryo-EM map (4 Å).

      In this paper, the authors convincingly show that Neurabin confers substrate specificity through interactions of its PDZ domain with the PDZ domain-binding motif (PBM) of 4E-BP. They show the PBM motif is required for Neurabin to increase PP1 activity towards 4E-BP and a synthetic peptide modelled on 4E-BP and also a synthetic peptide based on IRSp53 with a PBM added. The PBM of 4E-BP1 confers high affinity binding to the Neurabin PDZ domain. A crystal structure of a PP1-4E-BP1 fusion with Neurabin shows that the PBM of 4E-BP interacts with the PDZ domain of Neurabin. No interactions of 4E-BP and the catalytic site of PP1 are observed. Cell biology work showed that Neurabin-PP1 regulates the TOR signalling pathway by dephosphorylating 4E-BPs.

      Strengths:

      This work demonstrates convincingly, using a variety of cell biology, proteomics, biophysics and structural biology approaches, that the PP1 interacting protein Neurabin confers specificity on PP1 through an interaction of its PDZ domain with a PDZ-binding motif of 4E-BP1 proteins. Remodelling of the hydrophobic groove of the PP1 catalytic subunit is not involved in Neurabin-dependent substrate specificity, in contrast to how Phactrs confers specificity on PP1. The active site of the Neurabin/PP1 complex does not recognise residues in the vicinity of the phospho-residue, thus allowing for multiple phospho-sites on 4E-BP to be dephosphorylated by Neurabin/PP1. This contrasts with substrate specificity conferred by the Phactrs PIP that confers specificity of Phactrs/PP1 towards its substrates in a sequence-specific context by remodelling the hydrophobic groove immediately adjacent to the catalytic site. The structural and biochemical insights are used to explore the role of Neurabin/PP1 in dephosphorylating 4E-BPs in vivo, showing that Neurabin/PP1 regulates the TOR signalling pathway, specifically mTORC1-dependent translational control.

      Weaknesses:

      The only weakness is the suggestion that a conserved RVxF-ΦΦ-R-W string exists in PIPs. The 'W' is not conserved in sequence and 3 dimensions in most of the PIPs discussed in this manuscript. The lack of conservation of the W would be consistent with the finding based on multiple PP1-PIP structures that apart from Phactrs, no other PIP appears to remodel the PP1 hydrophobic channel.

    3. Reviewer #2 (Public review):

      This manuscript explores the molecular mechanisms that are involved in substrate recognition by the PP1 phosphatase. The authors previously showed that the PP1 interacting protein (PIP), Phactr1, conferred substrate specificity by remodelling the PP1 hydrophobic substrate groove. In this work, the authors aimed to understand the key determinant of how other PIPs, Neurabin and Spinophilin, mediate substrate recognition.

      The authors generated a few PP1-PIP fusion constructs, undertook TMT phosphoproteomics and validated their method using PP1-Phactr1/2/3/4 fusion constructs. Using this method, the authors identified phosphorylation sites controlled by PP1-Neurabin and focussed their work on 4E-BP1, thereby linking PP1-Neurabin to mTORC1 signalling. Upon validating that PP1-Neurabin dephosphorylates 4E-BP1, they determined that 4E-BP1 PBM binds to the PDZ domain of Neurabin with an affinity that was greater than 30-fold as compared to other substrates. PP1-Neurabin dephosphorylated 4E-BP1WT and IRSp53WT with a catalytic efficiency much greater than PP1 alone. However, PP1-Neurabin bound to 4E-BP1 and IRSp53 mutants lacking the Neurabin PDZ domain with a catalytic efficiency lower than that observed with 4E-BP1WT. These results indicate the involvement of the PDZ domain in facilitating substrate recruitment by PP1-Neurabin. Interestingly, PP1-Phactr1 dephosphorylation of 4E-BP1 phenocopies PP1 alone, while PP1-Phactr1 dephosphorylates IRSp53 to a much higher extent than PP1 alone. These results highlight the importance of the PDZ domain and also shed light on how different PP1-PIP holoenzymes mediate substrate recognition using distinct mechanisms. The authors also show that the remodelling of the hydrophobic PP1 substrate groove, which is essential for substrate recognition by PP1-Phactr1, was not required by PP1-Neurabin. Additionally, the authors also resolved the structure of a PP1-4E-BP1 fusion with the PDZ-containing C-terminal of Neurabin and observed that the Neurabin/PP1-4E-BP1 complex structure was oriented at 21° to that in the unliganded Spinophilin/PP1 complex (resolved by Ragusa et al., 2010) owing to a slight bend in the C-terminal section that connects it to the RVxF-ΦΦ-R-W string. Since no interaction was observed with the remodelled PP1-Neurabin hydrophobic groove, the authors utilised AlphaFold3 to further answer this. They observed a high confidence of interaction between the groove and phosphorylated substrate and a low confidence of interaction between the groove and unphosphorylated substrate, thereby suggesting that the hydrophobic groove remodelling is not involved in PP1-Neurabin recognition and dephosphorylation of 4E-BP1.

      In this work, the authors provide novel insights into how Neurabin depends on the interaction between its PDZ domain and PBM domains of potential substrates to mediate its recruitment by PP1. Additionally, they uncover a novel PP1-Neurabin substrate, 4E-BP1. They systematically employ phosphoproteomics, biochemical, and structural methods to investigate substrate specificity in a robust fashion. Furthermore, the authors also compare the interactions between PP1-Neurabin to 4E-BP1 and IRSp53 (PP1-Phactr1 substrate) with PP1-Phactr1, to showcase the specificity of the mode of action employed by these complexes in mediating substrate specificity. The authors employ an innovative PP1-PIP fusion strategy previously explored by Oberoi et al., 2016 and the authors themselves in Fedoryshchak et al., 2020. Although this method allows for a more controlled investigation of the interactions between PP1-PIPs and their substrates, it may not fully recapitulate the interactions that may occur in a physiological setting. This could potentially be overcome by studying the interactions of the full proteins using classical biochemical approaches in cell lines. Furthermore, the authors have substantially characterised the importance of the PDZ domain using their fusion constructs; however, I believe that further exploration into either structural or AlphaFold3 modelling of PBM domain substrate mutants, or a Neurabin PDZ-domain mutant, might further strengthen this claim. Overall, the paper makes a substantial contribution to understanding substrate recognition and specificity in PP1-PIP complexes. The study's innovative methods, biological relevance, and mechanistic insights are strengths, but whether this mechanism occurs in a physiological context is unclear.

    4. Reviewer #3 (Public review):

      Protein Phosphatase 1 (PP1), a vital member of the PPP superfamily, drives most cellular serine/threonine dephosphorylation. Despite PP1's low intrinsic sequence preference, its substrate specificity is finely tuned by over 200 PP1-interacting proteins (PIPs), which employ short linear motifs (SLiMs) to bind specific PP1 surface regions. By targeting PP1 to cellular sites, modifying substrate grooves, or altering surface electrostatics, PIPs influence substrate specificity. Although many PIP-PP1-substrate interactions remain uncharacterized, the Phactr family of PIPs uniquely imposes sequence specificity at dephosphorylation sites through a conserved "RVxF-ΦΦ-R-W" motif. In Phactr1-PP1, this motif forms a hydrophobic pocket that favors substrates with hydrophobic residues at +4/+5 in acidic contexts (the "LLD motif"), a specificity that endures even in PP1-Phactr1 fusions. Neurabin/Spinophilin remodel PP1's hydrophobic groove in distinct ways, creating unique holoenzyme surfaces, though the impact on substrate specificity remains underexplored. This study investigates Neurabin/Spinophilin specificity via PDZ domain-driven interactions, showing that Neurabin/PP1 specificity is governed more by PDZ domain interactions than by substrate sequence, unlike Phactr1/PP1.

      A significant strength of this work is the use of PP1-PIP fusion proteins to effectively model intact PP1•PIP holoenzymes by replicating the interactions that remodel the PP1 interface and confer site-specific substrate specificity. When combined with proteomic analyses to assess phospho-site depletion in mammalian cells, these fusions offer critical insights into holoenzyme specificity, revealing new candidate substrates for Neurabin and Spinophilin. The studies present compelling evidence that the PDZ domain of PP1-Neurabin directs its specificity, with the remodelled PP1 hydrophobic groove interactions having minimal impact. This mechanism is supported by structural analysis of the PP1-4E-BP1 substrate fusion bound to a Neurabin construct, highlighting the 4E-BP1/PDZ interaction. This work delivers crucial insights into PP1-PIP holoenzyme function, combining biochemical, proteomic, and structural approaches. It validates the PP1-PIP fusion protein model as a powerful tool, suggesting it may extend to studying additional holoenzymes. While an extremely useful model, it must be considered unlikely the PP1-PIP fusions fully recapitulate the specificity and regulation of the holoenzyme.

    5. Author response:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs), and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan - Phactrs and PPP1R15A/B indeed have W at this position but Neurabin/Spinophilin contain VDP, which makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

    1. eLife Assessment

      This solid paper reports on the use of artificial intelligence to assess bone marrow adipose tissue in the skull. The MRI-based method is novel, and the approach allows for the identification of genetic loci that regulate this trait as well as others using data from the UK Biobank. Overall this is an important contribution, although the authors should consider several points: (1) validation of the T1-weighted MRI signal intensity; (2) further discussion of the sex differences; and (3) cross-trait linkage disequilibrium score regression (LDSC) for osteoporosis, Parkinson's disease, and cognitive function.

    2. Reviewer #1 (Public review):

      The authors of this study developed a method to quantify calvarial bone marrow from MRI head scans, enabling the study of its composition in large datasets of adults, usually collected to study the brain. Bone marrow intensity can be semi-quantitatively measured in T1-weighted MRI scans due to the greater signal intensity of fat than watery red marrow. This is an ingenious use of the MRI-produced information for other important phenotypes, such as bone structure and marrow content. Different head types were tested for complying with the model, which is notable.

      The model was also successfully validated using several publicly available MRI resources - real data - in (1) a dataset consisting of 30 individuals that were scanned 10 times each at 3-day intervals, and (2) the monozygotic (MZ) twin data from the Human Connectome Project cohort. Then the authors applied this validated method to head-MRI scans from the UK Biobank (n=33,042) to extract information on the spatial distribution of bone marrow adiposity (BMA) in the calvaria, allowing a GWAS to identify associated genes.

      The authors revealed high heritability and identified 41 genetic loci significantly associated with the BMA trait, including six sex-specific loci. Of note, statistics estimate that 99% of BMA trait-influencing variants are shared with BMD (497 of 500 variants), which may mean these results demonstrate the biological relevance to bone health. Some of the BMA genes were found related to the Wnt pathway, including WNT16, WNT4, NXN; this is a "positive control", since the Wnt/β-catenin signaling pathway was suggested as an important determinant of BMA. Also, associations in genes (BMP4, DLX5, LGR4, LRP4, SFRP4) that are known to specifically influence adiposity, are encouraging. Integrating mapped genes with bone marrow single-cell RNA-seq data revealed patterns of adipogenic lineage differentiation and lipid loading.

      The study also investigated the genetic overlap between BMA and twelve (or 13) "brain and body" traits and identified significant genetic correlations with BMI, cognitive ability, and Parkinson's disease.

      In sum, since MRI head scans present a hitherto unexplored opportunity to address unresolved aspects of bone marrow biology, this study is both timely and innovative.

      There are, however, some assumptions, findings, and their interpretation, which require more critical focus.

      Sex-specificity is well described and studied here. Men have higher BMA than women, but post-menopausal women catch up in the BMA values. The authors believe that calvarial marrow has a number of features that make it particularly well-suited to the study of the BMA process, which is clinically important in other bone sites. It has a simple "sandwiched" structure that they are able to model. This is true only to some extent: a condition called "Hyperostosis frontalis interna", of unknown etiology (described by Smith & Hemphill in 1956), is characterized by irregular overgrowth of the inner table of the frontal bone (symmetric/bilateral). Although it is typically benign and not of clinical significance, studies report a prevalence of 12%. However, it is most common in postmenopausal women, with prevalences of up to 49% reported in women over the age of 65. Thus, sexual dimorphism is obvious and the effect of estrogen is likely shared with other age-related bone and marrow pathologies. So, for women not using HRT, this new layer of the bone might interfere with the calvarial BMA readings and in turn, affect the BMA-related analyses. The authors suspect that the effect of BMA on BMD may be biased in women; they should comment on those "with low BMD and high BMA" given that hyperostosis frontalis might be an issue. A strong effect of SNPs in the ESR1 chromosomal region might be akin to the above concern.

      Then, there is a perfect overlap of the BMA SNPs that are shared with BMD (497 of 500 variants), which may prove a "face validity" of the MRI-derived BMA. However, the BMD in the study was heel-derived eBMD - which is a good proxy for osteoporosis and is mostly driven by trabecular bone. Thus, there might be a concern that the BMA metrics capture some trabecular BMD.

      Next, integrating mapped genes with existing bone marrow single-cell RNA-sequencing data revealed patterns of adipogenic lineage differentiation and lipid loading. The problem here is that the scRNAseq studies of the Bone Marrow niche are overwhelmingly mouse. The authors might wish to justify why they are relevant to humans (in the absence of the human-specific scRNAseq).

      For genetic correlation analysis, the authors selected 7 body and 6 brain traits. The latter traits reflect cognition (general cognitive ability and educational attainment) and brain-related disorders. This selection might seem arbitrary. The interpretation of genetic correlation with cognitive ability, education, and Parkinson's disease was attributed to the recently discovered vascular channels that link calvarial bone marrow to the meninges. This is a fascinating hypothesis, which requires functional proof. However, there might be simpler explanations. Thus, the diploe and the inner table of the calvarium are drained by the same veins as the dura. From the anatomy textbook, we know that diploic veins connect the pericranial and endocranial venous system through the skull.

    3. Reviewer #2 (Public review):

      Summary:

      This study develops a new artificial intelligence method for high-throughput analysis of skull bone marrow from MRI data, which may be useful for large-scale biological analyses. Using this method, the authors then attempt to estimate skull bone marrow adiposity (BMA) using T1-weighted signal intensity from MRI scans of ~33,000 people, followed by genome-wide association analysis; however, the approach is inadequate because T1-weighted signal intensity is not validated for measurement of bone marrow adiposity. If it could be validated, the study would be an important advance in understanding of bone marrow adiposity and skeletal biology.

      Strengths:

      This paper is well-written, and the figures are nicely presented. The neural network method used for analysing skull bone marrow is innovative, and the authors validate this through several approaches. Therefore, the authors have achieved the aim of developing a method for large-scale analysis of skull bone marrow from MRI data.

      The GWAS is reasonably well-powered and addresses potential ethnicity differences, with one GWAS done across white males and females, and a separate GWAS in non-white participants. The methodology also conforms to common GWAS standards, including for mapping genetic variants to candidate genes. Moreover, the study further investigates the biological roles of these genes by analysing their expression in single-cell RNA sequencing data.

      Weaknesses:

      The fundamental weakness is that T1-weighted MRI signal intensity (T1W) is used as an estimate of BMA, but it has never been validated for this. The authors show that this T1W parameter measures something that is heritable and can be compared between subjects, but they don't show that it actually measures (or even estimates) calvarial BMA. There is an attempt to do so by comparing the T1W parameter with data from quantitative T1 images: the authors show a reasonable correlation with some of the quantitative T1 image data. However, this still does not show that the parameter is measuring BMA; it could be measuring some other biological characteristic, but this remains unclear. So, there is a need to validate the T1W parameter against an established measure of BMA, such as the bone marrow fat-fraction or proton density fat fraction measured from multi-echo MRI analysis.

      Without validating this BMA measurement method, it is not possible to interpret the GWAS or other findings reported in the study.

      A less critical weakness is that the GWAS has been done only on a single cohort, without replicating the findings in a follow-up cohort. For example, the authors could repeat their analysis on the remaining ~50,000 UK Biobank imaging participants for whom MRI data is now available. However, this would be pointless without knowing what biological characteristic(s) the T1W parameter is actually reflecting.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript, "Estimating bone marrow adiposity from head MRI and identifying its genetic architecture", brings together the groups of Drs. Kaufmann and Hughes in a tour de force work to develop an artificial neural network that localizes calvaria bone marrow in T1-weighted MRI head scans, with the goal of studying its composition in several large MRI datasets, and to model sex-dimorphic age trajectories, including the effect of menopause.

      Strengths:

      Bone marrow adiposity is a very active tissue with more far-reaching implications for tissue crosstalk and human health than we had initially recognized. Although MRI has been used to measure BM, studies such as the one by these two groups, in which very large datasets are analyzed using advanced AI machine learning tools coupled with genetic studies and a specific pathology, are still lacking. The groups had to develop new methods and new AI machine-learning tools for the imaging analyses.

      Weaknesses:

      The authors could add clarification on some aspects of the work.

      (1) Imaging Limitations: The authors provide an excellent overview and references supporting the use of MRI as a method for assessing marrow fat, particularly with some specific modifications. However, MRI images can be affected by various factors, including the presence of other tissues as well as specific MRI settings, which are much harder to precisely control when using different datasets.

      (2) The specific density of cranial bones as it relates to the types of bone marrow: Cranial bones are extremely dense structures, which naturally interfere with MRI imaging. While it is thought that cranial bones have mostly "red bone marrow", this is only true for a short time in humans. How sensitive is their system in differentiating between red and yellow BM?

      (3) Both items above are further complicated by aging, but aging is not a linear event as we have learned. There are specific bursts of aging in humans around the age of 45 and early 60s. How do the system and model predict or incorporate these peaks of aging? It seems from the data shown that aging is reflected more as a linear phenomenon. Is this because additional aging datasets are needed?

      (4) The authors describe in rich detail their AI learning programming and how it extracted the data from datasets. The authors also show some important correlations with specific genes and SNPs. What is not clear is how conditions such as anemia, for example, were handled. An expected finding would be that patients with chronic anemia have lower bone marrow (BM) signal intensity on MRI scans than healthy people. This is because the signal intensity of BM depends on the fat-to-cell ratio in the tissue. Furthermore, patients with a host of musculoskeletal disorders ranging from osteopenia to osteoporosis, sarcopenia, and osteosarcopenia will also have altered MRI scans. When using such large datasets, how did the authors control for or exclude these pathological conditions, or were all these conditions likely present?

      (5) Some of the genes and SNPs although significant showed very small correlations. What is their likely physiological significance?

      (6) The authors could use this excellent manuscript to expand their discussion to include the need for studies like theirs to be also complemented by multi-OMICS studies that will include proteomics and lipidomics of BM, bones, and muscles.

    1. eLife Assessment

      The authors use a multidisciplinary approach to provide a useful link between Beta-alanine and S. Typhimurium (STM) infection and virulence. The work shows how Beta-alanine synthesis mediates zinc homeostasis regulation, possibly contributing to virulence. However, the work is incomplete and requires additional data to firmly establish the connection between Beta-alanine synthesis and zinc homeostasis. Measuring the source and zinc content of STM in vivo and examining mechanisms in human clinical strains and other serovars would be essential.

    2. Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacterial infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular Zn2+ content of STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

    3. Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

    4. Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. The SCV is a tight-fitting vacuole in which the acquisition of nutrients is a key factor for Salmonella. Among many nutrients, the authors focussed on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacterial infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular Zn2+ content of STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading as well as the constructive comments and advice about our manuscript. We will revise the manuscript based on your comments and suggestions.

      You are right that this work has not thoroughly investigated the mechanisms underlying the roles of β-Ala, panD and zinc in impacting Salmonella infection. Following your suggestions, we will perform additional experiments to measure the zinc content during Salmonella infection in vivo and in vitro.

      We agree that other unknown mechanism(s) are also involved in the virulence regulation by β-Ala in Salmonella, as our results showed that the double mutant ΔpanDΔznuA (which can neither synthesize β-Ala nor take up zinc) is more attenuated than the single mutant ΔznuA (Figure 5D), suggesting that the contribution of β-Ala to the virulence of Salmonella is partially dependent on zinc acquisition. We will reword the related description throughout the manuscript for clarity.

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice regarding our manuscript and are delighted to accept them.

      You are right that our current findings are relatively limited and not sufficient to support therapeutic applications. We will reword the related description throughout the manuscript. Based on this comment, we will also use Salmonella Typhi and human macrophages to perform additional experiments to extend our findings. Salmonella Typhi is a human-restricted Salmonella serovar and the cause of typhoid fever, a severe and potentially lethal systemic disease. Salmonella Typhimurium (STM) causes a systemic disease in mice that resembles typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. We will design and perform additional experiments to further investigate the mechanisms by which β-Ala, panD and zinc influence Salmonella infection, according to your suggestions. For example, we will detect the content of zinc during Salmonella infection in vivo and in vitro.

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. The SCV is a tight-fitting vacuole in which the acquisition of nutrients is a key factor for Salmonella. Among many nutrients, the authors focussed on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It is reported that β-alanine is delivered to eukaryotic cells through the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Am J Physiol Cell Physiol. 2020 Apr 1;318(4):C777-C786; Br J Pharmacol 161: 589–600, 2010; Biochim Biophys Acta 1194: 44–52, 1994). We will add this information in the revised manuscript.

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      Thank you for pointing this out. You are right that this question remains unresolved. We will do our best to address it by reviewing the literature and by designing and performing additional experiments.

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to find the transporter of β-alanine in Salmonella, but we found that the CycA transporter transports β-alanine in Escherichia coli but not in Salmonella, even though Salmonella is closely related to E. coli.

      According to your suggestion, we will perform additional experiments to verify whether BasC is involved in the transport of β-alanine into Salmonella cytosol.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the question. Our results showed that β-alanine concentrations were downregulated in the Salmonella-infected RAW264.7 cells, and the replication of Salmonella in RAW264.7 cells was significantly increased with the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells, implying that intracellular Salmonella use host-derived β-alanine for growth. Unfortunately, we have not found the transporter of exogenous β-alanine into Salmonella cytosol. We will perform additional experiments to verify whether BasC is involved in the transport of β-alanine into Salmonella cytosol, or search for other transporters that are responsible for the uptake of β-alanine into Salmonella.

      Upon confirming the β-alanine transporter in Salmonella, we will compare the intracellular replication and virulence of WT and the transporter mutant strain in cell and mouse infection assays. If the replication ability and virulence of the mutant strain decrease relative to WT, this would suggest that Salmonella takes up the exogenous beta-alanine of the host to enhance its intracellular replication and its virulence in mice.

      We have found that the replication of the Salmonella panD mutant in macrophages and its virulence in mice were significantly decreased relative to WT, suggesting that the de novo synthesis of β-alanine is important for Salmonella intracellular replication and virulence. To further confirm that both uptake of host-derived β-alanine and de novo synthesis of β-alanine are critical for the full virulence of Salmonella, we will generate a double mutant of panD and the β-alanine transporter gene. If the replication ability and virulence of the double mutant decrease compared with each of the single mutants, this would suggest that Salmonella relies on both exogenous host beta-alanine and de novo synthesis of β-alanine for full virulence.

    1. eLife Assessment

      This article reports a useful set of findings on how electrophysiological response properties of neurons correlate with their position in the brain. The evidence currently remains incomplete, with reviewers making specific suggestions for how clustering needs to be redone. The manuscript would also benefit from a more focused presentation of results and the removal of incorrect claims about recording biases.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Tolossa et al. presents classification studies that aim to predict the anatomical location of a neuron from the statistics of its in-vivo firing pattern. They study two types of statistics (ISI distribution, PSTH) and try to predict the location at different resolutions (region, subregion, cortical layer).
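
      As a rough illustration of the first statistic named in this summary, the sketch below computes a normalized, log-binned inter-spike-interval (ISI) histogram from a list of spike times. The function name `isi_histogram`, the bin range, and the toy spike train are hypothetical choices made for this example; they are not taken from the manuscript, and the authors' actual feature extraction may differ.

```python
import math


def isi_histogram(spike_times, n_bins=10, min_isi=1e-3, max_isi=10.0):
    """Return a normalized, log-spaced histogram of inter-spike intervals.

    All parameters are illustrative assumptions, not the authors' settings.
    spike_times: spike times in seconds (any order).
    """
    times = sorted(spike_times)
    # Inter-spike intervals are differences between consecutive spikes.
    isis = [b - a for a, b in zip(times, times[1:])]
    log_min, log_max = math.log10(min_isi), math.log10(max_isi)
    width = (log_max - log_min) / n_bins
    counts = [0] * n_bins
    for isi in isis:
        if isi <= 0:
            continue  # skip duplicate timestamps
        idx = int((math.log10(isi) - log_min) / width)
        # Intervals outside the range fall into the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy spike train (seconds); ISIs are roughly 5 ms, 5 ms, 50 ms, 500 ms.
hist = isi_histogram([0.0, 0.005, 0.010, 0.060, 0.560], n_bins=4)
```

      A vector of this kind, one per neuron, is the sort of input a location classifier could be trained on; a PSTH would be built analogously by binning spikes relative to stimulus onset.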

      Strengths:

      This paper provides a systematic quantification of the single-neuron firing vs location relationship.

      The quality of the classification setup seems high.

      The paper uncovers that, at the single neuron level, the firing pattern of a neuron carries some information on the neuron's anatomical location, although the predictive accuracy is not high enough to rely on this relationship in most cases.

      Weaknesses:

      As the authors mention in the Discussion, it is not clear whether the observed differences in firing are epiphenomenal. If the anatomical location information is useful to the neuron, to what extent can this be inferred from the vicinity of the synaptic site, based on the neurotransmitter and neuromodulator identities? Why would the neuron need to dynamically update its prediction of the anatomical location of its pre-synaptic partner based on activity when that location is static, and if that information is genetically encoded in synaptic proteins, etc (e.g., the type of the synaptic site)? Note that the neuron does not need to classify all possible locations to guess the location of its pre-synaptic partner because it may only receive input from a subset of locations. If an argument on activity-based estimation being more advantageous to the neuron than synaptic site-based estimation cannot be made, I believe limiting the scope of the paper (e.g., in the Introduction) to an epiphenomenal observation and its quantification will improve the scientific quality.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Tolossa et al. analyze Inter-spike intervals from various freely available datasets from the Allen Institute and from a dataset from Steinmetz et al. They show that they can modestly decode between gross brain regions (Visual vs. Hippocampus vs. Thalamus), and modestly separate sub-areas within brain regions (DG vs. CA1 or various visual brain areas).

      Strengths:

      The paper is reasonably well written, and the definitions are quite well done. For example, the authors clearly explained transductive vs. inductive inference in their decoders. E.g., transductive learning allows the decoder to learn features from each animal, whereas inductive inference focuses on withheld animals and prioritizes the learning of generalizable features.
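
      The distinction drawn here can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names `transductive_split` and `inductive_split`, the record layout with an `animal` key, and the toy data are all hypothetical and are not the authors' code. In the transductive split, neurons are shuffled individually, so an animal can contribute to both training and test sets; in the inductive split, entire animals are withheld.

```python
import random


def transductive_split(neurons, test_fraction=0.2, seed=0):
    """Split individual neurons at random; an animal may appear in both sets."""
    rng = random.Random(seed)
    shuffled = neurons[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def inductive_split(neurons, test_fraction=0.2, seed=0):
    """Withhold whole animals; no test animal contributes to training."""
    rng = random.Random(seed)
    animals = sorted({n["animal"] for n in neurons})
    rng.shuffle(animals)
    n_test = max(1, int(len(animals) * test_fraction))
    held_out = set(animals[:n_test])
    train = [n for n in neurons if n["animal"] not in held_out]
    test = [n for n in neurons if n["animal"] in held_out]
    return train, test


# Toy data: 5 hypothetical animals with 4 neurons each.
neurons = [
    {"animal": f"mouse{a}", "neuron_id": i} for a in range(5) for i in range(4)
]

trans_train, trans_test = transductive_split(neurons)
ind_train, ind_test = inductive_split(neurons)
```

      Because the inductive test set contains only animals never seen in training, accuracy on it reflects features that generalize across animals rather than animal-specific idiosyncrasies.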

      Weaknesses:

      However, even with some of these positive aspects, I still found the manuscript to be a laundry list of results, where some results are overly explained and not particularly compelling or interesting, whereas interesting results are not strongly described or emphasized. The overall problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding. The current version attempts to split the middle and thus is not as impactful as it could be.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The paper by Tolossa et al. presents classification studies that aim to predict the anatomical location of a neuron from the statistics of its in-vivo firing pattern. They study two types of statistics (ISI distribution, PSTH) and try to predict the location at different resolutions (region, subregion, cortical layer).

      Strengths:

      This paper provides a systematic quantification of the single-neuron firing vs location relationship.

      The quality of the classification setup seems high.

      The paper uncovers that, at the single neuron level, the firing pattern of a neuron carries some information on the neuron's anatomical location, although the predictive accuracy is not high enough to rely on this relationship in most cases.

      Thank you for your thoughtful feedback. The level of predictive accuracy offered by our current approach, while far above chance, is insufficient for electrode localization in most cases. However, we speculate that our results represent a lower limit on possible performance: future improvements are almost certain as larger datasets are generated, more diverse features of neural activity are employed, and more advanced ML tools are implemented. We note that the current performance indicates a far more reliable embedding of anatomy in spiking than suggested by the modest statistical significance previously described in the literature. It would have been impossible to achieve this without the tremendous resources provided by the Allen Institute. In our revision, we will clarify that major performance improvements are both possible and probable.

      Weaknesses:

      As the authors mention in the Discussion, it is not clear whether the observed differences in firing are epiphenomenal. If the anatomical location information is useful to the neuron, to what extent can this be inferred from the vicinity of the synaptic site, based on the neurotransmitter and neuromodulator identities? Why would the neuron need to dynamically update its prediction of the anatomical location of its pre-synaptic partner based on activity when that location is static, and if that information is genetically encoded in synaptic proteins, etc (e.g., the type of the synaptic site)? Note that the neuron does not need to classify all possible locations to guess the location of its pre-synaptic partner because it may only receive input from a subset of locations.  If an argument on activity-based estimation being more advantageous to the neuron than synaptic site-based estimation cannot be made, I believe limiting the scope of the paper (e.g., in the Introduction) to an epiphenomenal observation and its quantification will improve the scientific quality.

      Summarily, in response to the two reviewers, we will minimize our discussion of this question in the revision. However, given that our results are either epiphenomenal or functional, we feel that it is important to indicate these possibilities, even if this indication is succinct and conservative.

      In pursuit of a more concise revision, we will not expand our discussion to accommodate this interesting conversation with the reviewer, but we are excited to briefly offer our perspective here.

      Regarding the epiphenomenal nature of our observations: this is a complex question that would be challenging but not impossible to validate experimentally. It has been previously established that neurons, especially those that integrate inputs from a variety of regions and are involved in diverse functions, could benefit from mechanisms for dynamically parsing inputs (Gutig, Sompolinsky 2006). Neurotransmitter and neuromodulator identities may indeed convey some information about presynaptic neuron location (e.g., NE may originate from the locus coeruleus). However, hypothetically, the binding of a neurotransmitter only bears on the postsynaptic neuron via ionic current, or second messenger activity. Postsynaptic neurons do not consume or otherwise endocytose the neurotransmitter, thus the ability of a neuron to “know” the presynaptic identity is a function of induced postsynaptic activity. Certainly, there are multiple streams of information that can provide insight into anatomical location all taking the ultimate form of neural activity and membrane dynamics. This would be broadly consistent with (for example) reward prediction error which is evident in dopamine release, firing rates, spiking patterns, and oscillatory rhythms.

      We could imagine a possible role for the embedding of location in spiking patterns. It is important to note that many neurons in neighboring areas share common neurotransmitters (e.g., glutamate, GABA). Neurons receiving input from multiple regions with similar neurotransmitter profiles could benefit from additional information in the spiking patterns for distinguishing input sources, especially for multimodal integration. For instance, an inferior parietal lobule neuron or microcircuit could be downstream from both auditory cortex (listening) and Broca’s area (speaking). Imagine an individual is in a crowded coffee shop waiting for their drink order to be called while speaking to their friend. In this scenario, it may be important to recognize region-specific activity and thus selectively attend to it. Thus, it is unlikely that neurons actively update a “location prediction,” but rather that location-related information is passively embedded in spike patterning and this might be dynamically leveraged in computation. We emphasize that this is a simplified conceptual example and not a hypothesis that we test in the paper. This conversation, however, is a wonderful example of the thought experiments that we hope will grow from this type of work.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Tolossa et al. analyze inter-spike intervals from various freely available datasets from the Allen Institute and from a dataset from Steinmetz et al. They show that they can modestly decode between gross brain regions (Visual vs. Hippocampus vs. Thalamus), and modestly separate sub-areas within brain regions (DG vs. CA1 or various visual brain areas).

      Strengths:

      The paper is reasonably well written, and the definitions are quite well done. For example, the authors clearly explained transductive vs. inductive inference in their decoders. E.g., transductive learning allows the decoder to learn features from each animal, whereas inductive inference focuses on withheld animals and prioritizes the learning of generalizable features.

      Thank you!

      Weaknesses:

      However, even with some of these positive aspects, I still found the manuscript to be a laundry list of results, where some results are overly explained and not particularly compelling or interesting, whereas interesting results are not strongly described or emphasized. The overall problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding. The current version attempts to split the middle and thus is not as impactful as it could be.

      In our revision, we will endeavor to present our results in line with your suggestions. Thank you for the careful and thorough feedback that will improve the readability of our manuscript. We strove to be complete in establishing the logic leading to our ultimate finding: that a robust code for anatomical location can be extracted from single neuron spike trains, but not from more traditional descriptions of neural activity. Our detection of this code, albeit not perfect in performance, is, in most cases, both far above chance levels and robust to animal identity and laboratory of origin. Our presentation of these results is cohesive inasmuch as we sequentially establish a series of results that build towards a concluding set of experiments. We start by establishing a baseline via standard measurements and then explore more challenging problems through more complex models that build toward our final test. Based on your feedback, we will contract and expand elements of this sequence.

      While our findings raise the possibility of developing a computational tool for electrode localization, pending additional features and/or datasets, our current focus is on establishing the neurobiological principle of anatomical embedding in spike trains. The purpose of briefly mentioning a possible application is that we hope to encourage those engaged in machine-learning on multi-modal neural data that this problem is tractable, yet still open. Based on your feedback, we will clarify that the focus of our current work is not an introduction of a new tool.

    1. eLife Assessment

      This valuable study clarifies the mechanism by which the kinesin-10 motor protein, chromosome-associated kinesin, Kid (KIF22), enables chromosome movement during mitosis, demonstrating that human and Xenopus Kid proteins function as processive, homodimeric kinesins capable of processive microtubule plus-end motility. The convincing work highlights that Kid can recruit and transport duplex DNA along microtubules via its conserved C-terminal DNA binding domain, revising our understanding of chromokinesins' role in chromosome motility during mitosis. Although the data are robust, the manuscript would benefit from some editing for clarity.

    2. Reviewer #1 (Public review):

      Summary:

      Mitotic kinesins carry out crucial roles in intracellular motility and mitotic spindle organization. Although many mitotic kinesins have been extensively studied, a few conserved mitotic motors remain poorly explored, including chromosome-associated kinesins. Here, Furusaki et al. reconstitute recombinant chromosome-associated kinesin or chromokinesin (Kid) and reveal processive plus-end motility along microtubules. The authors purify multiple versions of Kid, revealing dimeric organization and their processive microtubule plus-ended motility which depends on their conserved motor domains, neck linkers, and coiled-coil regions. The study reveals for the first time that KID can recruit and transport duplex DNA along microtubules using its conserved C-terminal DNA binding domain. The work provides crucial revised thinking about the mechanisms of chromokinesins in mitosis as physical processive motors that mobilize chromosomes towards the microtubule plus ends in early metaphase.

      Strengths:

      The authors reconstitute multiple chromosome-associated kinesin (KID) orthologs from Xenopus and humans with microtubules and determine their oligomerization. The study shows how coiled-coil and neck linker regions of KID are essential for its function as their deletion leads to non-processive motility. Chimeras placing the KID coiled-coil and neck linker on the KIF1A motor domain led to the production of a processive recombinant motor supporting the compatibility of their motility mechanisms. The KID C-terminal tail binds and transports only double-stranded DNA and its deletion or single-stranded DNA leads to defects in this activity.

      Weaknesses:

      A minor weakness in the studies is that they do not resolve the mechanisms of KID in binding large duplex DNA molecules or condensed chromatin. The authors suggest a model in which KID forms multimers along large chromosomes that lead to their transport, but this model was not directly tested.

    3. Reviewer #2 (Public review):

      Summary:

      Previous work in the field highlighted the role of the kinesin-10 motor protein Kid (KIF22) in the polar ejection force during prometaphase. However, the biochemical and biophysical properties of Kid that enabled it to serve in this role were unclear. The authors demonstrate that human and Xenopus Kid proteins are processive kinesins that function as homodimeric molecules. The data are solid and support the findings, although the text could use some editing to improve clarity.

      Strengths:

      A highlight of the work is the reconstitution of DNA transport in vitro.

      A second highlight is the demonstration that the monomer vs dimer state is dependent on protein concentration.

      Weaknesses:

      The authors make several assumptions of the monomer vs dimer state of various Kid constructs without verifying the protein state using e.g. size exclusion chromatography and/or nanophotometry. They also make statements about monomer-to-dimer transitions on the microtubule without showing or quantifying the data.

      The discussion needs to better put the work into context regarding the ability of non-processive motors to work in teams (formerly thought to be the case for Kid) and how their findings on Kid change this prevailing view in the case of polar ejection force.

      The authors also do not mention previous work on kinesins with non-conventional neck linker/neck coil regions that have been shown to move processively. Their work on Kid needs to be put into this context.

    4. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Mitotic kinesins carry out crucial roles in intracellular motility and mitotic spindle organization. Although many mitotic kinesins have been extensively studied, a few conserved mitotic motors remain poorly explored, including chromosome-associated kinesins. Here, Furusaki et al. reconstitute recombinant chromosome-associated kinesin or chromokinesin (Kid) and reveal processive plus-end motility along microtubules. The authors purify multiple versions of Kid, revealing dimeric organization and their processive microtubule plus-ended motility which depends on their conserved motor domains, neck linkers, and coiled-coil regions. The study reveals for the first time that KID can recruit and transport duplex DNA along microtubules using its conserved C-terminal DNA binding domain. The work provides crucial revised thinking about the mechanisms of chromokinesins in mitosis as physical processive motors that mobilize chromosomes towards the microtubule plus ends in early metaphase.

      Strengths: 

      The authors reconstitute multiple chromosome-associated kinesin (KID) orthologs from Xenopus and humans with microtubules and determine their oligomerization. The study shows how coiled-coil and neck linker regions of KID are essential for its function as their deletion leads to non-processive motility. Chimeras placing the KID coiled-coil and neck linker on the KIF1A motor domain led to the production of a processive recombinant motor supporting the compatibility of their motility mechanisms. The KID C-terminal tail binds and transports only double-stranded DNA and its deletion or single-stranded DNA leads to defects in this activity.

      Thank you very much.

      Weaknesses: 

      A minor weakness in the studies is that they do not resolve the mechanisms of KID in binding large duplex DNA molecules or condensed chromatin. The authors suggest a model in which KID forms multimers along large chromosomes that lead to their transport, but this model was not directly tested. 

      Thank you very much for your suggestion.

      We will attempt to observe the movement of longer dsDNA and/or DNA-bead complexes and compare their motility with that of a single KID motor to elucidate the cooperativity of the motor protein.

      Reviewer #2 (Public review): 

      Summary: 

      Previous work in the field highlighted the role of the kinesin-10 motor protein Kid (KIF22) in the polar ejection force during prometaphase. However, the biochemical and biophysical properties of Kid that enabled it to serve in this role were unclear. The authors demonstrate that human and Xenopus Kid proteins are processive kinesins that function as homodimeric molecules. The data are solid and support the findings, although the text could use some editing to improve clarity.

      Strengths: 

      A highlight of the work is the reconstitution of DNA transport in vitro. 

      A second highlight is the demonstration that the monomer vs dimer state is dependent on protein concentration. 

      Thank you very much.

      Weaknesses: 

      The authors make several assumptions of the monomer vs dimer state of various Kid constructs without verifying the protein state using e.g. size exclusion chromatography and/or nanophotometry. They also make statements about monomer-to-dimer transitions on the microtubule without showing or quantifying the data. 

      As the reviewer suggests, the monomer-to-dimer transition on the microtubule is speculation. What we can measure in our hands are (1) the monomer-to-dimer ratio in solution and (2) particle movement on microtubules. At pmol/L concentrations, Kid is monomeric in solution but exhibits processive movement on microtubules. Dimerization is generally required for processivity. Therefore, we suggest that Kid forms a dimer on microtubules.

      To show that Kid forms a dimer on microtubules, we will perform photobleaching assays and measure the fluorescent intensities of each particle on microtubules to determine their oligomeric state.

      The discussion needs to better put the work into context regarding the ability of non-processive motors to work in teams (formerly thought to be the case for Kid) and how their findings on Kid change this prevailing view in the case of polar ejection force. 

      We will look for examples of non-processive motors and include them, with citations, in the Discussion. As described by this reviewer, Kid was originally thought to be a non-processive motor. We hope that our current work will change that view.

      The authors also do not mention previous work on kinesins with non-conventional neck linker/neck coil regions that have been shown to move processively. Their work on Kid needs to be put into this context.

      We had thought that most kinesins belonging to the cargo-transport classes have conserved neck linker and neck coil domains, with Kid being an exception. We will search for more citations, including non-transport classes of kinesins, and rewrite the Discussion.

    1. eLife Assessment

      This valuable study uses the analysis of connectomic and transcriptomic datasets to survey the anatomy and connectivity of neurosecretory cells in the Drosophila brain. While the connectivity analyses are convincing, the anatomical and functional data provided to verify cell type identity and paracrine signaling is incomplete. Once these aspects are improved, this study would be of interest to neuroscientists working on hormonal signaling in Drosophila and other animals.

    2. Reviewer #1 (Public review):

      Summary:

      The study by McKim et al. seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA-seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups as well as some of the specific inputs, including from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses, consistent with their known function as providing paracrine release of neuropeptides. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNAseq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript around such findings would allow the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazonin-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expression in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021), finding that there are some differences between these developmental stages. Direct comparisons between adults and larvae are made possible by Table 1, as well as by the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazonin-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments would significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

    1. eLife Assessment

      The study describes a valuable new technology in the field of targeted protein degradation that allows identification of E3-ubiquitin ligases that target a protein of interest. The presented data are convincing, however, additional work will be needed to optimize for high-throughput evaluation. This technology will therefore serve the community in the initial stages of developing targeted protein degraders.

    2. Reviewer #1 (Public review):

      Summary:

      PROTACs are heterobifunctional molecules that utilize the Ubiquitin Proteasome System to selectively degrade target proteins within cells. Upon introduction to the cells, PROTACs capture the activity of the E3 ubiquitin ligases for ubiquitination of the targeted protein, leading to its subsequent degradation by the proteasome. The main benefit of PROTAC technology is that it expands the "druggable proteome" and provides numerous possibilities for therapeutic use. However, there are also some difficulties, including the one addressed in this manuscript: identifying suitable target-E3 ligase pairs for successful degradation. Currently, only a few out of about 600 E3 ligases are used to develop PROTAC compounds, which creates the need to identify other E3 ligases that could be used in PROTAC synthesis. Testing the efficacy of PROTAC compounds has been limited to empirical tests, leading to lengthy and often failure-prone processes. This manuscript addressed the need for faster and more reliable assays to identify the compatible pairs of E3 ligases-target proteins. The authors propose using the RiPA assay, which depends on rapamycin-induced dimerization of FKBP12 protein with FRB domain. The PROTAC technology is advancing rapidly, making this manuscript both timely and essential. The RiPA assay might be useful in identifying novel E3 ligases that could be utilized in PROTAC technology. Additionally, it could be used at the initial stages of PROTAC development, looking for the best E3 ligase for the specific target.

      The authors described an elegant assay that is scalable, easy-to-use and applicable to a wide range of cellular models. This method allows for the quantitative validation of the degradation efficacy of a given pair of E3 ligase-target protein, using luciferase activity as a measure. Importantly, the assay also enables the measurement of kinetics in living cells, enhancing its practicality.

      Strengths:

      (1) The authors have addressed the crucial needs that arise during PROTAC development. In the introduction, they nicely describe the advantages and disadvantages of the PROTAC technology and explain why such an assay is needed.

      (2) The study includes essential controls in experiments (important for generating new assay), such as using the FRB vector without E3 ligase as a negative control, testing different linkers (which may influence the efficacy of the degradation), and creating and testing K-less vectors to exclude the possibility of luciferase or FKBP12 ubiquitination instead of WDR5 (the target protein). Additionally, the position of the luc in the FKBP12 vector and the position of VHL in the FRB vector are tested. Different E3 ligases are tested using previously identified target proteins, confirming the assay's utility and accuracy.

      (3) The study identified a "new" E3 ligase that is suitable for PROTAC technology (FBXL).

      Weaknesses:

      It is not clear how feasible it would be to adapt the assay for high-throughput screens.

      Comments on revisions:

      The authors have addressed my previous concerns and made changes to the manuscript, resulting in a well-written paper.

    3. Reviewer #2 (Public review):

      Summary:

      Adhikari and colleagues developed a new technique, rapamycin-induced proximity assay (RiPA), to identify E3-ubiquitin (ub) ligases of a protein target, aiming at identifying additional E3 ligases that could be targeted for PROTAC generation or ligases that may degrade a protein target. The study is timely, as expanding the landscape of E3-ub ligases for developing targeted degraders is a primary direction in the field.

      Strengths:

      (1) The study's strength lies in its practical application of the FRB:FKBP12 system. This system is used to identify E3-ub ligases that would degrade a target of interest, as evidenced by the reduction in luminescence upon the addition of rapamycin. This approach effectively mimics the potential action of a PROTAC.

      Weaknesses:

      (1) While the technique shows promise, its application in a discovery setting, particularly for high-throughput or unbiased E3-ub ligase identification, may pose challenges. The authors now discuss these potential difficulties, providing a more comprehensive understanding of RiPA's limitations.

      (2) While RiPA will help identify E3 ligases, PROTAC design would still be empirical. The authors provide some discussion of this limitation.

      Comments on revisions:

      I thank the authors for addressing my prior concerns. I would recommend that individual replicate values be plotted in all the mean ± s.d. or s.e.m. graphs.

    4. Author response:

      The following is the authors’ response to the original reviews.

      First of all, we would like to thank the reviewers for their very constructive comments, which helped us to improve the manuscript! In response to the issues raised, we have performed new experiments and made the necessary changes to the manuscript.

      eLife Assessment

      The study describes a valuable new technology in the field of targeted protein degradation that allows identification of E3-ubiquitin ligases that target a protein of interest. The presented data are convincing, however, it is unclear whether the proposed system can be successfully used in high throughput applications. This technology will serve the community in the initial stages of developing targeted protein degraders.

      We thank the eLife editors for the positive assessment and have clarified the scalability of our system for high-throughput applications in the revised manuscript (see our response to both reviewers' comments on weakness point 1).

      Reviewer #1 (Public Review):

      Summary:

      PROTACs are heterobifunctional molecules that utilize the Ubiquitin Proteasome System to selectively degrade target proteins within cells. Upon introduction to the cells, PROTACs capture the activity of the E3 ubiquitin ligases for ubiquitination of the targeted protein, leading to its subsequent degradation by the proteasome. The main benefit of PROTAC technology is that it expands the "druggable proteome" and provides numerous possibilities for therapeutic use. However, there are also some difficulties, including the one addressed in this manuscript: identifying suitable target-E3 ligase pairs for successful degradation. Currently, only a few out of about 600 E3 ligases are used to develop PROTAC compounds, which creates the need to identify other E3 ligases that could be used in PROTAC synthesis. Testing the efficacy of PROTAC compounds has been limited to empirical tests, leading to lengthy and often failure-prone processes. This manuscript addressed the need for faster and more reliable assays to identify the compatible pairs of E3 ligases-target proteins. The authors propose using the RiPA assay, which depends on rapamycin-induced dimerization of FKBP12 protein with FRB domain. The PROTAC technology is advancing rapidly, making this manuscript both timely and essential. The RiPA assay might be useful in identifying novel E3 ligases that could be utilized in PROTAC technology. Additionally, it could be used at the initial stages of PROTAC development, looking for the best E3 ligase for the specific target.

      The authors described an elegant assay that is scalable, easy-to-use, and applicable to a wide range of cellular models. This method allows for the quantitative validation of the degradation efficacy of a given pair of E3 ligase-target proteins, using luciferase activity as a measure. Importantly, the assay also enables the measurement of kinetics in living cells, enhancing its practicality.

      Strengths:

      (1) The authors have addressed the crucial needs that arise during PROTAC development. In the introduction, they nicely describe the advantages and disadvantages of the PROTAC technology and explain why such an assay is needed.

      (2) The study includes essential controls in experiments (important for generating new assay), such as using the FRB vector without E3 ligase as a negative control, testing different linkers (which may influence the efficacy of the degradation), and creating and testing K-less vectors to exclude the possibility of luciferase or FKBP12 ubiquitination instead of WDR5 (the target protein). Additionally, the position of the luc in the FKBP12 vector and the position of VHL in the FRB vector are tested. Different E3 ligases are tested using previously identified target proteins, confirming the assay's utility and accuracy.

      (3) The study identified a "new" E3 ligase that is suitable for PROTAC technology (FBXL).

      We greatly appreciate the reviewer’s positive feedback on our work. To evaluate our system further, in our revised manuscript we have conducted additional analysis on KRASG12D degradation via VHL and CRBN within our K-less system. Consistent with previous findings of VHL-harnessing PROTACs, our assay demonstrated that VHL mediated efficient degradation of KRASG12D while CRBN induced only a minor effect. This new data is presented in Figure 2 - figure supplement 1C of the revised manuscript.

      Weaknesses:

      · It is not clear how feasible it would be to adapt the assay for high-throughput screens.

      Our study uses a well-based assay design. It is therefore possible, but not realistic, to evaluate all 600 or more human E3 ligases. Nonetheless, if all E3 ligases are of interest, our assay could be adapted for pooled experimental strategies, as demonstrated in Poirson, J., Cho, H., Dhillon, A. et al., Nature 628, 878–886 (2024).

      Our system offers several advantages over pooled screens, including the generation of more quantitative data and faster testing of selected candidates. Pooled screens, by contrast, require more time due to the necessity of next-generation sequencing and bioinformatics analysis. Moreover, in response to the reviewer's comment, we have included a schematic in the revised manuscript (Figure 4 - figure supplement 1A) that outlines the assay duration and hands-on time for target and E3 ligase candidates.

      · In some experiments, the efficacy of WDR5 degradation tested by immunoblotting appears to be lower than luciferase activity (e.g., Figure 2G and H).

      We concur with the reviewer that in some instances, the degradation observed via immunoblotting appears lower than that indicated by luciferase activity. We have therefore quantified the western blots and added the quantification to the respective blots. This discrepancy may result from the non-linearity of western blots.

      Reviewer #2 (Public Review):

      Summary:

      Adhikari and colleagues developed a new technique, rapamycin-induced proximity assay (RiPA), to identify E3-ubiquitin (ub) ligases of a protein target, aiming at identifying additional E3 ligases that could be targeted for PROTAC generation or ligases that may degrade a protein target. The study is timely, as expanding the landscape of E3-ub ligases for developing targeted degraders is a primary direction in the field.

      Strengths:

      The study's strength lies in its practical application of the FRB:FKBP12 system. This system is used to identify E3-ub ligases that would degrade a target of interest, as evidenced by the reduction in luminescence upon the addition of rapamycin. This approach effectively mimics the potential action of a PROTAC.

      We are delighted with this assessment of our work by the reviewer. To evaluate our system further, in our revised manuscript we have conducted additional analysis on KRASG12D degradation via VHL and CRBN within our K-less system. Consistent with previous findings of VHL-harnessing PROTACs, our assay demonstrated that VHL mediated efficient degradation of KRASG12D while CRBN induced only a minor effect. This new data is presented in Figure 2 - figure supplement 1C of the revised manuscript.

      Weaknesses:

      (1) While the technique shows promise, its application in a discovery setting, particularly for high-throughput or unbiased E3-ub ligase identification, may pose challenges. The authors should provide more detailed insights into these potential difficulties to foster a more comprehensive understanding of RiPA's limitations.

      Our assay is designed as a well-based assay. It is therefore possible, but not realistic, to evaluate all of the more than 600 human E3 ligases. Nonetheless, if all E3 ligases are of interest, our assay could be adapted for pooled experimental strategies, as demonstrated in Poirson, J., Cho, H., Dhillon, A. et al., Nature 628, 878–886 (2024).

      Our system offers several advantages over pooled screens, including the generation of more quantitative data and faster testing of selected candidates. Pooled screens, by contrast, require more time due to the necessity of next-generation sequencing and bioinformatics analysis. Moreover, in response to the reviewer's comment, we have included a schematic in the revised manuscript (Figure 4 - figure supplement 1A) that outlines the assay duration and hands-on time for target and E3 ligase candidates.

      We also added the following sentences to the Limitations of the study section of the revised manuscript (line 322-326): “While our system offers easy testing of different tagging approaches and due to its simple workflow facilitates the rapid characterization of novel E3 ligases across multiple targets, it is currently not optimized for high-throughput evaluation of all 600+ E3 ligases. Achieving such scale would necessitate further adaptations, including the incorporation of pooled experimental strategies.”

      (2) While RiPA will help identify E3 ligases, PROTAC design would still be empirical. The authors should discuss this limitation. Could the technology be applied to molecular glue generation?

      We agree with the reviewer that our assay rationalizes the choice of E3 ligases but that PROTAC design (“linkerology”) is still mostly empirical. To address this, we included the following line in the Limitations of the study section of our initial manuscript (line 327-330): “Conversely, it is also conceivable that an E3 ligase that can efficiently decrease the levels of a particular target in the RiPA setting may be less suitable for PROTACs, since PROTACs that mimic the steric interaction of the target/E3 pair may not be easily identified in the chemical space.”

      Regarding molecular glues, our assay could also be instrumental in identifying suitable E3 ligases for a target protein prior to screening for molecular glues, provided that the screening system specifically screens E3 ligase and target pairs. However, as most molecular glue screens are currently agnostic to specific E3 ligases or targets, our system may not be applicable in those cases. We have elaborated on this in the discussion section of the revised manuscript (line 271-274): “We envision that this setting will be valuable for identifying the most suitable E3 ligase candidates for PROTACs aimed at specific proteins, and for guiding E3 ligase selection when screening for molecular glues targeting specific E3 ligase and protein pairs.”

      (3) Controls to verify the intended mechanism of action are missing, such as using a proteasome inhibitor or VHL inhibitors/siRNA to verify on-target effects. Verification of the target E3 ligase complex after rapamycin addition via orthogonal approaches, such as IP, should be considered.

      We thank the reviewer for the comment. VHL siRNA in particular is not beneficial in this setup, as we overexpress the E3 ligase rather than relying on the endogenous protein.

      To verify the mechanism of action, we performed additional experiments in the presence of the proteasomal inhibitor MG132 and the neddylation inhibitor MLN4924 with the target KRASG12D and the E3 ligase VHL. The results are shown in Figure 2H of the revised manuscript.

      Minor concern:

      The graphs in Figure 1E are missing.

      We thank the reviewer for pointing this out. We corrected the figure in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      •  Optionally, the authors could add control experiments with Aurora B and Crb vectors (there shouldn't be any degradation) and experiments confirming that the degradation occurs via the proteasome. For example, the addition of proteasome inhibitors (such as bortezomib) should decrease the efficiency of the target degradation and confirm that targets are degraded via the proteasome system.

      Regarding Aurora-B degradation, as far as we know, there are no specific Aurora-B PROTACs reported. Thus, there is no definitive evidence that CRBN could not degrade Aurora-B. Nevertheless, we performed assays with Aurora-B and VHL, CRBN, or FRB, and observed more effective degradation of Aurora-B by VHL than CRBN. This data is now included in Figure 2 - figure supplement 1B of the revised manuscript.

      • It would also be helpful to provide a possible explanation for why the ratio 1:1 of vectors did not induce the degradation (regarding Figure 1D).

      We believe the lack of degradation at a 1:1 vector ratio is due to the differential expression levels of endogenous FKBP12 and mTOR in HEK293 cells. According to the Human Protein Atlas, the normalized protein-coding transcripts per million (nTPM) for FKBP12 and mTOR in HEK293 cells are 160 and 24, respectively, indicating that FKBP12 is expressed at levels approximately 6.7 times higher than mTOR. This disparity likely limits heterodimerization that occurs exclusively between the two fusion proteins upon rapamycin addition. To increase the likelihood of FKBP12 and FRB fusion protein dimerization, we used a higher ratio of the FRB component during transfection, considering the higher endogenous expression of FKBP12.
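
      The fold difference quoted in this response can be reproduced with a short calculation. The sketch below uses only the two nTPM values stated above; the function and variable names are illustrative, and transcript abundance is treated purely as a rough proxy for protein level, which is an assumption rather than something the response demonstrates.

```python
# nTPM values as quoted in the author response (Human Protein Atlas, HEK293 cells).
FKBP12_NTPM = 160.0
MTOR_NTPM = 24.0


def expression_ratio(numerator_ntpm: float, denominator_ntpm: float) -> float:
    """Return the fold difference between two transcript abundances."""
    if denominator_ntpm <= 0:
        raise ValueError("denominator nTPM must be positive")
    return numerator_ntpm / denominator_ntpm


ratio = expression_ratio(FKBP12_NTPM, MTOR_NTPM)
print(f"FKBP12/mTOR transcript ratio: {ratio:.1f}")  # prints 6.7
```

      A ratio of this size is consistent with the stated rationale for transfecting an excess of the FRB component, although the optimal vector ratio still has to be determined empirically.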

      • It would be helpful to add more explanation for the data in Figure 1F, including whether there is a difference between vectors with different positions of VHL and FRB and why the FRB-VHL vector is less expressed without rapamycin.

      We thank the reviewer for the comment. Regarding the vector orientations of VHL/FRB and WDR5/Luc/FKBP12, we have consistently observed different migration behaviors for WDR5 and VHL constructs, despite their identical molecular weights. This observation aligns with literature reports in which differential running behavior is noted when FRB or FKBP12 (or their mutants) are tagged to the N- or C-terminus of a protein (Bondeson, D.P., Mullin-Bernstein, Z., Oliver, S. et al. Nat Commun 13, 5495 (2022); Mabe, S., Nagamune, T. & Kawahara, M. Sci Rep 4, 6127 (2014)). We have now included the following explanation in the figure legend of Figure 1F of the revised manuscript: “WDR5 and VHL fusion proteins tagged at the N- and C-terminal show different migration behaviors despite having same molecular weight.”

      Additionally, the stabilizing effect of rapamycin on FRB (or its mutants), FRB fusion proteins, and FRB-containing proteins has been documented (Stankunas, K., Bayle, J.H., Havranek, J.J. et al. ChemBioChem, 8(10), 1162-1169 (2007); Stankunas, K., Bayle, J.H., Gestwicki J.E. et al. Mol Cell, 12(6), 1615–1624 (2003); Zhang, C., Cui, M., Cui, Y. et al. J. Vis. Exp. (150), e59656 (2019)). We believe that the degree of stabilization by rapamycin could differ between N- and C-terminal FRB fusion proteins.

      • Finally, the mistake in Figure 2G (where the lanes are wrongly labelled, BRBN-FRB and FRB) should be corrected. Also please correct the graph in Figure 1E (there seems to be a problem with bars for 1:100). There are some typos, such as in lines 38, 277, and 288.

      Thank you for bringing this to our attention. We have corrected all the mentioned errors.

    1. eLife Assessment

      This important study identifies the "H-state" as a potential conformational marker distinguishing amyloidogenic from non-amyloidogenic light chains, addressing a critical problem in protein misfolding and amyloidosis. By combining advanced techniques such as small-angle X-ray scattering, molecular dynamics simulations, and H-D exchange mass spectrometry, the authors provide convincing evidence for their novel findings. However, incomplete experimental descriptions, limitations in SAXS data interpretation, and the way HDX MS data is presented affect the strength and generalizability of the conclusions. Strengthening these aspects would enhance the impact of this work for researchers in amyloidosis and protein misfolding.

    2. Reviewer #1 (Public review):

      The study investigates light chains (LCs) using three distinct approaches, with a focus on identifying a conformational fingerprint to differentiate amyloidogenic light chains from multiple myeloma light chains. The study's major contribution is the identification of a low-populated "H state," which the authors propose as a unique marker for AL-LCs. While this finding is promising, the review highlights several strengths and weaknesses. Strengths include the valuable contribution of identifying the H state and the use of multiple approaches, which provide a comprehensive understanding of LC structural dynamics. However, the study suffers from weaknesses, particularly in the interpretation of SAXS data, lack of clarity in presentation, and methodological inconsistencies. Critical concerns include high error margins between SAXS profiles and MD fits, unclear validation of oligomeric species in SAXS measurements, and insufficient quantitative cross-validation between experimental (HDX) and computational data (MD). This reviewer calls for major revisions including clearer definitions, improved methodology, and additional validation, to strengthen the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This well-written manuscript addresses an important but recalcitrant problem - the molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a major life-threatening form of systemic human amyloidosis. The authors use expertly recorded and analyzed small-angle X-ray scattering (SAXS) data as a restraint for molecular dynamics simulations (called M&M) and to explore six patient-based LC proteins. The authors report that a highly populated "H-state" determined computationally, wherein the two domains in an LC molecule acquire a straight rather than bent conformation, is what distinguishes AL from non-AL LCs. They then use H-D exchange mass spectrometry to verify this conclusion. If confirmed, this is a novel and interesting finding with potentially important translational implications.

      Strengths:

      Expertly recorded and analyzed SAXS data combined with clever M&M simulations lead to a novel and interesting conclusion.

      Regardless of whether or not the CL-CL domain interface is destabilized in AL LCs explored in this (Figure 6) and other studies, stabilization of this interface is an excellent idea that may help protect at least a subset of AL LCs from misfolding in amyloid. This idea increases the potential impact of this interesting study.

      Weaknesses:

      The HDX analysis could be strengthened.

    4. Reviewer #3 (Public review):

      Summary:

      This study identifies conformational fingerprints of amyloidogenic light chains that set them apart from non-amyloidogenic ones.

      Strengths:

      The research employs a comprehensive combination of structural and dynamic analysis techniques, providing evidence that conformational dynamics at the VL-CL interface and structural expansion are distinguishing features of amyloidogenic LCs.

      Weaknesses:

      The sample size is limited, which may affect the generalizability of the findings. Additionally, the study could benefit from deeper analysis of specific mutations driving this unique conformation to further strengthen therapeutic relevance.

    1. eLife Assessment

      The manuscript by Guo and colleagues reports valuable findings about the inhibitory activity of caffeic acid phenethyl ester (CAPE) against TcdB, a key toxin produced by Clostridioides difficile. C. difficile infections are a major public health concern, and this manuscript provides interesting data on toxin inhibition by CAPE, a potentially promising therapeutic alternative for this disease. The strength of the evidence to support the conclusions is solid, with some concerns about the moderate effects on the mouse infection model and direct binding assays of CAPE to the toxin.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

    3. Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented do not convincingly demonstrate that CAPE attenuates C. difficile disease, and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Autoprocessing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with a non-toxigenic strain.

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?
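
      The unit conversion behind point (7) can be checked with a short calculation. This is a minimal sketch that assumes a molar mass of 232.28 g/mol for melatonin (C13H16N2O2); that value and the helper name are not taken from the review itself.

```python
# Assumption: molar mass of melatonin (C13H16N2O2), not stated in the review.
MELATONIN_MW_G_PER_MOL = 232.28


def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration in ug/mL to a molar concentration in uM.

    1 ug/mL equals 1 mg/L, and mg/L divided by g/mol gives mmol/L,
    so multiplying by 1000 yields umol/L.
    """
    if mw_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


melatonin_um = ug_per_ml_to_micromolar(100.0, MELATONIN_MW_G_PER_MOL)
print(f"100 ug/mL melatonin is about {melatonin_um:.0f} uM")
```

      Under that assumption the result is a little above 400 µM, in line with the reviewer's approximate figure.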

    4. Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

    1. eLife Assessment

      This important study uses Mendelian Randomisation to show that early life phenotypes (i.e. onset of age at menarche and age at first birth) have an influence on a multitude of health outcomes later in life. The provided empirical evidence supporting the antagonistic pleiotropy theory is solid. However, some additional analyses and a more comprehensive discussion of the findings are needed to make the study stronger.

    2. Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology that require revisions.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      (2) Some of the odds ratios (for example the one for osteoporosis) are really small, while still reaching the level of statistical significance. After some checking, I found the GWAS data used to generate these MR estimates were processed by the program BOLT-LMM. This program is a linear mixed model program, which requires the transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the manual of BOLT-LMM and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes as it doesn't apply to all.

      (3) The authors should follow the MR-Strobe guidelines for presentation.

      (4) The authors should report data in the text with a 95% confidence interval.

      (5) The authors should consider correction for multiple testing.
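
      The rescaling raised in point (2) can be sketched as follows. The sketch assumes the approximation given in the BOLT-LMM user manual for binary traits analysed on the 0/1 scale, log(OR) ≈ β / (μ(1 − μ)), where μ is the case fraction, with the standard error scaled by the same factor. The function names and the example numbers are hypothetical, and the formula should be checked against the manual before use.

```python
import math


def bolt_lmm_to_log_odds(beta: float, se: float, case_fraction: float):
    """Rescale a linear-scale effect and its SE to the log-odds scale.

    Assumes the approximation log(OR) = beta / (mu * (1 - mu)),
    where mu is the fraction of cases in the GWAS sample.
    """
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    scale = case_fraction * (1.0 - case_fraction)
    return beta / scale, se / scale


def odds_ratio_with_ci(log_or: float, se_log_or: float, z: float = 1.96):
    """Return the odds ratio with a Wald-type 95% confidence interval."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se_log_or),
        math.exp(log_or + z * se_log_or),
    )


# Hypothetical SNP-outcome estimate: not taken from the manuscript.
log_or, se_log_or = bolt_lmm_to_log_odds(beta=0.002, se=0.0005, case_fraction=0.02)
or_est, ci_low, ci_high = odds_ratio_with_ci(log_or, se_log_or)
print(f"OR = {or_est:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

      Exponentiating the untransformed linear-scale beta would give an odds ratio very close to 1, which may explain the implausibly small but statistically significant estimates the reviewer describes. Reporting the interval alongside the estimate also addresses point (4).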

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) Antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is necessary on how this theory is accurately testable. For example, why and how much natural variation is there for fitness outcomes in humans? What do the genetic risk score distributions of the exposure data look like? Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.
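
      The multiple-testing concern in point (3) can be illustrated with two standard procedures. The sketch below shows Bonferroni correction (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate); the p-values are invented for illustration and do not come from the manuscript.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass a Bonferroni-adjusted threshold of alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at most that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight exposure-outcome tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

      With these invented values, five tests are nominally significant at 0.05, but only one survives Bonferroni correction and two survive Benjamini-Hochberg. This is why the number of tests performed needs to be stated explicitly.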

    1. eLife Assessment

      This basic research study presents useful data concerning the menstrual fluid composition and its potential for endometriosis biomarker research. However, despite solid bioinformatics analyses, the choice of markers used to separate or identify the different cell types needs to be justified and the results better discussed in relation to current knowledge of the pathophysiology of endometriosis.

    2. Reviewer #1 (Public review):

      Summary:

      The characteristics of endometrium health are an increasing topic in women's health issues, especially in the context of endometriosis. In this respect, having access to information is hampered by the inaccessibility of the uterine tissue. The authors propose here using the menstrual fluid (easily accessible by non-invasive methods) as an access door towards getting relevant information.

      Overall, the paper is divided into two parts:

      (1) The comparison between menstrual fluid samples and biopsies of the endometrium.

      (2) As a proof of concept, the authors then compared 11 controls and 7 endometriosis cases in this way, from different severity stages.

      Strengths:

      In Figure 1, general features of the 15 samples are presented (volume/number of cells/hematopoietic cells - cd45 labeling). The authors then used single-cell RNA-seq to characterize the different samples. Through having access to endometrium biopsies, they were able to compare the profiles obtained.

      In the MF samples from the second part of the paper - aiming at comparing endometriosis and controls - one question is raised about the effect of culture. The authors compared freshly isolated and cultured tissues (ex vivo vs in vitro) by bulk RNA seq. Biases induced by the culture procedure were identified. Deconvolution was applied to strengthen this observation, with an important increase of seemingly stromal and unknown cells, especially in the unsorted cells and the CD45+ cells.

      Interestingly, since the authors got successive samples from the same donor, they could evaluate the consistency of the samples and reveal indeed an overall stability of the molecular profile of the samples in a given patient.

      The authors then attempted - quite originally - to characterize biomarkers in two major cell compartments that they studied - CD45- (stromal-like) and CD45+ (immune cells).

      Weaknesses:

      A potential problem is the justification of the a priori mix of cell types of three different phenotypes (CD45+, CD45- EPCAM+, and CD45- EPCAM-) from each patient before moving to the scRNAseq. It is not clear to me why this has been done; I guess that using the samples directly would supposedly bias the result. But in this case, why is it supposed that three categories are enough (immune cells, epithelial cells, and stromal cells)? I suppose that other markers could characterize other subtypes of the cells, and take into account the possibility of other cell types, for instance those connected to pain sensitivity, such as neuron precursors. Hence, the justification of the organized mixes should be much more detailed in my opinion.

      It is a bit unclear to me when the biopsies were collected in the cycle of the donor patients.

      The description of these markers that are deregulated is presented as a list, and connected with existing publications, which could rather be presented in discussion than in the results. The authors do tend to demonstrate that the Menstrual Fluid is a good proxy to analyse the endometrium health status of the women affected with endometriosis.

      The identification of MTRNR2L1 seems to be a major discovery of the paper, as well as, to a lesser extent, HBG2, and it is a bit strange that these putative markers were not emphasized in the abstract. HBG2 was certainly identified previously in endometriosis endothelial cells but seems extremely variable from one sample to another (GEO profile GDS3060 / 213515_x_at).

      Overall, the transcriptome analysis is a bit shallow, with no effort made to try to find potential transcription factors or miRNA that could activate/inhibit a series of modified genes; it could be relevant to identify such master genes or master regulators through bioinformatics analyses and wet-lab validations, to understand better the cascade of events.

      Another issue that was overlooked is the presence of 'stem-cells' in the MF obtained. Since endometriosis is supposed to occur from the implantation of uterine stem cells, this category could be a major topic of scrutiny, in terms of quantity in the MF, as well as in terms of their specific molecular properties.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provided further evidence that menstrual fluid (MF) can be used as a non-invasive source of endometrial tissue for studying its normal physiological state and when it is abnormal, such as in endometriosis. Single-cell RNA sequencing confirmed the presence of the major cell types: blood and tissue immune cells and endometrial stromal, epithelial, and vascular cells. The major new finding was that interindividual variation for the blood immune cells was minimal between multiple MF samples from an individual. A comparison between the ex vivo MF gene profile and cultured MF showed the expected attachment and culture of stromal (and a small number of epithelial) cells, but the immune cells failed to attach. Several differentially expressed genes between controls and endometriosis were suggested as potential biomarkers of the disease; however, these were a mitochondrial pseudogene and a hemoglobin subunit, both very unlikely to be related to endometriosis pathogenesis.

      Strengths:

      The Spearman correlation analysis between the control MF gene profiles of multiple samples from the same individual and its graphic presentation provided strong evidence that there is little variation between MF samples. Together with another study which showed similar findings for endometrial stem cells and a number of proteins in MF supernatant, this important data shows MF as a promising biofluid for pathology testing.

      The bioinformatic analyses conducted by bioinformatic and computational experts are a major strength of the manuscript and in particular the comparison between MF and endometrial biopsy data obtained from published scRNAseq studies. This is an important finding, particularly if comparisons included late secretory and early proliferative stage biopsy tissue which would be most similar to shedding menstrual endometrium.

      The inclusion of workflows in the Figures for the various studies and the use of symbols in the various panels is very helpful for the reader.

      MF cell suspensions were enriched for stromal and epithelial cells to enable a detailed bioinformatic analysis of their respective gene profiles.

      Weaknesses:

      Two patient cohorts from different institutions were used in the study and somewhat different methods were used to extract the cellular fraction from these cohorts for further study: (1) sample dilution and differential filtration to separate blood-derived immune cells from endometrial tissue then dissociated into single cells and separated into CD45+, CD45-EpCAM+ and CD45-EpCAM- cells, and (2) gradient density separation to generate unsorted, CD45+, CD45- and putative mesenchymal stem cells (MSC) CD45-CD105+ which were also cultured. In addition, questions on pelvic pain and proven fertility would have addressed the 2 key symptoms of endometriosis.

      The use of CD105 to purify MSC from MF rather than well-characterised markers of clonogenic, self-renewing, and mesodermal differentiating endometrial MSC such as CD146+PDGFRB+ or SUSD2 (both mentioned in references 22 and 23) is a weakness. The ISCT markers are not specific and are also found on stromal fibroblasts of many tissues (Phinney and Sensebe Cytotherapy 2013; Demu et al Acta Haematologica 2016).

      The UMAPs generated from the scRNAseq were at low resolution and more individual immune and endometrial cell types have previously been identified and reported in MF. More comparisons with these studies would also have enhanced the Discussion.

      It was not always possible to work out how the data were reported in the gene expression tables (Supplementary Tables 2, 4-10), as they were not in adjusted P value order and sometimes positive log2 fold change values appeared amongst the negative log2FC. In some of the comparisons described, genes were reported as up- or down-regulated in the text even though their adjusted P values were not significant.

      The 2 DEGs highlighted in the endometriosis and control arm of the study appear to be poor choices among the many others that could have been chosen, as MTRNR2L1 is a mitochondrial pseudogene and HBG2 is a hemoglobin subunit. Neither is a likely indicator of endometriosis pathogenesis.

      The manuscript format and organisation could be improved by reducing the discussion in the Results section and providing a more in-depth Discussion. More references need to be included in the Discussion, and other work in the MF analysis field that supports (or does not support) the authors' findings, or at least puts them into context, should be included and referenced.

      The potential to use MF as a non-invasive source of endometrial tissue for potential diagnosis is a very important avenue of research that is currently in its infancy and could have a major impact in the endometriosis research arena.

    1. eLife Assessment

      This important work combines self-report, neural and physiology data to examine the efficacy and mechanisms of counterconditioning versus extinction in reducing re-emergence of conditioned threat responses and shows that this appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These findings are supported by convincing evidence, though some areas could benefit from added clarity and a few targeted refinements and justifications of analytical choices. Results will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians, particularly those with an interest in clinical applications in trauma therapies.

    2. Reviewer #1 (Public review):

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      • This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      • The Group x CS-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex; however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      • Strong methodology with regard to neuroimaging analysis and physiological measures.

      • The authors are very clear in documenting where there were discrepancies from their pre-registration and in providing valid rationales for them.

      Major Weaknesses

      • The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesised). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

    3. Reviewer #2 (Public review):

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design, which integrates subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control.

      The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      Weaknesses:

      Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      * Mostly clearly written with interesting psychological insights
      * Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).
      * Very interesting results regarding the neural mechanisms of each process.
      * Good acknowledgement of the limitations of the study.

      Weaknesses:

      * I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data, which appears to differ at baseline (supplemental figure 2C).
      * I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.
      * I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group loses differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.
      * I am not sure that the memories tested were truly episodic.
      * Twice as many female participants as males.
      * No explanation as to why shocks were varied in intensity and how (pseudo-randomly?).

    1. eLife Assessment

      This valuable study examines the activity and function of dorsomedial striatal neurons in the estimation of time. The authors examine striatal activity as a function of time as well as the impact of optogenetic striatal manipulation on the animal's ability to estimate a time interval, providing solid evidence for their claims. The study could be further strengthened with a more rigorous characterization of activity and a stronger connection between their proposed model and the experimental data. The work will be of interest to neuroscientists examining how striatum contributes to behavior.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nosepoke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Thus, this task requires animals to estimate if at least 6 seconds have passed after the first nosepoke. After verifying that animals estimate the passage of 6 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      Major strengths:

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs. This paper contributes to that line of work by investigating whether D1 and D2 neurons have similar activity patterns during the timed interval, as might be expected based on prior work based on striatal manipulations. However, the authors find that D1 and D2 neurons have distinct activity patterns. They then provide a decision-making model that is consistent with all results. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used.

      Major weaknesses:

      The results are based on a relatively small dataset (tens of cells).

      Impact:

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process.

    3. Reviewer #2 (Public review):

      This study found that D1-MSNs and D2-MSNs have opposing dynamics during interval timing in a mouse-optimized interval timing task. Further optogenetic and pharmacologic inhibition of either D1 or D2 MSNs increased response time. This study provides useful experimental evidence in the coding of time in striatum. However, there are some major weaknesses in this study.

      (1) Regarding the data in Figure S3, the variance within each mouse was too large. The authors need to figure out and explain what caused the large variance within the same mouse, or they need to increase the sample size.
      (2) Regarding the results in Figure 3 C and D, Figure 6 H and Figure 7 D, what is the sample size? From the single data points in the figures, it seems that the authors were using the number of cells to do statistical tests and plot the figures. For example, in Figure 3 C, if the authors use n = 32 D2-MSNs and n = 41 D1-MSNs to do the statistical test, even a small difference could reach statistical significance. The authors should use the number of mice to do the statistical tests.
      (3) Regarding the results in Figure 5, what is the reason for the increase in the response times? The authors should plot the position track during intervals (0-6 s) with or without optogenetic or pharmacologic inhibition. The authors can check Figures 3, 5, and 6 in the paper https://doi.org/10.1016/j.cell.2016.06.032 for reference when analyzing the data.
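      Point (2) above, testing at the level of mice rather than cells, can be illustrated with a minimal sketch. It assumes each recorded cell is stored as a (mouse, value) pair; the function name `per_mouse_means` and the numbers are hypothetical and are not data from the manuscript. Averaging within each mouse is only one simple option; a mixed-effects model that accounts for variance between mice is another.

```python
from collections import defaultdict


def per_mouse_means(records):
    """Average a per-cell measurement within each mouse.

    records: iterable of (mouse_id, value) pairs, one pair per recorded cell.
    Returns a dict mapping mouse_id to the mean value across that mouse's cells.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse: sum(vals) / len(vals) for mouse, vals in by_mouse.items()}


# Hypothetical per-cell scores for illustration only (not from the manuscript).
d2_cells = [("m1", -3.0), ("m1", -4.0), ("m2", -2.0), ("m2", -3.0), ("m2", -1.0)]
d2_by_mouse = per_mouse_means(d2_cells)

# Five cells collapse to two independent observations (one per mouse).
print(len(d2_cells), "cells ->", len(d2_by_mouse), "mice:", d2_by_mouse)
```

      With this aggregation the sample size entering a test is the number of mice, so a small between-group difference is less likely to appear significant simply because many cells were recorded.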

    4. Reviewer #3 (Public review):

      Summary:

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using various causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions.

      Strengths:

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model.

      Weaknesses:

      The behavioral task used in this study is best suited for investigating elapsed time perception rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals. Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Thus, this task requires animals to estimate if at least 6 seconds have passed after the first nose poke. After verifying that animals estimate the passage of 6 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      We appreciate the careful read by this reviewer. 

      Major strengths: 

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs. This paper contributes to that line of work by investigating whether D1 and D2 neurons have similar activity patterns during the timed interval, as might be expected based on prior work based on striatal manipulations. However, the authors find that D1 and D2 neurons have distinct activity patterns. They then provide a decision-making model that is consistent with all results. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used. 

      We are glad that our main points come clearly through.

      Major weaknesses: 

      One weakness to me is the impact of identifying whether D1 and D2 had similar or different activity patterns. Does observing increasing/decreasing activity in D2 versus D1, or different activity patterns in D1 and D2, support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? 

      This is a great point - we were not clear. We observe distinct patterns of D2-MSN and D1-MSN activity, but disrupting either D2-MSNs or D1-MSNs led to increased response time. The model that this supports is that D2-MSN and D1-MSN ensemble activity represents temporal evidence. This is a very specific model that can be rigorously tested in future work. We have now made this very clear in the abstract (Page 2).

      “We found that D2-MSNs and D1-MSNs exhibited distinct dynamics over temporal intervals as quantified by principal component analyses and trial-by-trial generalized linear models. MSN recordings helped construct and constrain a four-parameter drift-diffusion computational model in which MSN ensemble activity represented the accumulation of temporal evidence. This model predicted that disrupting either D2-MSNs or D1-MSNs would increase interval timing response times and alter MSN firing. In line with this prediction, we found that optogenetic inhibition or pharmacological disruption of either D2-MSNs or D1-MSNs increased interval timing response times.”

      And in the results on Page 18:  

      “Because both D2-MSNs and D1-MSNs accumulate temporal evidence, disrupting either MSN type in the model changed the slope. The results were obtained by simultaneously decreasing the drift rate D (equivalent to lengthening the neurons’ integration time constant) and lowering the level of network noise σ: D = 0.129, σ = 0.043 for D2-MSNs in Fig 4A (in red; changes in noise had to accompany changes in drift rate to preserve switch response time variance. See Methods); and D = 0.122, σ = 0.043 for D1-MSNs in Fig 4B (in blue). The model predicted that disrupting either D2-MSNs or D1-MSNs would increase switch response times (Fig 4C and Fig 4D) and would shift MSN dynamics.”
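      The qualitative prediction quoted above (a lower drift rate gives longer response times) can be illustrated with a minimal sketch. This is a plain one-dimensional drift-diffusion process and not the authors' four-parameter model. The "disrupted" values D = 0.129 and σ = 0.043 are taken from the quoted text, while the threshold of 1.0, the time step, and the "baseline" values D = 0.15 and σ = 0.05 are assumptions chosen only for illustration.

```python
import math
import random


def simulate_response_time(drift, noise, threshold=1.0, dt=0.01,
                           max_time=60.0, rng=None):
    """Simulate one trial: evidence x drifts upward until it crosses threshold.

    Uses the Euler-Maruyama update x += drift*dt + noise*sqrt(dt)*N(0, 1).
    Returns the first crossing time in seconds (or max_time if never crossed).
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < max_time:
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return max_time


def mean_response_time(drift, noise, n_trials=500, seed=0):
    """Average simulated response time over n_trials with a fixed seed."""
    rng = random.Random(seed)
    times = [simulate_response_time(drift, noise, rng=rng)
             for _ in range(n_trials)]
    return sum(times) / len(times)


# Baseline parameters are hypothetical; disrupted parameters are the values
# quoted for D2-MSN disruption (D = 0.129, sigma = 0.043).
baseline = mean_response_time(drift=0.15, noise=0.05)
disrupted = mean_response_time(drift=0.129, noise=0.043)
print(f"baseline: {baseline:.2f} s, disrupted: {disrupted:.2f} s")
```

      In this simplified process the mean crossing time is approximately the threshold divided by the drift rate, so reducing D lengthens the simulated response time regardless of the exact baseline chosen.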

      And in the discussion (Page 30): 

      “Striatal MSNs are critical for temporal control of action (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015). Three broad models have been proposed for how striatal MSN ensembles represent time: 1) the striatal beat frequency model, in which MSNs encode temporal information based on neuronal synchrony (Matell and Meck, 2004); 2) the distributed coding model, in which time is represented by the state of the network (Paton and Buonomano, 2018); and 3) the DDM, in which neuronal activity monotonically drifts toward a threshold after which responses are initiated (Emmons et al., 2017; Simen et al., 2011; Wang et al., 2018). While our data do not formally resolve these possibilities, our results show that D2-MSNs and D1-MSNs exhibit opposing changes in firing rate dynamics in PC1 over the interval. Past work by our group and others has demonstrated that PC1 dynamics can scale over multiple intervals to represent time (Emmons et al., 2020, 2017; Gouvea et al., 2015; Mello et al., 2015; Wang et al., 2018). We find that low-parameter DDMs account for interval timing behavior with both intact and disrupted striatal D2- and D1-MSNs. While other models can capture interval timing behavior and account for MSN neuronal activity, our model does so parsimoniously with relatively few parameters (Matell and Meck, 2004; Paton and Buonomano, 2018; Simen et al., 2011). We and others have shown previously that ramping activity scales to multiple intervals, and DDMs can be readily adapted by changing the drift rate (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015; Simen et al., 2011). Interestingly, decoding performance was high early in the interval; indeed, animals may have been focused on this initial interval (Balci and Gallistel, 2006) in making temporal comparisons and deciding whether to switch response nosepokes.”

      Regarding the reviewer’s specific question – it is not clear why D1-MSNs and D2-MSNs have opposing patterns of activity, as integration of temporal evidence can certainly be achieved by increasing or decreasing firing rates alone. These patterns have been seen in motor control. Prefrontal neurons, which control striatal ramping, also ramp up and down. We have now included a paragraph on Page 30 explicitly discussing these ideas; however, future experiments will be required to investigate the source of the divergent patterns of activity among D2-MSNs and D1-MSNs.

      “D2-MSNs and D1-MSNs play complementary roles in movement. For instance, stimulating D1-MSNs facilitates movement, whereas stimulating D2-MSNs impairs movement (Kravitz et al., 2010). Both populations have been shown to have complementary patterns of activity during movements with MSNs firing at different phases of action initiation and selection (Tecuapetla et al., 2016). Further dissection of action selection programs reveals that opposing patterns of activation among D2-MSNs and D1-MSNs suppress and guide actions, respectively, in the dorsolateral striatum (Cruz et al., 2022). A particular advantage of interval timing is that it captures a cognitive behavior within a single dimension: time. When projected along the temporal dimension, it was surprising that D2-MSNs and D1-MSNs had opposing patterns of activity. Ramping activity in the prefrontal cortex can increase or decrease; and prefrontal neurons project to and control striatal ramping activity (Emmons et al., 2020, 2017; Wang et al., 2018). It is possible that differences in D2-MSNs and D1-MSNs reflect differences in cortical ramping, which may themselves reflect more complex integrative or accumulatory processes. Further experiments are required to investigate these differences. Past pharmacological work from our group and others has shown that disrupting D2- or D1-MSNs slows timing (De Corte et al., 2019b; Drew et al., 2007, 2003; Stutt et al., 2024), which is in agreement with pharmacological and optogenetic results in this manuscript. Computational modeling predicted that disrupting either D2-MSNs or D1-MSNs increased self-reported estimates of time, which was supported by both optogenetic and pharmacological experiments.”

      I found the results presented in Figures 2 and 3 to be a little confusing or misleading. In Figure 2, the authors appear to claim that D1 neurons decrease their activity over the time interval while D2 neurons increase activity. The authors use this result to suggest that D1/D2 activity patterns are different. In Figure 3, a different analysis is done, and this time D2 neurons do not significantly increase their activity with time, conflicting with Figure 2. While in both figures, there is a significant difference between the mean slopes across the population, the secondary effect of positive/negative slope for D2/D1 neurons changes. I find this especially confusing as the authors refer back to the positive/negative slope for D2/D1 neurons result throughout the rest of the text.  

      We were not clear. First, we attempted to quantify these differences based on PCA and slope. We have rephrased our characterization of these differences by changing the text (Page 9) to:

      “These PETHs revealed that for the 6-second interval immediately after trial start, many putative D2-MSN neurons appeared to ramp up while many putative D1-MSNs appeared to ramp down. For 32 putative D2-MSNs average PETH activity increased over the 6-second interval immediately after trial start, whereas for 41 putative D1-MSNs, average PETH activity decreased. Accordingly, D2-MSNs and D1-MSNs had differences in activity early in the interval (0-5 seconds; F = 4.5, p = 0.04 accounting for variance between mice) but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice). Examination of a longer interval of 10 seconds before to 18 seconds after trial start revealed the greatest separation in D2-MSN and D1-MSN dynamics during the 6-second interval after trial start (Fig S2). Strikingly, these data suggest that D2-MSNs and D1-MSNs might display distinct dynamics during interval timing.” 

      We have rephrased our discussion on PCA to quantify differences in Fig 2G-H using data-driven methods (Page 12): 

      “To quantify differences between D2-MSNs vs D1-MSNs in Fig 2G-H, we turned to principal component analysis (PCA), a data-driven tool to capture the diversity of neuronal activity (Kim et al., 2017a). Work by our group and others has uniformly identified PC1 as a linear component among corticostriatal neuronal ensembles during interval timing (Bruce et al., 2021; Emmons et al., 2020, 2019, 2017; Kim et al., 2017a; Narayanan et al., 2013; Narayanan and Laubach, 2009; Parker et al., 2014; Wang et al., 2018). We analyzed PCA calculated from all D2-MSN and D1-MSN PETHs over the 6-second interval immediately after trial start. PCA identified time-dependent ramping activity as PC1 (Fig 3A), a key temporal signal that explained 54% of variance among tagged MSNs (Fig 3B; variance for PC1 p = 0.009 vs 46 (44-49)% for any pattern of PC1 variance derived from random data; Narayanan, 2016). Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”
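      The relationship between ramping direction and the sign of the PC1 score described above can be sketched on synthetic data. The example below is an assumption-laden illustration, not the authors' pipeline: the PETHs are simulated linear ramps with Gaussian noise, PC1 is found by power iteration, and the helper names `first_pc` and `fake_peth` are hypothetical. Because the sign of a principal component is arbitrary, the sketch orients PC1 so that it ramps down, matching the convention in the quoted text.

```python
import random


def first_pc(rows, n_iter=200):
    """Return (component, scores) for the first principal component.

    rows: one list of binned firing rates per neuron (neurons x time bins).
    Each time bin is centered across neurons, then PC1 is found by power
    iteration on the covariance structure of the centered data.
    """
    n_neurons, n_bins = len(rows), len(rows[0])
    bin_means = [sum(r[j] for r in rows) / n_neurons for j in range(n_bins)]
    centered = [[r[j] - bin_means[j] for j in range(n_bins)] for r in rows]

    v = [1.0 / (j + 1) for j in range(n_bins)]  # arbitrary non-zero start
    for _ in range(n_iter):
        proj = [sum(c[j] * v[j] for j in range(n_bins)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n_neurons))
             for j in range(n_bins)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    # The sign of a PC is arbitrary: orient it so the component ramps down.
    if v[-1] > v[0]:
        v = [-x for x in v]
    scores = [sum(c[j] * v[j] for j in range(n_bins)) for c in centered]
    return v, scores


# Synthetic PETHs (illustration only): 12 time bins spanning the interval.
rng = random.Random(1)
n_bins = 12
t = [-1.0 + 2.0 * j / (n_bins - 1) for j in range(n_bins)]


def fake_peth(slope):
    """Linear ramp with the given slope plus small Gaussian noise."""
    return [slope * tj + rng.gauss(0.0, 0.1) for tj in t]


up = [fake_peth(+1.0) for _ in range(10)]    # neurons that ramp up
down = [fake_peth(-1.0) for _ in range(10)]  # neurons that ramp down
pc1, scores = first_pc(up + down)
up_scores, down_scores = scores[:10], scores[10:]
print("up-ramping scores negative:", all(s < 0 for s in up_scores))
print("down-ramping scores positive:", all(s > 0 for s in down_scores))
```

      With PC1 oriented to ramp down, neurons that ramp up receive negative scores and neurons that ramp down receive positive scores, which is the sign convention the quoted passage relies on.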

      And finally, we directly investigate the heart of the reviewer’s question by explicitly comparing PC1 scores – a data-driven analysis of the neuronal pattern that explains the most variance – and show that they are less than 0 for D2-MSNs (i.e., negatively correlated with a down-ramping pattern, or ramping up), and greater than 0 for D1-MSNs (i.e., positively correlated with a down-ramping pattern, or ramping down):

      “Importantly, PC1 scores for D2-MSNs were significantly less than 0 (signrank D2-MSN PC1 scores vs 0: p = 0.02), implying that because PC1 ramps down, D2-MSNs tended to ramp up. Conversely, PC1 scores for D1-MSNs were significantly greater than 0 (signrank D1-MSN PC1 scores vs 0: p = 0.05), implying that D1-MSNs tended to ramp down. Thus, analysis of PC1 in Fig 3A-C suggested that D2-MSNs (Fig 2G) and D1-MSNs (Fig 2H) had opposing ramping dynamics.”
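
      The sign logic in this passage can be illustrated with a minimal sketch. It assumes a hypothetical unit-norm PC1 that ramps down linearly over seven time bins; only the two scores (-3.4 and 2.8, the medians quoted above) come from the text, and the helper name is an assumption made for illustration, not the authors' code.

```python
# Hypothetical PC1: a unit-norm component that ramps DOWN across the interval.
n_bins = 7
raw = [3 - t for t in range(n_bins)]  # 3, 2, ..., -3
norm = sum(v * v for v in raw) ** 0.5
pc1 = [v / norm for v in raw]


def ramp_direction(score, component):
    """Direction over time of one neuron's PC1 contribution (score * component)."""
    projected = [score * v for v in component]
    change = projected[-1] - projected[0]
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


# Median PC1 scores quoted in the response text.
d2_direction = ramp_direction(-3.4, pc1)  # negative score on a down-ramping PC1
d1_direction = ramp_direction(2.8, pc1)   # positive score on a down-ramping PC1
print(d2_direction, d1_direction)
```

      A negative score flips the down-ramping component, which is why D2-MSNs are read as ramping up while D1-MSNs, with positive scores, follow PC1 downward.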

      We interpret these data on Page 16: 

      “Our analysis of average activity (Fig 2G-H) and PC1 (Fig 3A-C) suggested that D2-MSNs and D1-MSNs might have opposing dynamics. However, past computational models of interval timing have relied on drift-diffusion dynamics that increase over the interval and accumulate evidence over time (Nguyen et al., 2020; Simen et al., 2011).”

      The reviewer mentions our analysis of ‘mean slopes across the population’, which we clarify is a trial-by-trial slope analysis, distinct from the population averages in Fig 2G-H and Fig 3A-C. We have now made this clear (Page 12):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015).  Note that this analysis focuses on each trial rather than population averages in Fig 2G-H and Fig 3A-C.”
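
      To make the distinction from population averages concrete, here is a minimal sketch of a per-trial slope estimate. It swaps in ordinary least squares for the GLM used in the manuscript, and the bin centers and firing rates are invented for illustration.

```python
def ols_slope(times, rates):
    """Least-squares slope of firing rate against time for a single trial."""
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(rates) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rates))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


# Hypothetical data: 1-second bins over the first 6 seconds of three trials.
bin_centers = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
trials = [
    [4.0, 5.0, 6.0, 7.0, 8.0, 9.0],  # ramps up
    [9.0, 8.0, 7.0, 6.0, 5.0, 4.0],  # ramps down
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # flat
]

# One slope per trial (spikes/second per second), not one per averaged PETH.
slopes = [ols_slope(bin_centers, rates) for rates in trials]
print(slopes)
```

      Each trial yields its own slope, so the resulting distribution reflects trial-to-trial variability that an averaged PETH would hide.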

      Finally, as the reviewer suggests, we have removed the term ‘slope’ from the rest of the paper, as the increasing/decreasing comes from averages and analyses of PC1.  We have removed all discussion of ‘opposing’ slope or ‘increasing/decreasing’ slope. 

      It is a bit unclear to me how the authors chose the parameters for the model, and how well the model explains behavior is quantified. It seems that the authors didn't perform cross-validation across trials (i.e., they chose parameters that explained behavior across all trials combined, rather than choosing parameters from a subset of trials and determining whether those parameters are robust enough to explain behavior on held-out trials). I think this would increase the robustness of the result. 

      In addition, it remains a bit unclear to me how the authors changed the specific parameters they did to model the optogenetic manipulation. It seems these parameters were chosen because they fit the manipulation data. This makes me wonder if this model is flexible enough that there is almost always a set of parameters that would explain any experimental result; in other words, I'm not sure this model has high explanatory power. 

      We are glad the reviewer raised these points. First, we have now included a complete exploration of the parameter space, exactly as the reviewer recommends. This is described in the methods (Page 41):

      “Selection of DDMs parameters. Our goal was to build DDMs with dynamics that produce “response times” according to the observed distribution of mice switch times. The selection of parameter values in Fig 4 was done in three steps. First, we fit the distribution of the mice behavioral data with a Gamma distribution and found its fitting values for shape $\alpha_M$ and rate $\beta_M$ (Table S2 and Fig S8; $R^2$ Data vs Gamma $\geq 0.94$). We recognized that the mean $\mu_M$ and the coefficient of variation $CV_M$ are directly related to the shape and rate of the Gamma distribution by formulas $\mu_M = \alpha_M/\beta_M$ and $CV_M = 1/\sqrt{\alpha_M}$. Next, we fixed parameters $F$ and $b$ in DDM (e.g., for D2-MSNs: $F = 1$, $b = 0.52$) and simulated the DDM for a range of values for $D$ and $\sigma$. For each pair $(D, \sigma)$, one computational “experiment” generated 500 response times with mean $\mu$ and coefficient of variation $CV$. We repeated the “experiment” 10 times and took the group median of $\mu$ and $CV$ to obtain the simulation-based statistical measures $\mu_S$ and $CV_S$. Last, we plotted $E_\mu = |(\mu_S - \mu_M)/\mu_M|$ and $E_{cv} = |CV_S - CV_M|$, the respective relative error and the absolute error to data (Fig S7). We considered that parameter values $(D, \sigma)$ provided a good DDM fit of mice behavioral data whenever $E_\mu \leq 0.05$ and $E_{cv}$

      And included a new Fig S7 which shows the parameter space: 

      These new data clearly comment on the parameter space of our model. 
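
      The bookkeeping behind this parameter selection can be sketched as follows. The sketch covers only the Gamma moment formulas and the two error measures quoted above; it does not simulate the DDM, all numeric values are hypothetical, and only the 0.05 criterion on the mean is applied because it is the only threshold stated in the quoted passage.

```python
import math
import statistics


def gamma_mean_cv(shape, rate):
    """Mean and coefficient of variation of a Gamma(shape, rate) distribution."""
    return shape / rate, 1.0 / math.sqrt(shape)


def fit_errors(mu_s, cv_s, mu_m, cv_m):
    """Relative error of the mean and absolute error of the CV."""
    e_mu = abs((mu_s - mu_m) / mu_m)
    e_cv = abs(cv_s - cv_m)
    return e_mu, e_cv


# Hypothetical Gamma fit to behavioral switch times.
alpha_m, beta_m = 16.0, 2.0
mu_m, cv_m = gamma_mean_cv(alpha_m, beta_m)

# Hypothetical summaries from 10 simulated "experiments" at one (D, sigma) pair.
experiment_means = [8.1, 8.3, 8.2, 8.0, 8.4, 8.2, 8.1, 8.3, 8.2, 8.2]
experiment_cvs = [0.26, 0.28, 0.27, 0.27, 0.25, 0.29, 0.27, 0.26, 0.28, 0.27]
mu_s = statistics.median(experiment_means)
cv_s = statistics.median(experiment_cvs)

e_mu, e_cv = fit_errors(mu_s, cv_s, mu_m, cv_m)
mean_ok = e_mu <= 0.05  # criterion on the mean as quoted in the methods
print(mu_m, cv_m, e_mu, e_cv, mean_ok)
```

      Repeating this calculation over a grid of (D, sigma) values would give error surfaces of the kind described for Fig S7.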

      Finally, the reviewer mentions cross-validation. We did this at length on our model and data fits. We used 10-fold cross-validation because fitlm needs enough data for the individual fits. We found that the fits were extremely stable, i.e., the standard deviation of R² was below 0.004 for all comparisons. Thus, we added the following point to the methods on Page 41:

      “10-fold cross-validation revealed highly stable fits between gamma, models and data.”
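
      For readers unfamiliar with the procedure, a generic 10-fold cross-validation of a linear fit might look like the sketch below. This is not the authors' MATLAB fitlm pipeline: the data are synthetic, the helper names are assumptions, and R² is scored on each held-out fold before its spread across folds is summarized.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of a fixed line on the given points."""
    my = statistics.fmean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, score R^2 on the held-out fold, for every fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            r_squared([xs[i] for i in held_out], [ys[i] for i in held_out], slope, intercept)
        )
    return scores


# Synthetic data: a noisy line, standing in for a model-vs-data comparison.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

r2_scores = k_fold_r2(xs, ys, k=10)
r2_sd = statistics.stdev(r2_scores)
print(len(r2_scores), r2_sd)
```

      A small standard deviation of R² across folds is the kind of stability the response reports.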

      Lastly, the results are based on a relatively small dataset (tens of cells). 

      This is an important point.  Although it is a small optogenetically-tagged dataset, we have adequate statistical power and large effect sizes, which we now detail in the text on Page 12:

      “Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And:  

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      And we have included the reviewer’s point as a limitation on Page 33:

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      Impact: 

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding -- that D1 and D2 activity is distinct across time -- remains somewhat ambiguous to me. 

      Again, we are glad that the reviewer appreciated our main point, and we very much appreciate the additional points about interpretation, model parameters, and statistical power. If there is any way we can clarify the text further we are happy to do so.  

      Reviewer #2 (Public Review):  

      (1) Regarding the results in Figure 2 and Figure 5: for the heatmaps in Fig.2F and Fig.2E, the overall activity patterns of D1 and D2 MSNs look very similar; both D1 and D2 MSNs contain neurons showing decreasing or increasing activity during interval timing. And the optogenetic and pharmacologic inhibition of either D1 or D2 MSNs resulted in similar behavior outcomes. To me, the D1 and D2 MSN activities were more complementary than opposing.

      This is a great point. In our last revision, R3 suggested that complementary means opposing – and suggested we change the title to reflect this.  Our original title was ‘Complementary cognitive roles for D2-MSNs and D1-MSNs during interval timing’ – and we have changed the title back to this. We have clarified what we meant by complementary in the abstract (Page 2):

      “Together, our findings demonstrate that D2-MSNs and D1-MSNs had opposing dynamics yet played complementary cognitive roles, implying that striatal direct and indirect pathways work together to shape temporal control of action.”

      And on Page 30: 

      “These data, when combined with our model predictions, demonstrate that despite opposing dynamics, D2-MSNs and D1-MSNs contribute complementary temporal evidence to controlling actions in time.”

      If the authors want to emphasize the opposing side of D1 and D2 MSNs, then the manipulation experiments need to be re-designed, since the average activity of D2 MSNs increased, while D1 MSNs decreased during interval timing, instead of using inhibitory manipulations in both pathways, the authors should use inhibitory manipulation in D2-MSNs, while using optogenetic or pharmacology to activate D1-MSNs. In this way, the authors can demonstrate the opposing role of D1 and D2 MSNs and the functions of increased activity in D2-MSNs and decreased activity in D1-MSNs. 

      These are great ideas, and we agree with them. We would like to emphasize the complementary nature noted in our original title, and not the opposing side of D1/D2 MSNs. The experiments proposed by the reviewer are certainly worth doing, but finding stimulation parameters that affect timing without affecting movement would likely be quite complex. We have now included them as an important limitation / future direction (Page 33):

      “Fifth, we did not deliver stimulation to the striatum because our pilot experiments triggered movement artifacts or task-specific dyskinesias (Kravitz et al., 2010). Future stimulation approaches carefully titrated to striatal physiology may affect interval timing without affecting movement.”

      (2) Regarding the results in Figure 3 C and D, Figure 6 H and Figure 7 D, what is the sample size? From the single data points in the figures, it seems that the authors were using the number of cells to do statistical tests and plot the figures. For example, in Figure 3 C, if the authors use n = 32 D2 MSNs and n = 41 D1 MSNs to do the statistical test, even a small difference could become statistically significant. The authors should use the number of mice to do the statistical tests.

      These are important points that were discussed at length in the prior review.  First, for the sample size, we now have detailed in our Table 1: 

      Second, we have detailed our statistical approach which explicitly deals with repeated observations of neurons across mice (Page 43):

      “Statistics. All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB. For all neuronal analyses, variability between animals was accounted for using generalized linear mixed-effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
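
      The descriptive statistics named in this passage can be sketched with the Python standard library. The data are invented, the pooled-standard-deviation form of Cohen's d is an assumption about which variant was used, and the mixed-effects modeling (fitglme, lmer) is not reproduced here.

```python
import statistics


def median_iqr(values):
    """Median plus the first and third quartiles (inclusive method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


# Hypothetical measurements for two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

med, (q1, q3) = median_iqr(group_a)
d = cohens_d(group_a, group_b)
print(med, q1, q3, d)
```

      Quartile conventions differ between software packages, so values from MATLAB or R may not match this sketch exactly on small samples.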

      We have formally reviewed this approach with professional biostatisticians at the University of Iowa.

      Finally, we note that we have adequate statistical power and large effect sizes for the analyses in Fig 3C and D, which we now detail in the text on Page 12:

      “Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And, on Page 12:  

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      And we have included the reviewer’s point as a limitation on Page 33:

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      (3) Regarding the results in Figure 5, what is the reason for the increase in the response times? The authors should plot the position track during intervals (0-6 s) with or without optogenetic or pharmacologic inhibition. The authors can check Figures 3, 5, and 6 in the paper https://doi.org/10.1016/j.cell.2016.06.032 for reference to analyze the data.

      These are key points, and we are glad the reviewer raised them. Our interpretation is that response time increases without reliable changes in other task-specific movements such as nosepoke reaction time or traversal time (Fig S9). This was lacking in our prior manuscript. We have now added this to Page 30:

      “Our interpretation is that because the activity of D2-MSN and D1-MSN ensembles represents the accumulation of evidence, pharmacological/optogenetic disruption of D2-MSN/D1-MSN activity slows this accumulation process, leading to slower interval timing-response times (Fig 5) without changing other task-specific movements (Fig S9). These results provide new insight into how opposing patterns of striatal MSN activity control behavior in similar ways and show that they play a complementary role in elementary cognitive operations.”

      Regarding the tracking of velocity, we unfortunately do not have this information reliably across all conditions. This citation is a beautiful landmark paper, and we are working on collecting this information in our new datasets going forward.  We have included this as a major limitation (Page 34): 

      “Still, future work combining motion tracking/accelerometry with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023; Tecuapetla et al., 2016).”

      Once again, we are appreciative of the thoughtful points raised by this reviewer.  

      Reviewer #3 (Public Review): 

      Summary: 

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using various causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions. 

      Strengths: 

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model. 

      We very much appreciate the considered read and comments by the reviewer, and recognition of the breadth of techniques in this manuscript. 

      Weaknesses: 

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals. In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum. 

      These are important points.  We agree with them completely and have now included responses to them.  First, bisection tasks certainly have advantages – we have justified our approach in the discussion (Page 32):

      “Our task version has been used extensively to study interval timing in mice and humans (Balci et al., 2008; Bruce et al., 2021; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). However, temporal bisection tasks, in which animals hold during a temporal cue and respond at different locations depending on cue length, have advantages in studying how animals time an interval because animals are not moving while estimating cue duration (Paton and Buonomano, 2018; Robbe, 2023; Soares et al., 2016). Our interval timing task version – in which mice switch between two response nosepokes to indicate their interval estimate has elapsed – has been used extensively in rodent models of neurodegenerative disease (Larson et al., 2022; Weber et al., 2024, 2023; Zhang et al., 2021), as well as in humans (Stutt et al., 2024). This version of interval timing involves motor timing, which engages executive function and has more translational relevance for human diseases than perceptual timing or bisection tasks (Brown, 2006; Farajzadeh and Sanayei, 2024; Nombela et al., 2016; Singh et al., 2021).  Furthermore, because many therapeutics targeting dopamine receptors are used clinically, these findings help describe how dopaminergic drugs might affect cognitive function and dysfunction. Future studies of D2-MSNs and D1-MSNs in temporal bisection and other timing tasks may further clarify the relative roles of D2- and D1-MSNs in interval timing and time estimation.”

      Second, we have included an explicit control in which the same laser is on for the same epoch as in the experimental animals, and we find no effects. This is now detailed in the methods (Page 37):

      “To control for heating and nonspecific effects of optogenetics, we performed control experiments in mice without opsins using identical laser parameters in D2-cre or D1-cre mice (Fig S6).”

      And in the results (Page 21): 

      “To control for heating and nonspecific effects of optogenetics, we performed control experiments in D2-cre mice without opsins using identical laser parameters; we found no reliable effects for opsin-negative controls (Fig S6).”

      And on Page 21:

      “As with D2-MSNs, we found no reliable effects with opsin-negative controls in D1-MSNs (Fig S6).”

      We have now detailed these results in Figure S6:

      Regarding focal pharmacology, we performed this experiment with focal infusion of D1/D2 antagonists in our prior work, which we have now cited (Page 4):

      “Similar behavioral effects were found with systemic (Stutt et al., 2024) or focal infusion of D2 or D1 antagonists locally within the dorsomedial striatum (De Corte et al., 2019a).”

      Comments on revised version: 

      Thank you for the comprehensive revisions. Most of my (addressable) concerns were addressed. The current version of your manuscript appears significantly improved. 

      Once again, we appreciate the reviewer’s constructive and insightful comments and careful review of our manuscript.  Their comments have been extremely helpful.

    1. eLife Assessment

      This important study presents interesting results aimed at explaining the effects of a human mutation on the mitochondrial import protein TIMM50 on mitochondrial function and neuronal excitability. While the evidence supporting the conclusions is convincing, the mechanisms driving changes in the levels of certain proteins within and outside the mitochondria (such as certain ion channels) remain unexplained. This paper will be of interest to scientists in the mitochondria field.

    2. Reviewer #1 (Public review):

      Mitochondria are essential organelles consisting in mammalian cells of about 1500 different proteins. Most of those are synthesized in the cytosol as precursor proteins, imported into mitochondria, and sorted into one of the four sub-mitochondrial compartments. The TIM23 complex, which is embedded in the mitochondrial inner membrane, facilitates the import of proteins that harbor Mitochondrial Targeting Sequence (MTS) at their N-terminus. Such proteins are sorted mainly to the mitochondrial matrix while some sub-groups are destined also to the inner membrane or the intermembrane space. TIMM50 (Tim50 in yeast) is an essential component of the TIM23 complex and mutations in this protein were reported to cause several diseases.

      Summary:

      In the current study, the authors analyzed the impact of TIMM50 mutations on the mitochondrial proteome in both patients' cells and mouse neurons. They provide compelling evidence for several surprising and highly interesting observations: (i) TIMM50 mutations affect the steady-state levels of only a portion of the putative TIMM50 substrates, (ii) such mutations result in increased electrical activity in mice neurons and in reduced levels of some potassium ion channels in the plasma membrane. These findings shed new light on mitochondrial biogenesis in mammalian cells and hint at an unexpected link between mitochondria and ion channels at the plasma membrane.

      Strengths:

      The authors used both cells from patients and neurons from mice to investigate the impact of mutations in TIMM50 on mitochondrial proteome and function.

      Comments on revisions:

      The authors addressed all my concerns regarding the original submission.

    3. Reviewer #2 (Public review):

      Summary:

      Mitochondria import hundreds of precursor proteins from the cytosol. The TOM and TIM23 complexes facilitate the import on the matrix-targeting pathway of mitochondria. In yeast, Tim50 is a critical and essential subunit of the TIM23 complex that mediates the transition of precursors from the outer to the inner membrane. The human Tim50 homolog TIMM50 is highly similar in structure and a comparable function of Tim50 and TIMM50 was proven by several biochemical and genetic studies in the past.

      In this study, the authors characterize human cells which express lower levels or mutated versions of TIMM50. They found that in these TIMM50-depletion cells, the levels of other TIM23 core subunits are also diminished but many mitochondrial proteins are unaffected. Moreover, they observed alterations in the electrical activity and the levels of potassium channels in neuronal cells of TIMM50-deficient mice. They propose that these changes explain the pathology of patients who often suffer from epilepsy.

      Strengths:

      The paper is written by experts in the field, and it is very clear. The experiments are of high quality and sufficiently well-controlled. The study is interesting for a broad readership.

      Weaknesses:

      The authors show that even upon low levels of Tim50, mitochondrial proteins are not considerably depleted. However, it remains somewhat unclear why this is. TIMM50 and the TIM23 complex might not be rate-limiting for the biogenesis of mitochondrial proteins. Alternatively, the import defect is compensated indirectly, for example by a reduced growth of cells. It will be interesting to study the physiological consequences of TIMM50-depletion in more depth in the future.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their positive view and constructive, valuable comments on the manuscript. Below, we address the suggestions of the reviewers.

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely due to the fact that it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels with immunoblotting, comparing patient fibroblast cells to the HC. No significant change in OXA1 steady state levels was observed.

      These results are now displayed (Fig. S3B and C) and discussed in the revised manuscript.

      Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons, while TIMM23 was detected only in mouse neurons, where it showed a decrease that was not significant. This is most likely due to the fact that TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (2) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the increase in the steady state level of some TIM23 substrate proteins can only be explained by events that occur outside the mitochondria. These could include increased transcription, translation, or post-translational modification, all of which may increase their steady state level despite the decrease in the steady state level of the import complex.

      (3) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 1.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract –

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction –

      Line 87 - “Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.”

      Discussion –

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 25-26: The authors write "Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates". Since the current data challenges the definition of some proteins as substrates of TIMM50, I suggest using the term "putative substrates".

      Changed as suggested

      (2) Line 27: It is not clear whether the wording "general import role of TIM23" refers to the TIM23 protein or the TIM23 complex. This should be clarified.

      Clarified. It now states "TIM23 complex".

      (3) Line 72: should be "and plays".

      Changed as suggested.

      (4) It will be helpful to include in Figure 1 a small scheme of TIMM50 and to indicate in which domain the T252M mutation is located.

      We predicted the AlphaFold human TIMM50 structure and indicated the mutation site and the different TIMM50 domains. The structure is included in Fig. 1A.

      (5) I suggest labelling the "Y" axis in Fig. 1B as "Protein level (% of control)".

      Changed as suggested in Fig. 1C (previously Fig. 1B) and in Fig. 2C.

      (6) Line 179: since the authors tested here only about 10 mitochondrial proteins (out of 1500), I think that the word "many" should be replaced by "several representative" resulting in "steady state levels of several representative mitochondrial proteins".

      Changed as requested.

      (7) Line 208: correct typo.

      Typo was corrected.

      (8) Figure 4 is partially redundant as its data is part of Figure 3. The authors can consider combining these two figures. Accordingly, large parts of the legend of Figure 4 are repeating information in the legend to Figure 3 and can refer to it.

      We revamped Figures 3 and 4. Figure 3 now shows the analysis of fibroblasts proteomics while Figure 4 focuses on neurons proteomics. We also modified the legend of Figure 4.

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: 'Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates, challenging the currently accepted import dogma of the essential general import role of TIM23 and suggesting that fully functioning TIM23 complex is not essential for maintaining the steady state level of the majority of mitochondrial proteins'. This sentence needs to be rephrased. The data do not challenge any dogma! The authors only show that lower levels of functional TIM23 are sufficient.

      We have rewritten all the relevant sentences as suggested (details are also mentioned in response to reviewer 2 public review point 1).

      (2) Introduction: 'Surprisingly, functional and physiological analysis points to the possibility that TIMM50 and a fully functional TIM23 complex are not essential for maintaining steady-state levels of most presequence-containing proteins'. This again needs to be rephrased.

      Rewritten as suggested (details mentioned in response to reviewer 2 public review point 1)

      (3) Discussion: 'In summary, our results challenge the main dogma that TIMM50 is essential for maintaining the mitochondrial matrix and inner membrane proteome, as steady state level of most mitochondrial matrix and inner membrane proteins did not change in either patient fibroblasts or mouse neurons following a significant decrease in TIMM50 levels.' This again needs to be rephrased.

      Rewritten as suggested (details mentioned in response to reviewer 2 public review point 1)

      (4) The analysis of the proteomics experiment should be improved. The authors show in Figures 3 and 4 several times the same volcano plots in which different groups of proteins are indicated. It would be good to add (a) a principal component analysis to show that the replicates from the mutant samples are consistently different from the controls, (b) a correlation plot that compares the log-fold-change of P1 to that of P2 to show which of the proteins are consistently changed in P1 and P2 and (c) a GO term analysis to show in an unbiased way whether mitochondrial proteins are particularly affected upon TIMM50 depletion.

      Figures 3 and 4 have been changed to avoid redundancy. Figure 3 now focuses on fibroblasts proteomics (with additional analysis), while Figure 4 focuses on neurons proteomics. PCA analysis was added in Fig S1, showing that the proteomics replicates of both patients (P1 and P2) are consistently different from the healthy control (HC) replicates. Correlation plots were added in Figure 3C and D, showing high correlation of the downregulated and upregulated mitochondrial proteins between P1 and P2. These plots further highlight that MIM proteins are more affected than matrix proteins and that the OXPHOS and MRP systems comprise the majority of significantly downregulated proteins in both patients. GO term analysis was performed for all the detected proteins that were significantly downregulated in both patients. The GO term analysis is displayed in Figure S3A, and shows that mitochondrial proteins, mainly of the OXPHOS and MRP machineries, are particularly affected.

      (5) Figure 1. The figure shows the levels of TIM and TOM subunits in two mutant samples. The quantifications suggest that the levels of TIMM21, TOMM40, and mtHsp60 are not affected. However, from the figure, it seems that there are increased levels of TIMM21 and reduced levels of TOMM40 and mtHsp60. Unfortunately, in the figure most of the signals are overexposed. Since this is a central element of the study, it would be good to load dilutions of the samples to make sure that the signals are indeed in the linear range and do scale with the amounts of samples loaded.

      The representative WB panels display the Actin loading control of the representative TIMM50 repeat (the top panel). However, each protein was tested separately, at least three times, and was normalized to its own Actin loading control.

      (6) Figure 2B. All panels are shown in color except the panel for TIMM17B which is grayscale. This should be changed to make them look equal.

      All the western blot panels were changed to grayscale.

      (7) Discussion: 'Despite being involved in the import of the majority of the mitochondrial proteome, no study thus far characterized the effects of TIMM50 deficiency on the entire mitochondrial proteome.' This sentence is not correct as proteomic data were published previously, for example for Trypanosomes (PMID: 34517757) and human cells (PMID: 38828998).

      We have corrected the statement to “Despite being involved in the import of the majority of the mitochondrial proteome, little is known about the effects of TIMM50 deficiency on the entire mitochondrial proteome.”

      (8) A recent study on a very similar topic was published by Diana Stojanovski's group that needs to be cited: PMID: 38828998. The results of this comprehensive study also need to be discussed!!!

      We have added the following in the discussion:

      Line 362 – “These observations are similar to the recent analysis of patient-derived fibroblasts which demonstrated that TIMM50 mutations lead to severe deficiency in the level of TIMM50 protein (6,7). Notably, this decrease in TIMM50 was accompanied with a decrease in the level of other two core subunits, TIMM23 and TIMM17. However, unexpectedly, proteomics analysis in our study and that conducted by Crameri et al., 2024 indicate that steady state levels of most TIM23-dependent proteins are not affected despite a drastic decrease in the levels of the TIM23CORE complex (7). The most affected proteins constitute of intricate complexes, such as OXPHOS and MRP machineries. Thus, both these studies indicate a surprising possibility that even reduced levels of the TIM23CORE components are sufficient for maintaining the steady state levels of most presequence containing substrates.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72.

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5.

      (3) Ting SY, Schilke BA, Hayashi M, Craig EA. Architecture of the TIM23 inner mitochondrial translocon and interactions with the matrix import motor. J Biol Chem [Internet]. 2014;289(41):28689–96. Available from: http://dx.doi.org/10.1074/jbc.M114.588152

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell [Internet]. 2021;81(18):3677–90. Available from: https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

      (6) Reyes A, Melchionda L, Burlina A, Robinson AJ, Ghezzi D, Zeviani M. Mutations in TIMM50 compromise cell survival in OxPhos‐dependent metabolic conditions. EMBO Mol Med. 2018;

      (7) Crameri JJ, Palmer CS, Stait T, Jackson TD, Lynch M, Sinclair A, et al. Reduced Protein Import via TIM23 SORT Drives Disease Pathology in TIMM50-Associated Mitochondrial Disease. Mol Cell Biol [Internet]. 2024;0(0):1–19. Available from: https://doi.org/10.1080/10985549.2024.2353652

    1. eLife Assessment

      This important study advances our understanding of how FGF13 variants confer seizure susceptibility. By acting in a set of inhibitory interneurons, FGF13 regulates synaptic transmission and excitability. The data presented here are convincing and combine cell type-specific knockouts and electrophysiology, complemented by histology/RNA studies. Collectively, this research will be of interest to a wide audience, particularly those involved in the study of epilepsy, inhibitory neurons, and ion channels.

    2. Reviewer #2 (Public review):

      Summary

      The authors address three primary questions:

      (1) how FGF13 variants confer seizure susceptibility,

      (2) the specific cell types involved, and

      (3) the underlying mechanisms, particularly regarding Nav dysfunction.

      They use different Cre drivers to generate cell type-specific knockouts (KOs). First, using Nestin-Cre to create a whole-brain Fgf13 KO, they observed spontaneous seizures and premature death. While KO of Fgf13 in excitatory neurons does not lead to spontaneous seizures, KO in inhibitory neurons recapitulates the seizures and premature death observed in the Nestin-Cre KO. They further narrow down the critical cell type to MGE-derived interneurons (INs), demonstrating that MGE-neuron-specific KO partially reproduces the observed phenotypes. "All interneuron" KOs exhibit deficits in synaptic transmission and interneuron excitability, not seen in excitatory neuron-specific KOs. Finally, they rescue the defects in the interneuron-specific KO by expressing specific Fgf13 isoforms. This is an elegant and important study adding to our knowledge of mechanisms that contribute to seizures.

      Strengths

      • The study provides much-needed cell type-specific KO models.

      • The authors use appropriate Cre lines and characterize the phenotypes of the different KOs.

      • The metabolomic analysis complements the rest of the data effectively.

      • The study confirms and extends previous research using improved approaches (KO lines vs. in vitro KD or antibody infusion).

      • The methods and analyses are robust and well-executed.

      Weaknesses

      • One weakness lies in the use of the Nkx2.1 line (instead of Nkx2.1CreER) in the paper. As a result, some answers to key questions are incomplete. For instance, it remains unclear whether the observed effects are due to Chandelier cells or NGFCs, potentially both MGE and CGE derived, explaining why Nkx2.1 alone does not fully replicate the overall inhibitory KO. Using Nkx2.1CreER could have helped address the cell specificity. With the Nkx2.1 line used in the paper, the answer is partial.

      • While the mechanism behind the reduced inhibitory drive in the IN-specific KO is suggested to be presynaptic, the chosen method does not allow them to exactly identify the mechanisms (spontaneous vs mEPSC/mIPSC), and whether it is a loss of inhibitory synapses (potentially axo-axonic) or release probability.

      General Assessment

      The general conclusions of this paper are supported by data. As it is, the claim that "these results enhance our understanding of the molecular mechanisms that drive the pathogenesis of Fgf13-related seizures" is partially supported. A more cautious term may be more appropriate, as the study shows the mechanism is not Nav-mediated and suggests alternative mechanisms without unambiguously identifying them. The conclusion that the findings "expand our understanding of FGF13 functions in different neuron subsets" is supported, although somewhat overstated, as the work is not conclusive about the exact neuron subtypes. However, it does indeed show differential functions for specific neuronal classes, which is a significant result.

      Impact and Utility

      This paper is undoubtedly valuable. Understanding that excitatory neurons are not the primary contributors to the observed phenotypes is crucial. The finding that the effects are not MGE-unique is also important. This work provides a solid foundation for further research and will be a useful resource for future studies.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aimed to determine the mechanism by which seizures emerge in Developmental and Epileptic Encephalopathies caused by variants in the gene FGF13. Loss of FGF13 in excitatory neurons had no effect on seizure phenotype as compared to loss of FGF13 in GABAergic interneurons, which in contrast caused a dramatic proseizure phenotype and early death in these animals. They were able to show that Fgf13 ablation and consequent loss of FGF13-S and FGF13-VY reduced overall inhibitory input from Fgf13-expressing interneurons onto hippocampal pyramidal neurons. This was shown to occur not via disruption to voltage gated sodium channels but rather by reducing potassium currents and action potential repolarisation in these interneurons.

      Strengths:

      The authors employed multiple well-validated, novel mouse lines with FGF13 knocked out in specific cell types including all neurons, all excitatory cells, all GABAergic interneurons, or a subset of MGE-derived interneurons, including axo-axonic chandelier cells. The phenotypes of each of these four mouse lines were carefully characterised to reveal clear differences with the most fundamental being that interneuron-targeted deletion of FGF13 led to perinatal mortality associated with extensive seizures and impaired the hippocampal inhibitory/excitatory balance while deletion of FGF13 in excitatory neurons caused no detectable seizures and no survival deficits.

      The authors made excellent use of western blotting and in situ hybridisation of the different FGF13 isoforms to determine which isoforms are expressed in which cell types, with FGF13-S predominantly in excitatory neurons and FGF13-VY and FGF13-V predominantly in GABAergic neurons.

      The authors performed highly detailed electrophysiological analysis of excitatory neurons and GABAergic interneurons with FGF13 deficits using whole-cell patch clamp. This enabled them to show that FGF13 removal did not affect voltage-gated sodium channels in interneurons, but rather reduced the action of potassium channels, with the resultant effect of making it more likely that interneurons enter depolarisation block. These findings were strengthened by the demonstration that viral re-expression of different Fgf13 splice isoforms could partially rescue deficits in interneuron action potential output and restore K+ channel current size.

      Additionally, the discussion was nuanced, and demonstrated how the current findings resolved previous apparent contradictions in the field involving the function of FGF13.

      These findings will have a significant impact on our understanding of how FGF13 causes seizures and death in DEEs, and the action of different FGF13 isoforms within different neuronal cell types, particularly GABAergic interneurons.

      Comments on revisions:

      I appreciate the authors' responses to the previous round of reviews. All my comments have been addressed. Congratulations on an excellent body of work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      A subset of fibroblast growth factor (FGF) proteins (FGF11-FGF14; often referred to as fibroblast growth factor homologous factors because they are not thought to be secreted and do not seem to act as growth factors) have been implicated in modulating neuronal excitability; however, the exact mechanisms are unclear. In part, this is because it is unclear how different FGF isoforms alter ion channel activity in different neuronal populations. In this study, the authors explore the role of FGF13 in epilepsy using a variety of FGF13 knock-out mouse models, including several targeted cell-type specific conditional knockout mouse lines. The study is intriguing as it indicates that FGF13 plays an especially important role in inhibitory neurons. Furthermore, although FGF13 has been studied as a regulator of neuronal voltage-gated sodium channels, the authors present data indicating that FGF13 knockout in inhibitory neurons induces seizures not by altering sodium current properties but by reducing voltage-gated potassium currents in inhibitory neurons. While intriguing, the data are incomplete in several aspects and thus the mechanisms by which various FGF13 variants induce Developmental and Epileptic Encephalopathies are not resolved by the data presented.

      Strengths: 

      A major strength is the array of techniques used to assess the mice and the electrical activity of the neurons. 

      The multiple mouse knock-out models utilized are a strength, clearly demonstrating that FGF13 expression in inhibitory neurons, and possibly specific sub-populations of inhibitory neurons, is critically important. 

      The data on the increased sensitivity to febrile seizures in KO mice are very nice and provide clear evidence for regulation of excitability in inhibitory neurons by FGF13.

      The Gad2Fgf13-KO mice indicate that several Fgf13 splice variants may be expressed in inhibitory neurons and suggest that the Fgf13-VY splice variants may have previously unrecognized specific roles in regulating neuronal excitability.

      The data on males and females from the various KO mouse lines indicate a clear gene dosage effect for this X-linked gene.

      The unbiased metabolomic analysis supports the assertion that Fgf13 expression in inhibitory neurons is important in regulating seizure susceptibility. 

      Weaknesses: 

      The knockout approach can be powerful but also has distinct limitations. Multiple missense mutations in FGF13-S have been identified. The knockout models employed here are not appropriate for understanding how these missense variants lead to altered neuronal excitability. While the data show that complete loss of Fgf13 from excitatory forebrain neurons is not sufficient to induce seizure susceptibility, it does not rule out that specific variants (e.g., R11C) might alter the excitability of forebrain neurons. The missense variants may alter excitatory and/or inhibitory neuron excitability in distinct ways from a full FGF13 knockout. 

      We agree with this overall interpretation of our data and have updated our language in the Discussion to make the distinction between mechanisms attributable to a knockout compared to a missense variant. We note, however, that the proposed mechanism by which missense variants (e.g., R11C) drive seizures is through loss of long-term inactivation in excitatory neurons and our excitatory knockout model shows loss of long-term inactivation in excitatory neurons. Thus, our knockout model demonstrates that the mechanism(s) by which the missense variants alter neuronal excitability in excitatory neurons must exclude long-term inactivation, thereby providing some clarity regarding the proposed mechanism for those missense variants.

      The electrophysiological experiments are intriguing but not comprehensive enough to support all of the conclusions regarding how FGF13 modulates neuronal excitability. 

      We agree and have updated the language in our Discussion to clarify speculation from conclusions that are directly supported by data.

      Another concern is the use of different ages of neurons for different experiments. For example, sodium currents in Figures 2 and 5 (and Supplemental Figures 2 and 7) are recorded from cultured neurons, which may have very different properties (including changes in sodium channel complexes) from neurons in vivo that drive the development of seizure activity. 

      We agree and acknowledge the important differences between neurons examined in culture and in vivo, yet the in vitro vs in vivo preparations were necessitated by the specific experiments. While these differences are important, previous gene profiling studies comparing primary hippocampal neurons with developing mouse hippocampus have found that although gene expression is accelerated in vitro, gene expression profiles in vitro and in vivo are similar (PMID: 11438693). Moreover, the relative immaturity of the cultured neurons is balanced at least in part because the in vivo experiments were performed on very young animals (~P12), which also have relatively immature neurons. Thus, we predict that sodium channel complexes studied in vitro are informative for the in vivo aspects of this investigation.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors address three primary questions: 

      (1) how FGF13 variants confer seizure susceptibility, 

      (2) the specific cell types involved, and 

      (3) the underlying mechanisms, particularly regarding Nav dysfunction. 

      They use different Cre drivers to generate cell type-specific knockouts (KOs). First, using Nestin-Cre to create a whole-brain Fgf13 KO, they observed spontaneous seizures and premature death. While KO of Fgf13 in excitatory neurons does not lead to spontaneous seizures, KO in inhibitory neurons recapitulates the seizures and premature death observed in the Nestin-Cre KO. They further narrow down the critical cell type to MGE-derived interneurons (INs), demonstrating that MGE-neuron-specific KO partially reproduces the observed phenotypes. "All interneuron" KOs exhibit deficits in synaptic transmission and interneuron excitability, not seen in excitatory neuron-specific KOs. Finally, they rescue the defects in the interneuron-specific KO by expressing specific Fgf13 isoforms. This is an elegant and important study adding to our knowledge of mechanisms that contribute to seizures. 

      Strengths 

      • The study provides much-needed cell type-specific KO models. 

      • The authors use appropriate Cre lines and characterize the phenotypes of the different KOs. 

      • The metabolomic analysis complements the rest of the data effectively. 

      • The study confirms and extends previous research using improved approaches (KO lines vs. in vitro KD or antibody infusion). 

      • The methods and analyses are robust and well-executed. 

      Weaknesses 

      • One weakness lies in the use of the Nkx2.1 line (instead of Nkx2.1CreER) in the paper. As a result, some answers to key questions are incomplete. For instance, it remains unclear whether the observed effects are due to Chandelier cells or NGFCs, potentially both MGE and CGE derived, explaining why Nkx2.1 alone does not fully replicate the overall inhibitory KO. Using Nkx2.1CreER could have helped address the cell specificity. With the Nkx2.1 line used in the paper, the answer is partial. 

      We agree that while our data is consistent with the possibility of a role for Fgf13 in chandelier function, the current Cre driver does not provide sufficient direct evidence. We performed preliminary experiments (unpublished) using a Nkx2.1CreER driver, with late embryonic induction with a tamoxifen dosage validated for sparse labeling of chandelier cells (30846310). While we successfully replicated sparse labeling of neocortical chandelier cells (using a Cre-dependent Ai9 reporter), we were unable to determine if there was a significant loss of FGF13 as measured by immunohistochemistry since FGF13+ cells are only a small subset of the already sparse cells. Because multiple snRNA-seq studies identified Fgf13 as a marker for chandelier cells, we speculated—now more carefully circumspect—about the role of chandelier cells vs NGFCs.

      • While the mechanism behind the reduced inhibitory drive in the IN-specific KO is suggested to be presynaptic, the chosen method does not allow them to exactly identify the mechanisms (spontaneous vs mEPSC/mIPSC), and whether it is a loss of inhibitory synapses (potentially axo-axonic) or release probability. 

      We agree that this is an important limitation of our work, and that we are unable to identify the exact mechanism behind the reduced inhibitory drive. We are continuing to explore this question in a follow-up study.

      • Some supporting data (e.g. Supplemental Figure 7 and 8) appear to come from only one (or two) WT and one (or two) KO mice. Supplementary data, like main data, should come from at least three mice in total to be considered complete/solid (even if the statistical analysis is done with cells). 

      All panels in the manuscript, including supplementary data, except Supplemental Figures 7D and 8A, have N(mouse)≥3. Time limitations (graduating student) prevented us from obtaining a larger N. Because those supplementary data are not critical for supporting our conclusions, we removed them.

      General Assessment 

      The general conclusions of this paper are supported by data. As it is, the claim that "these results enhance our understanding of the molecular mechanisms that drive the pathogenesis of Fgf13-related seizures" is partially supported. A more cautious term may be more appropriate, as the study shows the mechanism is not Nav-mediated and suggests alternative mechanisms without unambiguously identifying them. The conclusion that the findings "expand our understanding of FGF13 functions in different neuron subsets" is supported, although somewhat overstated, as the work is not conclusive about the exact neuron subtypes. However, it does indeed show differential functions for specific neuronal classes, which is a significant result. 

      Impact and Utility 

      This paper is undoubtedly valuable. Understanding that excitatory neurons are not the primary contributors to the observed phenotypes is crucial. The finding that the effects are not MGE-unique is also important. This work provides a solid foundation for further research and will be a useful resource for future studies. 

      Reviewer #3 (Public Review): 

      Summary: 

      The authors aimed to determine the mechanism by which seizures emerge in Developmental and Epileptic Encephalopathies caused by variants in the gene FGF13. Loss of FGF13 in excitatory neurons had no effect on seizure phenotype as compared to the loss of FGF13 in GABAergic interneurons, which in contrast caused a dramatic proseizure phenotype and early death in these animals. They were able to show that Fgf13 ablation and consequent loss of FGF13-S and FGF13-VY reduced overall inhibitory input from Fgf13-expressing interneurons onto hippocampal pyramidal neurons. This was shown to occur not via disruption to voltage-gated sodium channels but rather by reducing potassium currents and action potential repolarisation in these interneurons. 

      Strengths: 

      The authors employed multiple well-validated, novel mouse lines with FGF13 knocked out in specific cell types including all neurons, all excitatory cells, all GABAergic interneurons, or a subset of MGE-derived interneurons, including axo-axonic chandelier cells. The phenotypes of each of these four mouse lines were carefully characterised to reveal clear differences with the most fundamental being that interneuron-targeted deletion of FGF13 led to perinatal mortality associated with extensive seizures and impaired the hippocampal inhibitory/excitatory balance while deletion of FGF13 in excitatory neurons caused no detectable seizures and no survival deficits.

      The authors made excellent use of western blotting and in situ hybridisation of the different FGF13 isoforms to determine which isoforms are expressed in which cell types, with FGF13-S predominantly in excitatory neurons and FGF13-VY and FGF13-V predominantly in GABAergic neurons.

      The authors performed a highly detailed electrophysiological analysis of excitatory neurons and GABAergic interneurons with FGF13 deficits using whole-cell patch clamp. This enabled them to show that FGF13 removal did not affect voltage-gated sodium channels in interneurons, but rather reduced the action of potassium channels, with the resultant effect of making it more likely that interneurons enter depolarisation block. These findings were strengthened by the demonstration that viral re-expression of different Fgf13 splice isoforms could partially rescue deficits in interneuron action potential output and restore K+ channel current size. 

      Additionally, the discussion was nuanced and demonstrated how the current findings resolved previous apparent contradictions in the field involving the function of FGF13. 

      These findings will have a significant impact on our understanding of how FGF13 causes seizures and death in DEEs, and the action of different FGF13 isoforms within different neuronal cell types, particularly GABAergic interneurons. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The limitations of the KO model should be fully discussed in the discussion. It should be clear that knocking out FGF13 does not provide insight into how missense mutations such as R11C may alter excitatory and/or inhibitory neuron excitability. 

      We agree with this overall interpretation of our data and have updated our language in the Discussion to make the distinction between mechanisms attributable to a knockout compared to a missense variant. We note, however, that the proposed mechanism by which missense variants (e.g., R11C) drive seizures is through loss of long-term inactivation in excitatory neurons and our excitatory knockout model shows loss of long-term inactivation in excitatory neurons. Thus, our knockout model demonstrates that the mechanism(s) by which the missense variants alter neuronal excitability in excitatory neurons must exclude long-term inactivation, thereby providing some clarity regarding the proposed mechanism for those missense variants.

      It is important to know what sodium channel isoforms are expressed in the cultured neurons used in the experiments for Figures 2 and 5. Are Nav1.1, Nav1.2, Nav1.3, and Nav1.6 expressed at appropriate levels in the cultures? 

      We agree it is important to know that the sodium channel isoforms expressed in our hippocampal neurons are expressed at physiologically relevant levels, for further validation of our primary culture system. We have added RT-qPCR data from our hippocampal neuron cultures (Supplemental Figure 2B) showing the relative levels of SCN1A, SCN2A, SCN3A, and SCN8A, which are similar to the relative levels of voltage-gated sodium channel isoforms found in rodent and human forebrain in early development (Figure 1 in PMID: 35031483).

      The electrophysiological experiments are intriguing but limited. First, it would be helpful to report if there were any changes in resting membrane potential for the cells reported in Figure 5. It is also inappropriate to unequivocally state that "Nav currents were not significantly affected by Fgf13 knockout in Gad2Fgf13 KO neurons" as only a sampling of properties was investigated. Recovery from inactivation and persistent current amplitudes were not evaluated. Furthermore, while it looks like long-term inactivation is not altered, only one specific protocol was used and currents measured from cultured neurons may not be fully representative of neuronal properties in vivo.

      We agree that we performed a selective analysis of Nav currents, selected because those are the major parameters that have been associated with FGF13 modulation. Because we did not observe significant differences in Nav currents, we hypothesized that FGF13 affected other currents, as previously observed, and consequently assessed potassium currents, for which we did observe a difference. Further, we note that our sodium current and potassium current results are consistent with, and supportive of, our action potential data in which we find no deficit in AP initiation, but rather a deficit in AP repolarization. We revised the text to reflect the more limited analysis of Nav currents. Regarding long-term inactivation, we also agree that measurements in cultured neurons may not fully represent neuronal properties in vivo; however, we note that regulation of long-term inactivation by FGF13 has previously been assessed only in cultured cells (and not in neurons). Thus, our protocols were designed to query that modulation previously reported.

      The first sentence of the results section is misleading: "To determine how FGF13 variants contribute to seizure disorders, we developed genetic mouse models that eliminate Fgf13 in specific neuronal cell types." The knockouts do not target specific splice isoforms and do not help determine how missense variants contribute to DEE. This should be modified to reflect better what is actually being tested. 

      We agree and have revised our text to state that our goal was to assess how FGF13 contributes to neuronal excitability and thereby accurately reflect the cell type-specific, but not isoform specific, targeting.

      Reviewer #2 (Recommendations For The Authors): 

      • The sentence in the introduction stating "an unusual example of differential expression of an alternatively spliced neuronal gene in excitatory vs. inhibitor neurons" is factually incorrect, especially for transcripts regulating intrinsic properties like FGF13. Refer to PMID: 31451803 for more details and consider rephrasing this statement. 

      We updated our text to reflect the similarity of Fgf13’s cell type-specific alternative splicing to other genes known to control synaptic interactions and neuronal architecture and added the suggested reference.

      • Consistency is needed in the manuscript regarding the term "BASEscope" or "basescope"; the correct version is "BaseScope." 

      We corrected the text accordingly.

      • In the discussion, the term "reduced overall inhibitory drive" might be more appropriate than "input." 

      We updated the text accordingly.

      • The authors should refer to the Fgf13 data in the database from Furlanis et al., which complements their findings: https://scheiffele-splice.scicore.unibas.ch/

      We agree and now incorporate this reference.

      • The phrase "Fgf13 silencing in Nkx2.1 expressing neurons" should be clarified to include the use of CreER, which was crucial and effectively resulted in the labeling of a different subtype of interneurons, see PMID: 23180771. 

      We agree and have updated our text accordingly.

      • Be more cautious when discussing the role of FGF13 in chandelier function; while it seems probable, the current Cre driver used provides no direct evidence. 

      We agree (as noted above) that while our data are consistent with the possibility of a role for Fgf13 in chandelier function, the current Cre driver used is insufficient to offer direct evidence and therefore updated our text in the discussion.

      • The gene dosage effect is interesting, it would be interesting to explore it further in the future. 

      We agree. Because our data suggest that seizures result from loss of inhibitory neuron input, we hypothesize that the gene dosage effect derives from further loss of inhibitory neuron input and thus more hyperexcitability.

      • Another critical aspect not addressed here and of interest for the future is the distinction between the role of FGF13 in interneuron development versus general maintenance. Using Nkx2.1CreER could have helped address both cell specificity and developmental roles. 

      We agree that there may be an interesting distinction between the role of Fgf13 in development versus general maintenance. We have piloted an Nkx2.1-CreER targeted deletion of Fgf13 from cortical interneurons but have not achieved significant deletion of Fgf13, likely because the Nkx2.1-CreER strategy targets only a sparse subset of interneurons and FGF13 is expressed in only a subset of total interneurons. Thus, use of the Nkx2.1-CreER strategy is challenging. We are looking for ways to optimize it.

      Reviewer #3 (Recommendations For The Authors): 

      This was a truly fabulous paper, with an exceptional quantity of beautiful data. I would like to congratulate the authors on their superb work. 

      In the discussion, the authors correctly draw attention to the fact that the clear pro-seizure phenotype they see when FGF13 was knocked out more specifically in a subset of interneurons including chandelier cells adds to our understanding of the role of FGF13 in chandelier cells. More than that though, given that FGF13 is reducing excitability in these cells AND this results in a strong pro-seizure phenotype, they may want to postulate that this lends further weight to the argument that chandelier cells are likely powerful regulators of network excitability despite suggestions in the field that they could potentially have a proexcitatory function (see Szabadics et al. Science 2006). 

      We agree this is interesting and have elaborated on our discussion of chandelier cells to include this point while also addressing the important caveats noted by reviewer 2.

      A minor point: 

      On page 26 the sentence: 

      "Here, we were able to assess FGF13-S and FGF13-VY, chosen because they are most abundantly expressed isoforms in the adult mouse brain, but the inability to rescue electrophysiological consequences completely with either isoform alone leaves open the possibility that other isoforms (e.g., FGF13-U, FGF13-V, and FGF13-VY) also make critical contributions." Should the last "FGF13-VY" be removed? 

      We thank the reviewer for noticing the error and have updated the text accordingly.

    1. eLife Assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals wait to reach a reward, with this mapping remaining consistent across days. While most claims are supported by solid evidence, the study could have benefitted from an improved experimental design to more clearly disambiguate correlations between neuronal patterns and not only time but also stereotypical behaviors and restraint from impulsive decisions. This study will be of particular interest to neuroscientists focused on decision-making and motor control.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis of incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals.

      Weaknesses:

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis. In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis of the speed and geometry of neural population trajectories for correct and incorrect trials. In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test for type III errors.

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions."

      Comments on revisions:

      The authors responded properly to my initial comments. However, I have three additional recommendations for the reviewed manuscript.

      First, the paper urgently needs proofreading by a professional English editor. Second, Figure 4 must be divided in 2, it has too many panels and the resolution of the figure is low. Finally, please consider that what is called scaling factor in Figure 4G should be called something like neural sequence position index. A scaling factor in the timing literature implies that the pattern of activation of a cell contracts or expands according to the timed interval.

    3. Reviewer #2 (Public review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nose-poking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not however intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). 
While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or too long. So it will be unclear if the variation in nosepoke reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the recordings reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke.

      (5) The authors should mention upfront in the main text (Results section) the temporal resolution allowed by their Ca2+ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      Comments on the revised version

      I have read the revised version of the manuscript and the rebuttal letter. My major concern was that the task used is not a time estimation task but primarily taps into impulse control and that animals are not immobile during the nose-poking epoch. I provided factual evidence for this (the animal's timing performance is poor and, on average, animals struggle to wait long enough), and I pointed to a review that discusses the results of many studies congruent with the importance of movement/motivation, not only in constraining the timing of reward-oriented actions during so-called time estimation tasks but also in powerfully modulating neuronal activity.

      The authors' responses to my comments are puzzling and unconvincing. First, on the one hand, they acknowledge in their rebuttal letter the difficulty of demonstrating a neuronal representation of explicit internal estimation of time. Then, they seem to imply that this issue is beyond the scope of their study and focus in the rebuttal on whether the neuronal activity they report shows signs of being sensitive to movement and motivation, which they claim is independent of movement and motivation. This leads the authors to make no major changes in their manuscript. Their title, abstract, introduction, and discussion are largely unchanged and do not reflect the possibility that there are major confounding factors in so-called time estimation (rodents are not disembodied passive information processors) that may well explain some of the neuronal patterns. Evidently, the dismissive treatment by the authors is not satisfying. I will briefly restate my comments and reply to their responses and their new figure, which not only is unconvincing but raises new questions.

      My comments were primarily focused on the behavioral task. The authors replied: "Studying the neural representation of any internal state may suffer from the same ambiguity [by ambiguity they meant that it is difficult to know if animals are explicitly estimating time]. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist." The authors imply that my comments are beyond the scope of their study. That is not true. My comments were targeted at the behavior of the animals, behavior they rely on to title their study: "Stable sequential dynamics in prefrontal cortex represents a subjective estimation of time." When I question whether the task and behavioral data presented are congruent with "subjective estimation of time," my comments are not beyond the scope of the study; they directly tackle the main point of the authors. Other researchers will read the title and abstract of this manuscript and conclude: "Here is a paper that provides evidence of a mechanism for animals estimating duration internally (because subjective time perception is assumed to be different from using clocks)." Still, there is a large body of literature showing that the behavior of animals in such tasks can be entirely explained without invoking subjective time perception and internal representation. How can the authors acknowledge that they can't be sure that mice are estimating time and then have such an affirmative title and abstract?

      In my opinion, science is not just about forcing ideas (often reflecting philosophical preconceptions) on data and dismissing those who disagree. It is about discussing alternative possibilities fairly and being humble. In their revised version, I see no effort by the authors to investigate the importance of movement and motivation during their task or seriously engage with this idea. It's much easier to dismiss my comments as being beyond the scope of their results. According to the authors, it seems that movements and motivations play no role in the task. Still, the animals are water-restricted, and during the task, they will display decreased motivation (due to increased satiety), and their history of rewarded vs. non-rewarded trials will affect their behavior. This is one of the most robust effects seen across all behavioral studies. Moreover, the animals are constantly moving. Maybe the authors used a special breed of mice that behave like some kind of robots? I acknowledge that this is not easy to investigate, but if the authors did not use high-quality video recording or an experimental paradigm that allows disentangling motivational confounds, then they should refrain from using big words such as subjective time estimation and discuss alternative representations by acknowledging the studies that do find that movement and motivation are present during reward-based timing tasks and do in fact modulate neuronal activity, even in associative brain regions.

      To sustain their claim that what they reported is movement-independent, the authors provided a supplementary figure in which they correlated neuronal activity and head movement tracked using DeepLabCut. I have to say that I was particularly surprised by this figure. First, in the original manuscript, there was absolutely no mention of video recording. Now it appears in the methods section, but the description is very short. There is no information on how these video recordings were made. The quality of the images provided in Figure S2 is far from reassuring. It is unclear whether the temporal and spatial resolution would be good enough to make meaningful correlations. Fast head/orofacial movements that occur during nose-poking can be on the order of 20 Hz. To be tracked, this would require at least a 40 Hz sampling rate. But no sampling information is provided. The authors should explain how they synchronized behavioral and neuronal data acquisition. Could the authors share behavioral videos of the 5 sessions shown in Figure S2 so we can judge the behavior of the animals, the quality of the video, and the possibility of making correlations?
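      The sampling-rate point above can be illustrated with a minimal sketch. It assumes a pure 20 Hz sinusoid as a stand-in for a fast head movement; the frame rates (100 Hz and 25 Hz) and the helper names are hypothetical and are not taken from the manuscript. When the frame rate is below twice the movement frequency, the movement shows up at a false, lower frequency.

```python
import numpy as np

def sample_movement(move_hz, frame_rate_hz, duration_s=2.0):
    """Sample a sinusoidal 'movement' of frequency move_hz at a given video frame rate."""
    n_frames = int(round(duration_s * frame_rate_hz))
    t = np.arange(n_frames) / frame_rate_hz
    return np.sin(2 * np.pi * move_hz * t)

def dominant_frequency(signal, frame_rate_hz):
    """Return the frequency (Hz) of the largest peak in the amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / frame_rate_hz)
    return float(freqs[np.argmax(spectrum)])

# Hypothetical frame rates: one well above 2 x 20 Hz, one below it.
fast = dominant_frequency(sample_movement(20.0, 100.0), 100.0)
slow = dominant_frequency(sample_movement(20.0, 25.0), 25.0)
print(f"apparent frequency at 100 Hz frame rate: {fast} Hz")
print(f"apparent frequency at 25 Hz frame rate: {slow} Hz (aliased)")
```

      In practice the frame rate should be comfortably above twice the fastest movement of interest, since sampling at exactly twice the frequency can miss the oscillation entirely depending on phase.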

      Figure S2A-F: I am not sure why the authors correlated nose-poking duration (time estimation) and the duration between upper and lower nose-pokes (reward-oriented movement). It is not relevant to the issue I raised. Without any information about video acquisition frame rate, the y-axis legend (frame) is not very informative. Still, in Figure S2A-F, Rat 5 shows a clear increase in nose-poke duration, which is congruent with decreased impulsivity. Is the time coding different in this rat compared to other rats? There are some similar trends in other animals (Rat 1 and maybe Rat 3), but what is surprising is the huge variability (big downward deflections in the nose-poke duration). I would not be surprised if those deflections occurred after a long pause in activity. Could the authors plot trial time instead of trial number? How do the authors explain such a huge deflection if the animals are estimating time?

      Regarding Figure S2H: I don't see how it addresses my concern. My concern is that some of the Ca activity recorded during nose-poking reflects head movements. The authors need to show if they can detect head movement during nose-poking. Aligning the Ca data relative to head movement should give the same result as when aligning the data relative to the time at which the animals pull out of the upper nose-poke.

      Minor comments:

      In their introduction, the authors wrote: "While these findings [correlates of time perception] provide strong evidence for a neural mechanism of time coding in the brain, true causal evidence at single-cell resolution remains beyond reach due to technical limitations. Although inhibiting certain brain regions (such as medial prefrontal cortex, mPFC,22) led to disruption in the performance of the timing task, it is difficult to attribute the effect specifically to the ramping or sequential activity patterns seen in those regions as other processes may be involved. Lacking direct experimental evidence, one potential way of testing the causal involvement of 'time codes' in time estimation function is to examine their correlation at a finer resolution." This statement is inaccurate at two levels. First, very good causal evidence has been obtained on this topic (see Monteiro et al., 2023, Nature Neuroscience), and see my News & Views on the strengths and weaknesses of this paper. Second, their proposal is inaccurate. Looking at a finer correlation will still be a correlative approach, and the authors will not be able to disentangle motor/motivation confounds.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control.

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we have performed additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis. 

      We thank the reviewer for the comment. Our imaging data generally yielded 50-150 cells in each session. The 18 neurons mentioned by the reviewer are from the duration cell category. We have now provided the number of imaged cells from each rat in the new Supplementary figure 1D. In addition, we have plotted the duration cells’ sequential activity of individual trials for each rat in new Supplementary figure 1B and 1C. These data demonstrate robust sequential activities from the duration cells.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We have now analyzed the neural population trajectories as the reviewer suggested. We calculated the trajectories using the first two principal components of the neural activity during nose-poke events. While correct and incorrect trials show trajectories of similar shape, correct trials show more expanded paths, with longer lengths on average. These new results are now included in Figure 4. Type I or type II errors would likely generate trajectories that do not follow the general direction, which differs from our observations; these results are therefore consistent with our conclusion that scaling errors contribute to the incorrect behavioral timing in these rats.
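      As a rough illustration of the kind of trajectory analysis described above, the following sketch projects a (time x cells) activity matrix onto the first two principal components and measures the path length of the resulting trajectory. It is not the authors' code: the function names, the use of an SVD-based PCA fitted on pooled data, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np

def fit_pcs(activity, n_components=2):
    """Fit principal axes on a pooled (time x cells) matrix; returns (mean, axes)."""
    mean = activity.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(activity - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(trial, mean, axes):
    """Project one trial (time x cells) into the shared low-dimensional space."""
    return (trial - mean) @ axes.T

def path_length(trajectory):
    """Sum of Euclidean distances between consecutive time points."""
    steps = np.diff(trajectory, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())

# Random stand-in data (assumption): 200 pooled frames from 30 cells.
rng = np.random.default_rng(0)
pooled = rng.normal(size=(200, 30))
mean, axes = fit_pcs(pooled)
trial_trajectory = project(pooled[:40], mean, axes)
print("trajectory length of one trial:", path_length(trial_trajectory))
```

      Fitting the axes once on pooled data and projecting every trial into the same space keeps path lengths comparable between correct and incorrect trials; whether the authors did this is not stated in the response.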

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test for type III errors. 

      To clarify the original Figure 4G, the correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggests a possible use of this neural mechanism to time the action of the rats.

      In addition, we have performed the analysis suggested by the reviewer in our revision. We calculated two types of scaling factors. At the individual-cell level, we computed the peak position in individual trials relative to the expected position from the averaged template. At the neural population level, we searched for a scaling multiplier to resample the calcium activity data that minimized the difference between the scaled activity and the expected template. Using these two factors, we found that correct trials show significantly larger scaling than incorrect trials, consistent with our original interpretation that behavioral errors are primarily correlated with scaling errors in the neural activity (type III error). These new results are now incorporated in Figure 4, and we have updated the descriptions in the main text.
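      The two scaling factors described above could be computed along the following lines. This is a hedged sketch, not the authors' implementation: the single-cell factor is assumed here to be the ratio of peak positions, the population factor is found by a simple grid search over candidate multipliers with linear interpolation, and all function names and the synthetic Gaussian-bump sequence are hypothetical.

```python
import numpy as np

def peak_position_ratio(trial, template):
    """Per-cell ratio of peak frame in a trial to peak frame in the template (assumed definition)."""
    return np.argmax(trial, axis=0) / np.argmax(template, axis=0)

def rescale_in_time(trial, scale, n_out):
    """Stretch (scale > 1) or compress (scale < 1) a (time x cells) trial and resample to n_out frames."""
    src_t = np.arange(trial.shape[0]) * scale
    out_t = np.arange(n_out)
    # np.interp holds the last value constant beyond the end of the stretched trial.
    return np.column_stack([np.interp(out_t, src_t, trial[:, c])
                            for c in range(trial.shape[1])])

def best_scale(trial, template, candidates):
    """Return the candidate multiplier minimizing the mean squared difference to the template."""
    errors = [np.mean((rescale_in_time(trial, s, template.shape[0]) - template) ** 2)
              for s in candidates]
    return float(candidates[int(np.argmin(errors))])

# Synthetic sequence (assumption): 5 cells with Gaussian bumps tiling 100 frames.
frames = np.arange(100)[:, None]
peaks = np.array([10, 30, 50, 70, 90])[None, :]
template = np.exp(-0.5 * ((frames - peaks) / 5.0) ** 2)
compressed_trial = template[::2]  # same sequence played out in half the time

print("single-cell ratios:", peak_position_ratio(compressed_trial, template))
print("population multiplier:", best_scale(compressed_trial, template,
                                           np.arange(0.5, 3.01, 0.25)))
```

      In this sketch the multiplier is the stretch needed to bring a trial onto the template, so a temporally compressed sequence yields a multiplier above 1; the sign convention used in the manuscript may differ.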

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer, and have now modified this sentence in the abstract.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nose-poking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not however intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do).
While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require estimating time precisely because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it will be unclear if the variation in nosepoke reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      We would like to respond to the reviewer’s comments 1, 2 and 4 together, since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). Much of this discussion goes beyond the scope of this study, and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborate task designs, we think it is still questionable whether the neural substrate of “timing” can be clearly dissociated from “motor”. In the end, it may well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: 1, duration cells may represent fidgeting or orofacial movements and 2, duration cells may represent motivation or motion plan of the rats. To test the first alternative interpretation, we have now performed a more comprehensive analysis of the behavior data at all the limbs and visible body parts of the experimental rats during nose poke and analyzed its periodicity among different trials. We found that the coding cells (including duration, start and end cells) activities were not modulated by these motions, arguing against this possibility. These data are now included in the new Supp. Figure 2, and we have added corresponding texts in the manuscript.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see graph below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation.

      In order to further test the relationship to motivation, we have measured the time interval between exiting nose poke to the start of licking water reward as an independent measurement of motivation for each trial. We found that this reward-seeking time was positively correlated with the trial durations, suggesting that the durations were correlated with motivation to some degree. And when we scaled the activities of the duration cells by this reward-seeking time, we found that the patterns of the sequential activities were largely diminished, and showed a significantly lower peak entropy compared to the same activities scaled by trial durations. The remaining sequential pattern may be due to the correlation between trial durations and motivation (Supp. Figure 2), and the sequential pattern reflects timing more prominently. These analyses provide further evidence that the sequential activities were not coding motivations. These data are included in Figure 2F, 2K and supp. Figure 3 in revised manuscript.
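      The response does not define peak entropy, so the following is only a sketch of one plausible definition, assumed here for illustration: the normalized Shannon entropy of the distribution of peak times across cells. The function name `peak_entropy`, the number of bins, and the normalization by the log of the bin count are all assumptions rather than details taken from the manuscript.

```python
import numpy as np


def peak_entropy(activity, n_bins=10):
    """Normalized entropy of peak positions across cells.

    activity : array (n_cells, n_time) of trial-averaged traces on a
               common (scaled) time axis.
    Returns a value between 0 (all cells peak in the same bin) and
    1 (peaks tile the interval evenly).
    """
    n_time = activity.shape[1]
    peak_idx = np.argmax(activity, axis=1)            # peak sample per cell
    counts, _ = np.histogram(peak_idx, bins=n_bins, range=(0, n_time))
    p = counts / counts.sum()
    p = p[p > 0]                                      # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(n_bins))
```

      Under this assumed definition, activity whose peaks tile the scaled interval gives a value near 1, whereas a loss of sequential structure after rescaling by an unrelated variable would lower the value.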

      Author response image 1.

      Regarding whether the scaling sequential activity we report represents behavioral timing or true time estimation, we do not have evidence on this point. However, a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior without affecting the execution of the task (PMID: 24367075), arguing against the behavioral timing interpretation. The main surprising finding of our present study is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they are connected with reward-related or motion-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically the reviewer is asking for. It appears that the reviewer is asking us to conduct our analysis for each rat individually. In our revised manuscript, we have conducted and reported analyses for individual rats in the original Figure 1C, Figure 2C, G, K, and Figure 4F.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (result section) the temporal resolution allowed by their Ca2+ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We had originally mentioned the caveat of calcium imaging in the interpretation of our results, and we have now added more text on this point during our revision. In terms of behavioral dynamics (start and end of nose poke in this case), we think calcium imaging provides sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity, or the precise representation of individual cells on the scaled duration, may benefit from improved time resolution.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We have now modified the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis of the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.

      We have now cited and discussed the study in the discussion section of the revised manuscript.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We have now provided this information as requested. The numbers of rats are also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples give readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We have now labeled the analyzed sessions in Figure 1B with red color in the revised manuscript.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells.

      We thank the reviewer for the suggestion and have now modified the figure accordingly in the revised manuscript.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC.

      We thank the reviewer for the question. In our experience, mice with a lens implanted in the mPFC did not show observable differences from mice without surgery in the acquisition of the task or in the distribution of nose-poke durations. In our dataset, rats with the lens implantation showed nose-poking behavior similar to that of rats without lens implantation (Figure 1B). Thus, it seems that the effect of ablation, if any, was quite limited within the scope of our task.

    1. eLife Assessment:

      This study presents valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. Details of the specific threshold must be taken with caution and are necessarily incomplete, but may be supported by additional experiments with higher resolution in space and time in the future.

    2. Reviewer #2 (Public review):

      Summary

      Lines et al investigate the integration of sensory-evoked calcium signals in astrocytes of the primary somatosensory cortex in anesthetized mice. More precisely, their goal is to better characterize the mechanisms that govern the emergence of whole-cell events in astrocytes, here referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial and temporal integration of calcium signals in astrocytes and the mechanisms governing these phenomena is of tremendous importance to deepen our understanding of signal processing in the central nervous system. In line with previous reports in the field, the authors find that most signals originate in the arborization of astrocytes, occasionally leading to somatic and whole-cell events. On average, the latter occur following domain activity closer to the soma, suggesting a centripetal propagation of signals leading to somatic events. Moreover, they observe that the distance from the soma to active domains increases with time after somatic events, suggesting a potential centrifugal propagation of signals post-somatic activity. The results suggest that most calcium surges depend on the expression of IP3R2, the main calcium channel in astrocytes, located at the membrane of the endoplasmic reticulum. Finally, they report a correlation between the percentage of active domains in the astrocyte "arbor", the emergence of a somatic event, and the frequency of slow inward currents from neighboring neurons. The main claim of this manuscript is that there would be a spatial threshold inherent to astrocytes of ~23% of domain activation above which a calcium surge is observed. Although the study provides data and concepts that are important for the glia field, the conclusions seem a little too assertive and general with respect to what can be deduced from the data and methods used.

      Strengths

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulate the percentage of active domains in the astrocyte arborization by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration). The question investigated is important as the mechanisms governing signal integration in astrocytes and its effect on neighboring cells are poorly understood.

      Weaknesses

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data.

      Although the revised version includes more discussion on the experiments that could be done to extend the results from this study, more discussion would be needed to clarify the limitations on what can be deduced from the proposed experimental and analytical design. Notably, the analysis pipeline seems biased by the assumption of the existence of a spatial threshold dictating the emergence of global calcium events in astrocytes. Although there is a clear linear correlation between the percentage of active somas and the percentage of active domains in the arborization (Figure 2 panel F), concluding on the existence of an inherent threshold of domain activity is not completely supported by the data (see e.g. Figure 2 panel F or Figure 4 panel E). It would probably be more accurate to report that most somatic events occur when the percentage of arbor domains being active is above 21-24% (95% confidence interval of the reported threshold). Thus, some of the conclusions from the manuscript, such as p.14 l.34-35 "spatial threshold of domains that needs to be reached in order to lead to soma activation", seem a bit too assertive, as some astrocytes did display soma activation with a much smaller percentage of active domains or, on the contrary, no somatic event despite domain activity way above the threshold. Similarly, as Figure 6 demonstrates a strong effect of IP3R2 knock-out on somatic activation but reports a non-zero probability of soma activity in IP3R2 -/- mice (panel F), the conclusion that IP3R2 are necessary to trigger an astrocytic calcium surge seems a bit too strong. Finally, the results reported in Figure 7 demonstrate the existence of a strong correlation between SICs, the percentage of active astrocyte domains, and somatic activation, so that the conclusion "These results indicate that spatial threshold of the astrocyte calcium surge has a functional impact on gliotransmission" (l.4&-48 page 13) also seems a bit too assertive.

    3. Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      The revised manuscript is improved compared to the first iteration. While some concerns have been addressed, my main critique pertaining to ROI approach/sampled area, statistical analyses and anesthesia are in my view still important caveats of the study that I think should have been even more clearly addressed in the manuscript.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte go well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recently published methods use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings regarding a correlation between subdomain activation and somatic activation may be affected.

      Authors reply: In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell or mostly comprised of extracellular space, and we chose the conservative approach to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that the 3D astrocyte is homogeneously distributed and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We have included a paragraph in the discussion to address this limitation in our study on P15, L23-27:

      "The investigation of the spatial threshold could be improved in the future in a number of ways. One is the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here."

      Comments on revisions: It is good that 3D imaging aspects are mentioned as a limitation, and I agree that Bindocci et al. do not necessarily suggest that results in this manuscript would have been different if the third spatial dimension had also been included in the analyses. However, the way I see it, the added analyses and text changes throughout still do not adequately address my concern pertaining to basing a spatial threshold on a fraction of the astrocyte territory.

      The study uses a Heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D, which show how the method separates the signal, provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement; one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the Heaviside step function) were presented.

      Authors reply: We agree with the reviewer and have added to the paper a discussion of our justification for the use of the Heaviside step function, and have included this in the methods section. We chose the Heaviside step function to represent the on/off situation that we observed in the data that suggested a threshold in the biology. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We agree that a different statistical model describing the data would be more convincing; we confirmed the spatial threshold with a confidence interval in the text and, using a general linear model, supported the use of percent domains active for this threshold over other properties such as spatial or temporal clustering. P18-19, L34-2:

      "Heaviside step function

      The Heaviside step function in equation 4 below is used to mathematically model the transition from one state to the next and has been used in simple integrate-and-fire models (Bueno-Orovio et al., 2008; Gerstner, 2000).

      $$H(a) := \begin{cases} 0, & a < a_T \\ 1, & a \geq a_T \end{cases} \tag{4}$$

      The Heaviside step function $H(a)$ is zero everywhere before the threshold area ($a_T$) and one everywhere afterwards. In the data shown in Figure 4E, each point ($S(a)$) is an individual astrocyte response, with its percent area of domains active ($a$) and whether the soma was active or not denoted by 1 or 0, respectively. To determine $a_T$ in our data we iteratively subtracted $H(a)$ from $S(a)$ for all possible values of $a_T$ to create an error term over $a$. The area at the minimum of that error term was denoted the threshold area."
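      The threshold search described in the quoted methods text can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the names `heaviside` and `fit_threshold` are hypothetical, the error term is taken to be the summed absolute difference between the binary soma response and the step function (the text only says the two were subtracted), and the candidate thresholds default to the observed values of percent area active.

```python
import numpy as np


def heaviside(a, a_t):
    """Step function: 0 where a < a_t, 1 where a >= a_t."""
    return (np.asarray(a, dtype=float) >= a_t).astype(float)


def fit_threshold(a, s, candidates=None):
    """Scan candidate thresholds and return the one with the smallest
    mismatch between the step function and the binary soma response.

    a : percent of arbor domains active, one value per astrocyte response
    s : 1 if the soma was active for that response, else 0
    """
    a = np.asarray(a, dtype=float)
    s = np.asarray(s, dtype=float)
    if candidates is None:
        candidates = np.unique(a)        # assumption: test observed values only
    candidates = np.asarray(candidates, dtype=float)
    # Assumed error term: number of responses the step function misclassifies.
    errors = np.array([np.abs(s - heaviside(a, a_t)).sum() for a_t in candidates])
    return float(candidates[int(np.argmin(errors))]), errors
```

      Because the error curve is returned alongside the best threshold, a flat or shallow minimum can be inspected directly, which bears on the reviewer's question of whether the data show a sharp threshold or whether the model imposes one.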

      Comments on revisions: Even with the added explanations, I am still not sure whether the data show a specific threshold or whether the statistical model enforces a threshold onto the data. The data in Fig. 4G do not in my view show a clear threshold as suggested. The analyses are strengthened by the added statistical modeling; however, the details of the modeling are not presented in the manuscript as far as I can see. As a bare minimum, the statistical packages/tools used, the model details, and goodness of fit, such as residual plots, must be shown or commented on.

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      Authors reply: We have increased the thoroughness of our methods section, in particular noting that body temperature and respiration were monitored throughout anesthesia.

      Comments on revisions: Bath temperature for slice experiments, or cutting conditions are still not reported. For the in vivo experiments, it must be commented that this level of physiological monitoring for acute in vivo brain physiology experiments (self breathing, no control of O2/CO2) is barely adequate and could represent a considerable caveat of the study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence. 

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the spatial and temporal resolution used in the current study (i.e., 2 - 5 Hz framerate using a 25x objective with 1.7x digital zoom with pixels on the order of 1 µm²) is within the norm in the field, and neither compromises the results nor diminishes the main conceptual advancement of the study, namely the existence of a spatial threshold for astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors towards improving the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Lines et al., provide evidence for a sequence of events in vivo in adult anesthetized mice that begin with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if at least 22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics - i.e., must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature - is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons - with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for her/his positive appraisal and comments, which have helped us to improve the study. In response to their insights, we address the key points raised below:

      (1) Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We have provided a more detailed description of the methods and results to clarify the spatiotemporal relationships between domain activation, spatiotemporal clustering, and centripetal and centrifugal calcium propagation in relation to soma activation.

      (2) Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We have expanded upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      (3) Intrinsic Nature of Spatial Threshold: The reviewer's insightful observation regarding the intrinsic nature of the spatial threshold, which does not depend on the level of neuronal stimulus, is noteworthy. We have provided additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      (4) Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We have enhanced our discussion by elaborating on the implications of these observations.

      The primary issue for us, and which we would encourage the authors to address, relates to the low spatial-temporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatial-temporal resolution (2 – 5 Hz framerate using a 25x objective and 1.7x digital zoom with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11s delay for response onset in "arbor" and 13s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatial-temporal resolution (Wang et al. Nat Neurosci., 2006) - not to say of more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations of such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. 
These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response likely representing the primary signals triggered by synaptic activity. It would be important if they could identify such signals in their records, and establish if none/few/many of them propagate to the SR-101-positive part of the astrocyte. In other words, if there is only a single spatial threshold, the one the authors reported, or two or more of them along the path of signal propagation towards the cell soma that leads eventually to the transformation of the signal into a global astrocyte Ca2+ surge. 

      We thank the reviewer for these excellent and important comments. The long mean delays of astrocyte activation are indeed a result of averaging together astrocyte responses to a 20 second stimulus. Astrocyte responses are heterogeneous and many astrocytes respond much more quickly, as can be seen in example traces in Figs. 1D, 1G, and 3C. Variability exists in any biological system; here, however, we take the averaged responses in order to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge. We have now included a paragraph in the Discussion section on this subject on P15, L16-22:

      “We were able to discover this general phenomenon of astrocyte physiology through the use of a novel computational tool that allowed us to combine almost 1000 astrocyte responses. Variation is rife in biological systems, and there are sure to be eccentricities within astrocyte calcium responses. Here, we focused on grouped data to better understand what appears to be an intrinsic property of astrocyte physiology. We used different statistical examinations and tested our hypothesis in vivo and in situ, and all these methods together provide a more complete picture of the existence of a spatial threshold for astrocyte calcium surge.”

      The specialized work of Stobart et al. 2018 focused more on the fast activation of microdomain subpopulations than on the induction of later somatic activation. Indeed, Stobart et al. 2018 and Wang et al. 2006 also found that somatic responses of astrocytes were delayed in the range of seconds. Importantly, Wang et al. 2006 describe that the activation of astrocytes is frequency dependent, that is, the higher the frequency, the faster and higher the activation. In the present work, we stimulated at just 2 Hz to better investigate the spatial threshold. Excitingly, the results shown by Stobart et al. 2018 agree with ours, and with Rupprecht et al. 2024 and Fedotova et al. 2023, in that there is a sequence of activation from the domains to the somas, which could be due to the time required for the summation of the initial microdomain signal to reach a threshold capable of activating the soma. The studies referenced above have many similarities with our own but differ in the underlying scientific question, which led to diverging methodology. However, we want to stress that we agree with the reviewers that our methods provide sufficient evidence for the cell-scale scientific phenomenon that we are studying, which is the spatial threshold for astrocyte calcium surge. Finally, we have included an additional figure (new Figure 5) that only looks at the calcium dynamics of early responding cells and found no significant difference in the spatial threshold in this population compared to our original quantification.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges in a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raised an excellent point. Whether the 22.6% spatial threshold is reached by a continuous wavefront within the segmented astrocyte or by calcium elevations occurring in separate domains of the arbor is an important question, and we address it with a novel analysis shown in the new figure (new Figure 5) in the revised version of the manuscript. In this new analysis, we demonstrate that the average distance between active domains is not significantly different between subthreshold activity and the activity that precedes or follows the suprathreshold cellular activation. In contrast, we do find a significant difference in the average time between domain activation between subthreshold activity and activity that precedes and follows suprathreshold activation. We go further with a generalized linear model to show that the percent area of active domains and temporal clustering, but not spatial clustering, are related to soma activation. This suggests that domain activation doesn’t need to be spatially clustered to induce soma activation and the subsequent calcium surge; more importantly, domain activation must exceed the spatial threshold and occur within a constrained timeframe. This has been added to the Results on P10, L2-40:

      “Our results demonstrate the relationship between the percentage of active domains and soma activation and subsequent calcium surge. Next, we were interested in the spatiotemporal properties of domain activity leading up to and during calcium surge. Because we imaged groups of astrocytes, we were able to constrain our analyses to fast responders (onset < median population onset) in order to evaluate astrocytes that were more likely to respond to neuronal-evoked sensory stimulation and not nearby astrocyte activation (Figure 5A). In this population the spatial threshold was 23.8% within the 95% confidence intervals of [21.2%, 24.0%]. First, we created temporal maps, where each domain is labeled as its onset relative to soma activation, of individual astrocyte calcium responses to study the spatiotemporal profile of astrocyte calcium surge (Bindocci et al., 2017; Rupprecht et al., 2024) (Figure 5B). Using temporal maps, we quantified the spatial clustering of responding domains by measuring the average distance between active domains. We found that the average distance between active domains in subthreshold astrocyte responses were not significantly different from pre-soma suprathreshold activity (16.3 ± 0.4 µm in No-soma cells versus 16.2 ± 0.3 µm in Pre-soma cells, p = 0.75; n = 286 No-soma vs n = 326 Pre-soma, 30 populations and 3 animals; Figure 5C). Following soma activation, astrocyte calcium surge was marked with no significant change in the average distance between active domains (16.0 ± 0.3 µm in Post-soma cells versus 16.3 ± 0.4 µm in No-soma cells, p = 0.57 and 16.2 ± 0.3 µm in Pre-soma cells, p = 0.31; n = 326 soma active and n = 286 no soma active, 30 populations and 3 animals; Figure 5C). Taken together this suggests that on average domain activation happens in a nonlocal fashion that may illustrate the underlying nonlocal activation of nearby synaptic activity.
Next, we interrogated the temporal patterning of domain activation by quantifying the average time between domain responses, and found that the average time between domain responses was significantly decreased in pre-soma suprathreshold activity compared to subthreshold activities without subsequent soma activation (9.4 ± 0.3 s in No-soma cells versus 4.4 ± 0.2 s in Pre-soma cells, p < 0.001; n = 326 soma active vs n = 286 not soma active, 30 populations and 3 animals; Figure 5D). The average time between domain activation was even less after the soma became active during calcium surge (2.1 ± 0.1 s in Post-soma versus 9.4 ± 0.3 s in No-Soma cells, p < 0.001 and 4.4 ± 0.1 s in Pre-soma cells, p < 0.001; n = 326 soma active and n = 286 not soma active, 30 populations and 3 animals; Figure 5D). This corroborates our findings in Figure S2 and highlights the difference in temporal profiles between subthreshold activity and astrocyte calcium surge. 

      We then tested the contribution of each of our three variables describing domain activation (percent area, average distance and time) to elicit soma activation by creating a general linear model. We found that overall, there was a significant relationship between these variables and the soma response (p = 5.5e-114), with the percent area having the largest effect (p = 3.5e-70) followed by the average time (p = 3.6e-7), and average distance having no significant effect (p = 0.12). Taken together this suggests that the overall spatial clustering of active domains has no effect on soma activation, and the percent area of active domains within a constrained time window having the largest effect.”
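      To illustrate the kind of model described in the quoted passage, the following is a minimal sketch, not the analysis used in the manuscript. It assumes a binary soma outcome with a logistic link (the response does not state the link function or the software used) and fits the model by plain gradient descent on synthetic, made-up cells. The names `simulate_cells`, `fit_logistic`, and the simulated effect sizes are hypothetical, and the sketch recovers only the relative size of the coefficients, not p-values.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(column):
    """Rescale a predictor to zero mean and unit variance."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / sd for v in column]


def fit_logistic(rows, labels, lr=0.5, epochs=1500):
    """Fit an intercept plus one weight per predictor by batch gradient descent."""
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            linear = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(linear) - y
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def simulate_cells(n=300, seed=0):
    """Synthetic cells (assumption): soma activation depends on percent area
    and on the time between domain onsets, but not on inter-domain distance."""
    rng = random.Random(seed)
    area = [rng.uniform(0.0, 50.0) for _ in range(n)]      # percent area active
    distance = [rng.gauss(16.2, 3.0) for _ in range(n)]    # mean distance (um)
    interval = [rng.uniform(1.0, 12.0) for _ in range(n)]  # mean onset gap (s)
    labels = []
    for a, t in zip(area, interval):
        p = sigmoid(0.4 * (a - 22.6) - 0.6 * (t - 6.0))
        labels.append(1 if rng.random() < p else 0)
    rows = list(zip(standardize(area), standardize(distance), standardize(interval)))
    return rows, labels


if __name__ == "__main__":
    rows, labels = simulate_cells()
    _, b_area, b_dist, b_time = fit_logistic(rows, labels)
    print(f"area={b_area:.2f} distance={b_dist:.2f} interval={b_time:.2f}")
```

      Because the predictors are standardized, the fitted weights can be compared directly: in this toy setting the percent-area weight dominates, the onset-interval weight is negative, and the distance weight stays near zero, mirroring the ordering of effects reported above.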

      Regarding comments on SIC, we fully agree with the reviewer. In the revised version of the manuscript, we have included text in the discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for the SIC does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored at the whole-cell soma, matches the threshold for the intracellular calcium extension. We have added to the Discussion P16, L15-30:

      “Astrocyte calcium activity induces multiple downstream signaling cascades, such as the release of gliotransmitters (Araque et al., 2014; de Ceglia et al., 2023). Using patch-clamp recordings of a single nearby neuron we showed that a nearby population of astrocyte calcium surge is also correlated to the increase in slow inward currents (SICs), previously demonstrated to be dependent on astrocytic vesicular release of glutamate (Araque et al., 2000; Durkee et al., 2019; Fellin et al., 2004). The increase of SICs we observed from patching a single neuron is likely the integration of gliotransmitter release onto synapses from a group of nearby astrocytes. Indeed, subthreshold astrocyte calcium increases alone can trigger activity in contacted dendrites (Di Castro et al., 2011). An exciting avenue of future research would be to observe the impact of a single astrocyte calcium surge on nearby neurons (Refaeli and Goshen, 2022). How many neurons would be affected, and would this singular event be observable through patch clamp from a single neuron? The output of astrocyte calcium surge is equally important to network communication as the labeling of astrocyte calcium surge, as it identifies a biologically relevant effect onto nearby neurons. Many downstream signaling mechanisms may be activated following astrocyte calcium surge, and the effect of locally concentrated domain activity vs astrocyte calcium surge should be studied further on different astrocyte outputs.”

      Additional considerations are that the authors propose an event sequence as follows: stimulus - synaptic drive to L2/3 - arbor activation - spatial threshold - soma activation - post soma activation - gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation - from dendrite to soma to axon, and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post soma activation" are not established `terms-of-art´. Similarly, the way the authors use the term "domain" contrasts with how others have (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being just constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is directionality from dendrite to soma to axon. It is unknown whether directionality of the calcium signal exists in astrocytes. However, our finding in Figure 5E suggests a directionality of centripetal propagation from the arborization to the soma to elicit calcium surge that leads to centrifugal propagation. In the Results on P10-11, L41-8:

      “Recent work studying astrocyte integration has suggested a centripetal model of astrocyte calcium, where more distal regions of the astrocyte arborization become active initially and activation flows towards the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we confirm this finding, where activated domains located distal from the soma respond sooner than domains more proximal to the soma (linear correlation: p < 0.05, R2 = 0.67; n = 30 populations, 3 animals; Figure 4E). Next, we build upon this result to also demonstrate that following soma activation, astrocyte calcium surge propagates outward in a centrifugal pattern, where domains proximal to the soma become active prior to distal domains (linear correlation: p < 0.01, R2 = 0.89; n = 30 populations, 3 animals; Figure 4E). Together these results detail that intracellular astrocyte calcium follows a centripetal model until the soma is activated leading to a calcium surge that flows centrifugally. This suggests that astrocytes have the capabilities to integrate the nearby local synaptic population, and if this activity exceeds the spatial threshold then it leads to a whole-cell response that spreads outward.” 

      And in the Discussion P15, L3-15:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation. We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses, and in order to dissociate from this term we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items have been discussed and clarified in the revised version of the manuscript on P5, L17-19:

      “The concept of domain to define all subcellular domains in the astrocyte arborization should not be confused with the concept of microdomain, that usually refers to the distal subcellular domains in contact with synapses.”

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom with pixels on the order of 1µm) and temporal (2 – 5 Hz framerate) resolution is within the range used in the glial field. In any case the existence of a spatial threshold for astrocyte calcium surge is not compromised with the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist, however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we have included a new figure (new Figure 5) that analyzes early responders.

      For structure-based selection - Genetically-encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals in their spatial threshold concept. Even if they decided to be conservative on their methods, and stick to the astrocyte segmentation based on the SR-101 signal, they should acknowledge that SR101 dye staining quality can vary considerably between individual astrocytes within a FOV - some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others and we think that authors should privilege such astrocytes for analysis. However, cases like the representative astrocytes shown in Figure 4A or Figure S1B, have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect there to be comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining - visible only in the soma and proximal domains - might show a biased effect for IP3R2 KO.
While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were only included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the ideas concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when both astrocytes express GCaMP6. Also, SR101-negative domains could encapsulate an area that only partially belongs to the astrocyte territory and also includes extracellular space. Here we take a conservative approach and constrain ROIs to SR101-positive astrocyte territory outlines, without invading neighboring cells or extracellular space, in order to reduce error in the estimate of a spatial threshold. The effect of IP3R2 KO preferentially impacting activity near the soma is interesting, and in line with our conclusions. We agree that the findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and that the additional analysis suggested would further strengthen the results. We have since included a paragraph on the limitations of the study in the Discussion P15, L31-37:

      “In this study, we chose to limit our examinations of calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      For latency-based selection - The authors record calcium activity within a FOV containing at least 20+ astrocytes over a period of 60s, during which a 2Hz hindpaw stimulation at 2mA is applied for 20s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency to the stimulus. For the shorter-latency responders <3s, it is easier to attribute their calcium increases as "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10s or later, only after 20 stimulus events (at 2Hz), it is likely they are being activated by a more complex and recurrent circuit containing several rounds of neuron-glia crosstalk etc., which would be mechanistically distinct from astrocytes responding earlier. We suggest that authors focus more on the shorter latency response astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that different times of astrocyte calcium increases may be due to different mechanisms outside of the astrocyte. We believe the spatial threshold is intrinsic and holds irrespective of these external variables; we also believe that longer latency responses are physiological and may carry important information for determining the astrocyte calcium responses. Indeed, we have performed the spatial threshold analysis on early responders (first half of responding cells), and found that the spatial threshold in that population (23.8%) is within the 95% confidence interval [21.2%, 24.0%]. Additionally, the spatial threshold of the slow responders (22.6%) was also within the confidence interval.
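      The early-versus-late comparison above can be sketched as follows. This is only an illustration of the bookkeeping, under the assumption that "early responders" means cells whose onset is below the median population onset (as in the quoted Results text); how the threshold itself is estimated is not described in this response and is not reproduced here. The helper names `split_by_onset` and `within_interval` and the example onsets are hypothetical; the threshold and interval values are the ones reported above.

```python
from statistics import median


def split_by_onset(onsets_s):
    """Return (fast, slow) lists of cell indices.

    Fast responders are cells whose onset is strictly below the median
    population onset; all remaining cells are treated as slow responders.
    """
    cutoff = median(onsets_s)
    fast = [i for i, t in enumerate(onsets_s) if t < cutoff]
    slow = [i for i, t in enumerate(onsets_s) if t >= cutoff]
    return fast, slow


def within_interval(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical onsets (s) for four cells, used only to show the split.
    fast, slow = split_by_onset([3.0, 11.0, 1.5, 14.0])
    print("fast cells:", fast, "slow cells:", slow)

    # Values reported in the response: subgroup thresholds and the 95% CI.
    ci_95 = (21.2, 24.0)
    for label, threshold in (("early", 23.8), ("late", 22.6)):
        print(label, threshold, within_interval(threshold, ci_95))
```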

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Ruprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data would be interesting, and we provide the results of the suggested analysis within the new figure (new Figure 5) in the revised version of the manuscript. In this analysis we show that subthreshold, pre-soma and post-soma dynamics are significantly different in time. These added results strengthen our claim of a spatial threshold by quantifying the distinct temporal and spatial dynamics of domain activation before and after the spatial threshold is met (i.e. soma activation), and highlight differences between subthreshold and suprathreshold activity.
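      A temporal map and the two clustering measures can be sketched in a few lines. This is a minimal sketch under stated assumptions: domains are represented by hypothetical centroid coordinates in µm, spatial clustering is taken as the mean pairwise distance between active domains, and temporal clustering is taken as the mean gap between consecutive domain onsets. The exact definitions used for the new Figure 5 may differ, and all function names are illustrative.

```python
import math
from itertools import combinations


def temporal_map(domain_onsets_s, soma_onset_s):
    """Label each domain with its onset relative to soma activation (s).

    Negative values are pre-soma domains, positive values are post-soma.
    """
    return {name: t - soma_onset_s for name, t in domain_onsets_s.items()}


def mean_pairwise_distance(centroids_um):
    """Average Euclidean distance between all pairs of active domains."""
    pairs = list(combinations(centroids_um, 2))
    if not pairs:
        return float("nan")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def mean_onset_interval(onsets_s):
    """Average gap between consecutive domain onsets, after sorting."""
    ordered = sorted(onsets_s)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return float("nan")
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical single-cell example: three active domains, soma at 10 s.
    onsets = {"d1": 8.0, "d2": 9.0, "d3": 12.5}
    centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(temporal_map(onsets, soma_onset_s=10.0))
    print(mean_pairwise_distance(centroids))
    print(mean_onset_interval(onsets.values()))
```

      Computing both measures separately for subthreshold, pre-soma, and post-soma activity gives the three groups that are compared in the response.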

      (1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses. The additional analysis we provide based on temporal maps (new Figure 5) shows a very interesting result in that there is no significant difference between the spatial clustering of, or average distance between, activated domains in subthreshold and pre-soma suprathreshold activity. This result, along with the General Linear Model, suggests that there is not another subcellular potential spatial threshold, as the activity is the same. Instead, the main difference between activity in the domains that leads to soma activation or not is the overall percentage of domains active and not necessarily how that spatial activity is organized. We have also added this point in the Discussion section to highlight the importance of this result. P15, L3-8:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation.”

      (2) Is the propagation in the authors' experimental model centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome).

      The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our view is centripetal when considering activity leading up to soma activation. Indeed, we have found that arborization activity precedes soma activity (Figure 3), that soma activity appears to rely on the percent area of domain activity (Figure 4), and that pre-soma domain activity comes online earlier in domains distal from the soma (new Figure 5). However, whether this is intrinsic or due to the fact that synapses are more likely to occur in the periphery requires further study. Our new results in the new Figure 5, which demonstrate that subthreshold activity has a spatial organization that is not significantly different from that of pre-soma activity in suprathreshold cases, argue in favor of a general excitability threshold hypothesis. However, we do not see these hypotheses as mutually exclusive. Excitingly, we have also found that, following soma activation, the calcium surge appears to propagate centrifugally. We have since added the topic of a centripetal-centrifugal experimental model to the Discussion P15, L8-15:

      “We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      (3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization, but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2Hz stim (which lacks a soma response) underscores this point. It looks like it has ≥22.6% activation that is sparsely localized throughout the arborization. If an alternative spatial distribution for this activity occurred, such that it localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point, which our new spatiotemporal analysis (new Figure 5) aims to shed some light on, as discussed in our response above. To our knowledge, there is no mechanism in astrocytes that imposes directionality on calcium propagation, comparable to the rectifying voltage-gated sodium channels in neuronal voltage propagation. We found that the delay of domain activation relative to soma onset is significantly correlated with the distance from the soma (new Figure 5E). In addition, spatial clustering in pre-soma activity is not significantly different from that in non-responders or in post-soma activity. Together, this suggests that centripetal propagation may be occurring throughout the entire cell rather than in a locally clustered way. Our findings also suggest that, following soma activation, the astrocyte calcium surge follows a mostly centrifugal pattern (new Figure 5E).

      (4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      Please see above comments.

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus on calcium analysis in the astrocyte or neuronal fields, and many groups use either custom software developed in-house or published packages such as GECIquant, STARDUST, AQuA or AQuA2. While AQuA is an event-based calcium event detection software, it does not include inactive domains that are SR101 positive, and this may underestimate the spatial threshold for calcium surge. Our analysis is not based on functional events but on calcium signals within the structural constraints of a single astrocyte. This is crucial to properly determine the ratio of active vs inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion. We have now included a paragraph outlining the limitations of the study in the Discussion P15, L23-37:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One is the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here. To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step. In this study, we chose to limit our examinations of calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte go well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings regarding the correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell or mostly comprised of extracellular space, and we chose the conservative approach to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that activity in the 3D astrocyte is homogeneously distributed and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We have included a paragraph in the discussion to address this limitation in our study on P15, L23-27:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One is the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here.”

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we acknowledge this in the Discussion P15, L27-31:

      “To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step.”

      The study uses a heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement, one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the heaviside step function) were presented.

      We agree with the reviewer and have added a justification for our use of the Heaviside step function to the Methods section. We chose the Heaviside step function to represent the on/off behavior that we observed in the data, which suggested a threshold in the biology. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We also agree that additional statistical modeling of the data would be more convincing. We therefore confirmed the spatial threshold with a confidence interval in the text and, using a general linear model, supported the use of percent domains active for this threshold over other properties such as spatial or temporal clustering. P18-19, L34-2:

      “Heaviside step function

      The Heaviside step function below in equation 4 is used to mathematically model the transition from one state to the next and has been used in simple integrate and fire models (Bueno-Orovio et al., 2008; Gerstner, 2000).

      The Heaviside step function 𝐻(𝑎) is zero everywhere before the threshold area (𝑎 ) and one everywhere afterwards. From the data shown in Figure 4E where each point (𝑆(𝑎)) is an individual astrocyte response with its percent area (𝑎) domains active and if the soma was active or not denoted by a 1 or 0 respectively. To determine 𝑎 in our data we iteratively subtracted 𝐻(𝑎) from  𝑆(𝑎) for all possible values of 𝑎 to create an error term over 𝑎. The area of the minimum of that error term was denoted the threshold area.”
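
      The threshold search described in the quoted Methods text can be sketched as follows. This is a minimal illustration and not the authors' code: the names `heaviside`, `fit_threshold` and `a_th`, the choice of candidate thresholds (the observed percent-area values), the use of a summed absolute difference as the error term, and the toy data are all assumptions made for the example. Because both S(a) and H(a) take only the values 0 and 1, an absolute and a squared error give the same result here.

```python
def heaviside(a, a_th):
    """Step function: 0 below the threshold a_th, 1 at or above it."""
    return 1 if a >= a_th else 0


def fit_threshold(percent_active, soma_active):
    """Return (threshold, error) minimizing the summed |S(a) - H(a)|.

    Candidate thresholds are the observed percent-area values themselves
    (an assumption; the quoted Methods say only "all possible values").
    """
    best_th, best_err = None, float("inf")
    for a_th in sorted(set(percent_active)):
        # Count the cells whose soma state disagrees with the step function.
        err = sum(abs(s - heaviside(a, a_th))
                  for a, s in zip(percent_active, soma_active))
        if err < best_err:  # strict "<" keeps the lowest threshold on ties
            best_th, best_err = a_th, err
    return best_th, best_err


# Toy data, invented for illustration: percent of domains active per cell
# and whether that cell's soma responded (1) or not (0).
percent_active = [5, 10, 15, 20, 25, 30, 40, 50]
soma_active = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, error = fit_threshold(percent_active, soma_active)
print(threshold, error)
```

      With real data, the error term would rarely reach zero, since some cells fall on the "wrong" side of any single threshold. The minimum of the error term then marks the best compromise.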

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We have expanded our Methods section. In particular, we now state that body temperature and respiration were monitored throughout anesthesia.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      (1) We think it would improve the paper if the authors provided a frame-by-frame example over (for example) 10-15 frames showing the spatiotemporal evolution of responses, where each frame represents 1s or 2s. This could be included with the temporal maps we proposed above.

      We agree that this is a useful example and have included it in our new figure (new Figure 5, specifically see Figure 5A) that uses temporal maps to analyze the spatiotemporal properties of calcium dynamics (Figure 5B).

      (2) Concerning the evidence in the present manuscript, we are not clear on what "populations" means. Can the authors clarify in methods? It is our understanding that 987 astrocytes from 30 populations from 3 mice were the source for the core data in the paper. What are the 30 populations, and how were the 987 astrocytes distributed across the populations? Are they roughly 10 FOVs per mouse? If so, please clarify roughly how far apart FOVs from the same mouse were, and how much delay between stim protocol application there was when a FOV was changed to a new FOV. Also, if for example, the 10th FOV from mouse 1 "saw" 9 rounds of stimulation before recording the response to the 10th stim round. To this point, was there any indication of response differences in populations that were recorded earlier vs later in the experimental sequence for each mouse?

      Descriptions of data will be included with the uploaded datasets following acceptance.

      (3) The description of the results on page 6 is a bit confusing for us. In lines 1-4, are the authors saying that 57.7% of astrocytes in a FOV exhibited responses within their soma and arborization, while 15.1% had responses only in arborization? If so, this is not clear to us from Figure 2C, where we count ~25 astrocytes in the FOV, maybe 8 or 9 astrocytes with activity in the arborization + soma (after stimulation), and 8 or 9 astrocytes with responses only in arborization. Is there something we do not understand, or is the second panel simply not representative of the group data?

      Figure 2D is representative of the group data and does indeed show that 57.7% of the population responds within the soma and arborization, and that 15.1% of astrocytes respond only in their arborizations. It is not possible to tell from this image whether an entire arborization is active or whether activity increases in only one or a few domains, as the latter may not produce enough activity to be detected when sampling over the entire arborization.

      (4) In the second part of page 6 - when the authors apply linear regression - are they saying that there is a linear relationship between the amount (area) of activity measured in the arborization versus the soma, where populations of astrocytes with 50% activation of the arborization also tend to have 50% activation in their somas? If so, then this is not apparent by the map provided in Figure 2C, where it looks like soma activation (within the subpopulation) is 100% irrespective of the apparent activity in the arborization. This needs to be clarified. If not, and what they mean is that the probability of finding an active soma is related to the amount of activation within the arborization, this needs to be stated more clearly.

      When testing the linear relationship between somas active vs arborizations active, we find a significant linear correlation (p < 0.001, R2 = 0.90).
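
      For readers unfamiliar with the statistic, a least-squares fit and its coefficient of determination (R²) can be computed as in the sketch below. This is an illustration only, not the authors' analysis: the function name `linear_fit` and the toy values are invented, and the sketch does not compute the p-value reported above.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy values, invented for illustration: percent of arborizations active
# and percent of somas active in six hypothetical populations.
arbor_pct = [10, 20, 30, 40, 50, 60]
soma_pct = [8, 22, 27, 43, 49, 61]
slope, intercept, r_squared = linear_fit(arbor_pct, soma_pct)
print(slope, intercept, r_squared)
```

      Under this reading, each point is one population, so a high R² means that populations with more active arborizations also tend to have a larger share of active somas.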

      (5) In the experiments where stimulation duration, frequency, and intensity were varied to determine the percentage of domains that were on, it would be helpful to better understand the protocol in terms of sequence. In the methods it seems that hindpaw stimulation intensity was first pseudo-randomly varied at 2Hz for 10s, followed by pseudorandomly varied stimulation frequency and then pseudo-randomly varied duration - both at 2mA for 10s. Is this correct?

      We have since updated the methods section to better describe the experimental protocol.

      (6) In Figure 3E the alignment of the "arbor" to the somatic response is a bit misleading. The signals being averaged for the "arbor" are composed of temporally heterogeneous sources (from distal and proximal domains) and when averaged will produce an artificially slow rise time. In contrast, the averaged somatic signals are composed of much more homogenous sources (arising from a more singular event) and therefore have a sharp rise time. It would make more sense to align their kinetics relative to the stimulus onset. It would also make more sense to compare the somatic response of astrocytes to the "arbor" of astrocytes which respond rapidly vs slowly to the foot-shock.

      Aligning the responses to the stimulus onset would exacerbate the artificially slow rise time for the soma and arborization as not all cells come online at the same time from stimulus onset.

      Reviewer #2 (Recommendations For The Authors):

      Data availability

      It seems that the data is not shared on a public repository, while it appears to be necessary according to eLife's general principles (see https://elife-rp.msubmit.net/html/eliferp_author_instructions.html#dataavailability).

      We will upload raw data to a repository upon acceptance of the manuscript.

      Data analysis

      - Why did the authors choose the Heaviside step function to characterize conditions for somatic event initiation? It seems that this approach is averaging very heterogeneous data (some cells do not display somatic events even with ~50% domains active, while some display somatic events with <5%, it seems).

      Please see the discussion of variability in our responses to the public reviews. We have since included more discussion on the use of the Heaviside step function in the Methods section.

      - Averaging of the data. It seems that the approach chosen to quantify calcium activity overlooks the variability of the signals measured ("Astrocyte calcium quantifications were averaged over all astrocytes of a single video and these values were used in statistical testing.", l.22-23, page 15). What is the variability of the measured features between different astrocytes? Between different animals? To what extent does this averaging strategy overlook the variability of the signals/how much information do we expect to lose? The manuscript would probably benefit from a more advanced statistical approach to analyze the data.

      Is it possible to extract information from the data that would indicate mechanisms allowing somatic activity when the percentage of domain activation was lower than the threshold? How about the opposite (i.e when no global event was triggered even when the percentage of domain activation was high)?

      We are indeed combining the responses of many diverse astrocytes, and we see this as a strength of the paper. Variation is a hallmark of biology, and we have added this to the discussion. In the rare cases where astrocyte somas do not come online when the percent of arborizations active is over threshold, or the opposite, when somas activate with little domain activation, we would say this is most likely due to imaging in 2D instead of imaging the entire 3D cell. We have also added this to our discussion.

      - Here are a few suggestions for additional analysis that might be of interest to the community:

      - Measuring calcium activity in domains depending on their distance from the soma. This would allow us to better understand the spatial integration of the signals and notably answer the following question: Does the emergence of somatic events depend on the spatial distribution of active domains? (and does a smaller domain-soma distance facilitate the emergence of a calcium surge with a lower percentage of active domains?) These measurements could be visualized with plots of xy position of the domains (domain-soma distance) = f(time) with a colormap reflecting dF/F0, for example, at different times pre- and post-somatic events. Instead of DF/F0, these plots could also display the correlation between domain activities.

      We have performed this analysis, and it is now in the new figure (new Figure 5).

      - Adding temporality to the data analysis. It seems that calcium activity is "concatenated" during the whole duration prior to the somatic event (pre-soma) and after (post-soma). However, it is unclear how long the domains remained active and how many domains were still active at the onset of the somatic event. Adding a finer temporal analysis might help answer questions such as the potential need for some degree of synchronization of domain activity to trigger calcium surges.

      It could notably be interesting to measure the level of synchrony of events as a function of their distance from the soma and to analyze how it correlates with the properties of the somatic event.

      We have now included temporal analysis of astrocyte calcium surge in our new figure (new Figure 5). While we did see examples of spatially clustered domain activation in our data, those examples usually included other non-clustered domain activity. When including all of the active domains within an astrocyte's arborization, we found no difference in the distance between activated domains before and after soma activation, even when comparing to subthreshold domain activity.

      Experiments

      - Would it be possible to apply different levels of stimulation to a given cell in order to discriminate whether the "no-soma" cells can display somatic events when neuronal activity is enhanced?

      Increased sensory stimulation does increase soma activity (please see Lines et al., Nature Communications, 2020). An example of increased stimulation leading to somatic activation that was not present at lower stimulus levels can be seen in Figure 4A-C.

      - Why choose a stimulation of 2 mA, 2 Hz for 20 sec in the experiments on IP3R2-/- mice?

      Has the same set of various stimulation protocols featured in Figure 4 been applied to IP3R2-/- mice? If so, were more domains activated as stimulation intensity (amplitude; duration, or frequency) increased? Could it trigger somatic events? This information seems necessary to be able to assert that calcium surges rely on the IP3R2 pathway.

      These experiments were not performed.

      -  Adding intermediary values of ATP pulse duration to Figure 6 (e.g. 50 ms and 75 ms) might strengthen the claim that the linear increase of SIC frequency with ATP application duration is only observed above the ~23% threshold.

      Agreed; however, these experiments were not performed.

      Minor corrections to the text and figures.

      Methods

      The reader might benefit from a little more detail regarding the analysis of calcium signals. Notably, what was the duration of the calcium recordings? Was it constant across the different conditions tested in the study? Was it different in slice experiments versus in vivo experiments? What were the durations of the pre- and post- soma recordings and their variability? Was the calcium activity normalized for each astrocyte or animal? If not, why not consider normalizing the post-stimulation activity with pre-stimulation baseline activity?

      Similarly, some information on the stimulation protocol seems to be lacking: what was the frequency and intensity of the stimulus in the experiments where stimulus duration varied? Concurrently, what were the duration and intensity when frequency varied? What were the duration and frequency when the intensity varied?

      It might be beneficial to add further information on the algorithm of the Calsee software. What is it performing? How was it tested? Why is it referred to as "semi"-automatic, i.e. what might the user be needing to do manually? The segmentation seems to be omitting some branches connecting distal ROIs to the soma (see e.g. Fig S1.E). How would this influence the analysis and results?

      Results

      - Some assessments in the manuscript seem a bit too assertive/general compared to what can be deduced from the evidence presented in the figures. It could be beneficial to the reader to rephrase the latter. Some examples are listed below:

      - "These results indicate that astrocyte responses occurred initially in the arborizations, which is consistent with the idea that synapses are likely to be accessed at the astrocyte arborization ", l.11-12 page 7. The fact that the time to peak is lower in the arborization does not necessarily mean that signals initiate there. It could be because the kinetics/pathways in those compartments are different or there could be a dilution effect in the soma. Indeed, an influx of the same amount of calcium ions in the soma vs in a small domain will not correspond to the same DF/F0 in those compartments and might thus remain undetected in the soma.

      - "Using transgenic IP3R2-/- mice, we found that the activation of type-2 IP3 receptors is necessary for the generation of astrocyte calcium surge" (page 4, line 1-2), "present data further demonstrate that IP3R2 are necessary for the propagation of astrocyte calcium surge." (l. 18-19 page 13) -> As discussed above, the evidence does not seem to be strong enough to assert that IP3R2 is necessary to trigger somatic events. The results indicate that the IP3R2 pathway seems to facilitate the emergence of somatic events. As astrocytes differ strongly in terms of morphology and expression profiles depending on physiological conditions, the conclusions of this study might only apply to the specific experimental conditions used: region studied, age of the animal, type of sensory stimuli performed, and so on.

      - "These results indicate that spatial threshold of the astrocyte calcium surge has a functional impact on gliotransmission, which have important consequences on the spatial extension of the astrocyte-neuron communication and synaptic regulation", l.41-48 page 11. Figure 6 seems to indicate a correlation between the proportion of astrocyte domains activated and the frequency of SICs. The data seems insufficient to conclude that there is a causal relationship between calcium surge in the astrocyte and gliotransmission or SIC frequency.

      - "These results indicate that, on average, subcellular calcium events located in astrocyte arborizations are related to soma activation.", page 6 l 15-16. It may be more informative to specify the correlation measured, i.e., the larger the arborization activity, the larger the percentage of active somas.

      Figures

      Figure 2: Adding more details in the figure legend explaining how the different parameters are calculated might be useful to the reader. Notably, what does soma active (%) refer to?

      Figure 3: Could it be possible to add individual traces of calcium activity in the soma and arborization of individual cells to provide a glimpse of the variability of the signals measured?

      Fig4. B-C: Could it be possible to add in the legend information on the timeline between stimulation and calcium signal recording? (and the duration of the latter).

      Fig4 D-E: Why is the maximum number of active domains in panel D ~50-60% but goes up to ~100% in panel E? Could it be that plotting SEM rather than STD might misrepresent the variability in the percentage of active domains for each stimulus property?

      Fig4F: It seems that the threshold changes with the frequency of the stimulus: e.g. at 10 Hz, the threshold seems larger than 22.6%. What would that mean?

      Fig4G: - Why do some data points display a soma amplitude < 0 DF/F0 ?

      - Why choose a sigmoid fit? What are the statistics associated to the fit? Is it in accordance with the threshold of 23%? Would a linear fit provide a good fit?

      Fig5F: - It seems that a few IP3R2-/- astrocytes displayed somatic events? If so, it might be interesting to mention this in the discussion section and to speculate on why that might be. - It seems that panel 5F displays the average percentage of somas that got activated rather than the probability of somatic events.

      - Is it possible that the effect seen in domains vs arborization is due to statistical effects (as n=2450 vs 112)?

      Fig S1: Panel D legend: double labeling of the radius used for each plot might be useful, notably for colorblind readers as the colors might be hard to see.

      Discussion

      - The discussion section might benefit from a discussion of the similarities between the data presented here and previous reports of similar results, i.e., that most calcium signals in astrocytes were located in the distal processes, forming microdomains that rarely propagated to the soma. These include Bindocci et al 2017 Science (DOI:10.1126/science.aai8185) and Georgiou et al, Science Advances, 2022 (DOI: 10.1126/sciadv.abe5371).

      Thank you for the suggestions. We have now changed portions of the Methods, Results, and Discussion sections.

      Reviewer #3 (Recommendations For The Authors):

      The text could potentially be improved somewhat.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      There is a long-standing idea that choices influence evaluation: options we choose are re-evaluated to be better than they were before the choice. There has been some debate about this finding, and the authors developed several novel methods for detecting these re-evaluations in task designs where options are repeatedly presented against several alternatives. Using these novel methods the authors clearly demonstrate this re-evaluation phenomenon in several existing datasets.

      Strengths:

      The paper is well-written and the figures are clear. The authors provided evidence for the behaviour effect using several techniques and generated surrogate data (where the ground truth is known) to demonstrate the robustness of their methods.

      Weaknesses:

      The description of the results of the fMRI analysis in the text is not complete, weakening the claim that their re-evaluation algorithm better reveals neural valuation processes.

      We appreciate the reviewer’s comment regarding the incomplete account of the fMRI results. In response, we implemented Reviewer #2's suggestion to run additional GLM models for a clearer interpretation of our findings. We also took this opportunity to apply updated preprocessing to the fMRI data and revise the GLM models, making them both simpler and more comprehensive. The results section is thus substantially revised, now including a new main figure and several supplemental figures that more clearly present our fMRI findings. Additionally, we have uploaded the statistical maps to NeuroVault, allowing readers to explore the full maps interactively rather than relying solely on the static images in the paper. The new analyses strengthen our original conclusion: dynamic values (previously referred to as revalued values, following the reviewer’s suggestion) better explain BOLD activity in the ventromedial prefrontal cortex, a region consistently associated with valuation, than static values (values reported prior to the choice phase in the auction procedure).

      Reviewer #2 (Public Review):

      Summary:

      Zylberberg and colleagues show that food choice outcomes and BOLD signal in the vmPFC are better explained by algorithms that update subjective values during the sequence of choices compared to algorithms based on static values acquired before the decision phase. This study presents a valuable means of reducing the apparent stochasticity of choices in common laboratory experiment designs. The evidence supporting the claims of the authors is solid, although currently limited to choices between food items because no other goods were examined. The work will be of interest to researchers examining decision-making across various social and biological sciences.

      Strengths:

      The paper analyses multiple food choice datasets to check the robustness of its findings in that domain.

      The paper presents simulations and robustness checks to back up its core claims.

      Weaknesses:

      To avoid potential misunderstandings of their work, I think it would be useful for the authors to clarify their statements and implications regarding the utility of item ratings/bids (e-values) in explaining choice behavior. Currently, the paper emphasizes that e-values have limited power to predict choices without explicitly stating the likely reason for this limitation given its own results or pointing out that this limitation is not unique to e-values and would apply to choice outcomes or any other preference elicitation measure too. The core of the paper rests on the argument that the subjective values of the food items are not stored as a relatively constant value, but instead are constructed at the time of choice based on the individual's current state. That is, a food's subjective value is a dynamic creation, and any measure of subjective value will become less accurate with time or new inputs (see Figure 3 regarding choice outcomes, for example). The e-values will change with time, choice deliberation, or other experiences to reflect the change in subjective value. Indeed, most previous studies of choice-induced preference change, including those cited in this manuscript, use multiple elicitations of e-values to detect these changes. It is important to clearly state that this paper provides no data on whether e-values are more or less limited than any other measure of eliciting subjective value. Rather, the paper shows that a static estimate of a food's subjective value at a single point in time has limited power to predict future choices. Thus, a more accurate label for the e-values would be static values because stationarity is the key assumption rather than the means by which the values are elicited or inferred.

      Thank you for this helpful comment. We changed the terminology following the reviewer’s suggestion. The “explicit” values (e-values or ve) are now called “static” values (s-values or vs). Accordingly, we also changed the “Reval” values (r-values or vr) to “dynamic” values (d-values or vd).

      We also address the reviewer's more general point about the utility of item ratings/bids (s-values) and whether our results are likely to hold with other ways of eliciting subjective values. We added a new sub-section in Discussion addressing this and other limitations of our study. To address the reviewer’s point, we write:

      “One limitation of our study is that we only examined tasks in which static values were elicited from explicit reports of the value of food items. It remains to be determined if other ways of eliciting subjective values (e.g., Jensen and Miller, 2010) would lead to similar results. We think so, as the analysis of trials with identical item pairs (Fig. 3) and the difference between forward and backward Reval (Fig. 7) are inconsistent with the notion that values are static, regardless of their precise value. It also remains to be determined if our results will generalize to non-food items whose value is less sensitive to satiety and other dynamic bodily states. Perceptual decisions also exhibit sequential dependencies, and it remains to be explored whether these can be explained as a process of value construction, similar to what we propose here for the food-choice task (Gupta et al., 2024; Cho et al., 2002; Zylberberg et al., 2018; Abrahamyan et al., 2016).”

      There is a puzzling discrepancy between the fits of a DDM using e-values in Figure 1 versus Figure 5. In Figure 1, the DDM using e-values provides a rather good fit to the empirical data, while in Figure 5 its match to the same empirical data appears to be substantially worse. I suspect that this is because the value difference on the x-axis in Figure 1 is based on the e-values, while in Figure 5 it is based on the r-values from the Reval algorithm. However, the computation of the value difference measure on the two x-axes is not explicitly described in the figures or methods section and these details should be added to the manuscript. If my guess is correct, then I think it is misleading to plot the DDM fit to e-values against choice and RT curves derived from r-values. Comparing Figures 1 and 5, it seems that changing the axes creates an artificial impression that the DDM using e-values is much worse than the one fit using r-values.

      We agree with the reviewer that this way of presenting the DDM fits could be misleading. In the previous version of the manuscript, we included the two fits in the same figure panel to make it clear that the sensitivity (slope) of the choice function is greater when we fit the data using the r-values (now d-values) than when we fit them using the e-values (now s-values). In the revised version of Figure 5, we include the data points already shown in Figure 1, so that each DDM fit is shown with their corresponding data points. Thus we avoid giving the false impression that the DDM model fit using the s-values is much worse than the one fit using the d-values. This said, the fit is indeed worse, as we now show with the formal model comparison suggested by the reviewer (next comment).

      Relatedly, do model comparison metrics favor a DDM using r-values over one using e-values in any of the datasets tested? Such tests, which use the full distribution of response times without dividing the continuum of decision difficulty into arbitrary hard and easy bins, would be more convincing than the tests of RT differences between the categorical divisions of hard versus easy.

      We now include the model comparison suggested by the reviewer. The comparison shows that the DDM model using dynamic values explains the choice and response time data better than one using static values. One potential caveat of this comparison, which explains why we did not include it in the original version of the manuscript, is that the d-values are obtained from a fit to the choice data, which could bias the subsequent DDM comparison. We control for this in three ways: (1) by calculating the difference in Bayesian Information Criterion (BIC) between the models, penalizing the DDM model that uses the d-values for the additional parameter (δ); (2) by comparing the difference in BIC against simulations of a model in which the choice and RT data were obtained assuming static values; this analysis shows that if values were static, the DDM using static values would be favored in the comparison despite having one fewer parameter; (3) ignoring the DDM fit to the choices in the model comparison, and just comparing how well the two models explain the RTs; this comparison is unbiased because the δ values are fit only to the choice data, not the RTs. These analyses are now included in Figure 5 and Figure 5–Figure supplement 2.
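
      As a reading aid for control (1) above, the BIC difference between two fitted models can be sketched as follows. This is a minimal illustration, not the analysis code used in the manuscript: the function names, log-likelihoods, parameter counts, and trial count are hypothetical placeholders. The only assumption carried over from the text is that the dynamic-value DDM has one more free parameter (δ) than the static-value DDM.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(loglik_static, k_static, loglik_dynamic, k_dynamic, n_obs):
    """BIC(static) - BIC(dynamic); positive values favor the dynamic-value model."""
    return bic(loglik_static, k_static, n_obs) - bic(loglik_dynamic, k_dynamic, n_obs)


# Hypothetical numbers for illustration only (not taken from any dataset).
example = delta_bic(loglik_static=-1200.0, k_static=4,
                    loglik_dynamic=-1150.0, k_dynamic=5,
                    n_obs=1000)
print(f"Delta BIC (static - dynamic): {example:.2f}")
```

      With equal log-likelihoods, the extra parameter alone makes the difference negative, which is the sense in which the dynamic-value model is penalized for δ.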

      Revaluation and reduction in the imprecision of subjective value representations during (or after) a choice are not mutually exclusive. The fact that applying Reval in the forward trial order leads to lower deviance than applying it in the backwards order (Figure 7) suggests that revaluation does occur. It doesn't tell us if there is also a reduction in imprecision. A comparison of backwards Reval versus no Reval would indicate whether there is a reduction in imprecision in addition to revaluation. Model comparison metrics and plots of the deviance from the logistic regression fit using e-values against backward and forward Reval models would be useful to show the relative improvement for both forms of Reval.

      We agree with the reviewer that the occurrence of revaluation does not preclude other factors from affecting valuation. Following the reviewer’s suggestion we added a panel to Figure 6 (new panel B), in which we show the change in the deviance from the logistic regression fits between Reval (forward direction) and no-Reval. The figure clearly shows that the difference in deviance for the data is much larger than that obtained from simulations of choice data generated from the logistic fits to the static values (shown in red).
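
      For readers less familiar with the procedure, the forward and backward application of Reval can be sketched as below. This is a simplified illustration under stated assumptions: following Reviewer #2's description of the delta term as a scaled count of past choices and rejections, each choice raises the value of the chosen item by δ and lowers the value of the rejected item by δ. The function name, the trial format, and the convention that the value difference is read out before the update are hypothetical choices made for this sketch and may differ from the actual implementation.

```python
def reval(static_values, trials, delta, backward=False):
    """Apply a Reval-style update over a sequence of choices.

    static_values: dict mapping item -> value reported before the choice phase
    trials: list of (left_item, right_item, chose_left) tuples, in session order
    delta: amount added to the chosen item and subtracted from the rejected one
    backward: if True, process the trials from last to first

    Returns (value_diffs, final_values), where value_diffs[t] is the
    left-minus-right value difference seen on trial t before its own update.
    """
    values = dict(static_values)  # copy so the static values stay untouched
    order = range(len(trials))
    if backward:
        order = reversed(order)
    value_diffs = [None] * len(trials)
    for t in order:
        left, right, chose_left = trials[t]
        value_diffs[t] = values[left] - values[right]
        chosen, rejected = (left, right) if chose_left else (right, left)
        values[chosen] += delta
        values[rejected] -= delta
    return value_diffs, values


# Toy example: item "a" is chosen over "b" twice.
static = {"a": 1.0, "b": 1.0}
session = [("a", "b", True), ("a", "b", True)]
forward_diffs, forward_values = reval(static, session, delta=0.5)
backward_diffs, _ = reval(static, session, delta=0.5, backward=True)
```

      Setting `delta=0` recovers the no-Reval case, in which every value difference equals the static one.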

      Interestingly, we also observe that the deviance obtained after applying Reval in the backward direction is lower than that obtained using the s-values. We added a panel to Figure 7 showing this (Fig. 7B). This observation, however, does not imply that there are factors affecting valuation besides revaluation (e.g., “reduction in imprecision”). Indeed, as we now show in a new panel in Figure 11 (panel F), the same effect (lower deviance for backward Reval than no-Reval) is observed in simulations of the ceDDM.

      Besides the new figure panels (Fig. 6B, 7B, 11F), we mention in Discussion (new subsection, “Limitations...”, paragraph #2) the possibility that there are other non-dynamic contributions to the reduction in deviance for Backward Reval compared to no-Reval:

      “Another limitation of our study is that, in one of the datasets we analyzed (Sepulveda et al. 2020), applying Reval in the forward direction was no better than applying it in the backward direction (Fig. 10). We speculate that this failure is related to idiosyncrasies of the experimental design, in particular, the use of alternating blocks of trials with different instructions (select preferred vs. select non-preferred). More importantly, Reval applied in the backward direction led to a significant reduction in deviance relative to that obtained using the static values. This reduction was also observed in the ceDDM, suggesting that the effect may be explained by the changes in valuation during deliberation. However, we cannot discard a contribution from other, non-dynamic changes in valuation between the rating and choice phase including contextual effects (Lichtenstein and Slovic, 2006), stochastic variability in explicit value reporting (Polania et al., 2019), and the limited range of numerical scales used to report value.”

      Did the analyses of BOLD activity shown in Figure 9 orthogonalize between the various e-value- and r-value-based regressors? I assume they were not because the idea was to let the two types of regressors compete for variance, but orthogonalization is common in fMRI analyses so it would be good to clarify that this was not used in this case. Assuming no orthogonalization, the unique variance for the r-value of the chosen option in a model that also includes the e-value of the chosen option is the delta term that distinguishes the r and e-values. The delta term is a scaled count of how often the food item was chosen and rejected in previous trials. It would be useful to know if the vmPFC BOLD activity correlates directly with this count or the entire r-value (e-value + delta). That is easily tested using two additional models that include only the r-value or only the delta term for each trial.

      We did not orthogonalize the static value and dynamic value regressors. We have included this detail in the revised methods. We thank the reviewer for the suggestion to run additional models to improve our ability to interpret our findings. We have substantially revised all fMRI-related sections of the paper. We took this opportunity to apply standardized and reproducible preprocessing steps implemented in fmriprep, present whole-brain corrected maps on a reconstructed surface of a template brain, and include links to the full statistical maps so that the reader can navigate them rather than rely on the static images in the figures. We implemented four models in total: model 1 includes both static value (Vs) obtained during the auction procedure prior to the choice phase and dynamic value (Vd) output by the revaluation algorithm (similar to the model presented in the first submission); model 2 includes only delta = Vd - Vs; model 3 includes only Vs; model 4 includes only Vd. All models included the same confound and nuisance regressors. We found that Vd was positively related to BOLD in vmPFC when accounting for Vs, correcting for familywise error rate at the whole-brain level. Interestingly, the relationship between delta and vmPFC BOLD did not survive whole-brain correction. In addition, the effect size of the relationship between Vd and vmPFC BOLD in model 4 was larger than that of the relationship between Vs and vmPFC BOLD in model 3, and the model 4 effect survived whole-brain correction over a larger portion of the vmPFC. Together, these findings bolster our claim that Vd better accounts for BOLD variability in vmPFC, a brain region reliably linked to valuation.
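
      The reviewer's observation that, without orthogonalization, the unique contribution of Vd in a model that also contains Vs is carried by delta = Vd - Vs can be checked with a toy regression. The sketch below is not an fMRI GLM and uses no real data: the simulated "signal", the coefficients, and the helper `ols` are hypothetical. It only illustrates the algebraic point that regressing on (Vs, Vd) and on (Vs, delta) spans the same space, so the coefficient on Vd equals the coefficient on delta.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


rng = random.Random(0)
n_trials = 500
vs = [rng.gauss(0, 1) for _ in range(n_trials)]         # toy static values
delta = [rng.gauss(0, 0.5) for _ in range(n_trials)]    # toy revaluation term
vd = [s + d for s, d in zip(vs, delta)]                 # toy dynamic values
# Toy signal with arbitrary coefficients; not meant to mimic BOLD data.
signal = [0.2 * s + 0.7 * d + rng.gauss(0, 1) for s, d in zip(vs, vd)]

_, coef_vs_a, coef_vd = ols([vs, vd], signal)        # signal ~ Vs + Vd
_, coef_vs_b, coef_delta = ols([vs, delta], signal)  # signal ~ Vs + delta
```

      In this toy setting the Vs coefficient in the second fit equals the sum of the two coefficients in the first, which is why the two parameterizations differ only in how shared variance is assigned to Vs.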

      Please confirm that the correlation coefficients shown in Figure 11 B are autocorrelations in the MCMC chains at various lags. If this interpretation is incorrect, please give more detail on how these coefficients were computed and what they represent.

      We added a paragraph in Methods explaining how we compute the correlations in Figure 11B (last paragraph of the sub-section “Correlated-evidence DDM” in Methods):

      “The correlations in Fig. 11B were generated using the best-fitting parameters for each participant to simulate 100,000 Markov chains. We generate Markov chain samples independently for the left and right items over a 1-second period. To illustrate noise correlations, the simulations assume that the static value of both the left and right items is zero. We then calculate the difference in dynamic value (𝑥) between the left and right items at each time (𝑡) and for each of the Markov chains (𝑖). Pearson's correlation is computed between these differences at time zero, 𝑥𝑖(𝑡 = 0), and at time τ, 𝑥𝑖(𝑡 = τ), for different time lags τ. Correlations were calculated independently for each participant. Each trace in Fig. 11B represents a different participant.”
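
      The lagged-correlation computation described in this paragraph can be sketched as follows. Note the loud assumption: the chains below are generated by a stationary first-order autoregressive process, used purely as a stand-in, and not by the MCMC process of the ceDDM. The parameter `rho`, the number of chains, and the function names are hypothetical. What the sketch does share with the description is the structure of the calculation: independent chains for the left and right items, their difference per chain, and Pearson's correlation across chains between the difference at time zero and at lag τ.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_chain(n_steps, rho, rng):
    """Stationary AR(1) chain with unit variance (stand-in for a value chain)."""
    x = rng.gauss(0, 1)
    chain = [x]
    noise_sd = math.sqrt(1 - rho ** 2)
    for _ in range(n_steps):
        x = rho * x + noise_sd * rng.gauss(0, 1)
        chain.append(x)
    return chain


def lagged_correlations(n_chains=2000, n_steps=20, rho=0.9, seed=0):
    """Correlation across chains between x_i(t=0) and x_i(t=lag), for each lag."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_chains):
        left = simulate_chain(n_steps, rho, rng)
        right = simulate_chain(n_steps, rho, rng)
        diffs.append([l - r for l, r in zip(left, right)])
    x0 = [d[0] for d in diffs]
    return [pearson(x0, [d[lag] for d in diffs]) for lag in range(n_steps + 1)]


correlations = lagged_correlations()
```

      For this stand-in process the correlation at lag τ is expected to decay roughly as `rho ** τ`; the decay profile of the fitted ceDDM would instead depend on each participant's best-fitting parameters.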

      The paper presents the ceDDM as a proof-of-principle type model that can reproduce certain features of the empirical data. There are other plausible modifications to bounded evidence accumulation (BEA) models that may also reproduce these features as well or better than the ceDDM. For example, a DDM in which the starting point bias is a function of how often the two items were chosen or rejected in previous trials. My point is not that I think other BEA models would be better than the ceDDM, but rather that we don't know because the tests have not been run. Naturally, no paper can test all potential models and I am not suggesting that this paper should compare the ceDDM to other BEA processes. However, it should clearly state what we can and cannot conclude from the results it presents.

      Indeed, the ceDDM should be interpreted as a proof-of-principle model, which shows that drifting values can explain many of our results. It is definitely wrong in the details, and we are open to the possibility that a different way of introducing sequential dependencies between decisions may lead to a better match to the experimental data. We now mention this in a new subsection of Discussion, “Limitations...” paragraph #3:

      “Finally, we emphasize that the ceDDM should be interpreted as a proof-of-principle model used to illustrate how stochastic fluctuations in item desirability can explain many of our results. We chose to model value changes following an MCMC process. However, other stochastic processes or other ways of introducing sequential dependencies (e.g., variability in the starting point of evidence accumulation) may also explain the behavioral observations. Furthermore, there likely are other ways to induce changes in the value of items other than through past decisions. For example, attentional manipulations or other experiences (e.g., actual food consumption) may change one's preference for an item. The current version of the ceDDM does not allow for these influences on value, but we see no fundamental limitation to incorporating them in future instantiations of the model.”

      This work has important practical implications for many studies in the decision sciences that seek to understand how various factors influence choice outcomes. By better accounting for the context-specific nature of value construction, studies can gain more precise estimates of the effects of treatments of interest on decision processes.

      Thank you!

      That said, there are limitations to the generalizability of these findings that should be noted.

      These limitations stem from the fact that the paper only analyzes choices between food items and the outcomes of the choices are not realized until the end of the study (i.e., participants do not eat the chosen item before making the next choice). This creates at least two important limitations. First, preferences over food items may be particularly sensitive to mindsets/bodily states. We don't yet know how large the choice deltas may be for other types of goods whose value is less sensitive to satiety and other dynamic bodily states. Second, the somewhat artificial situation of making numerous choices between different pairs of items without receiving or consuming anything may eliminate potential decreases in the preference for the chosen item that would occur in the wild outside the lab setting. It seems quite probable that in many real-world decisions, the value of a chosen good is reduced in future choices because the individual does not need or want multiples of that item. Naturally, this depends on the durability of the good and the time between choices. A decrease in the value of chosen goods is still an example of dynamic value construction, but I don't see how such a decrease could be produced by the ceDDM.

      These are all great points. The question of how generalizable our results are to other domains is wide open. We do have preliminary evidence suggesting that in a perceptual decision-making task with two relevant dimensions (motion and color; Kang, Loffler et al. eLife 2021), the dimension that was most informative to resolve preference in the past is prioritized in future decisions. We believe that a similar process underlies the apparent change in value in value-based decisions. We decided not to include this experiment in the manuscript, as it would make the paper much longer and the experimental designs are very different. Exploring the question of generality is a matter for future studies.

      We also agree that food consumption is likely to change the value of the items. For example, after eating something salty we are likely to want something to drink. We mention in the revised manuscript that time, choice deliberation, attentional allocation and other experiences (including food consumption) are likely to change the value of the alternatives and thus affect future choices and valuations.

      The ceDDM captures only sequential dependencies that can be attributed to values that undergo diffusion-type changes during deliberation. While the ceDDM captures many of the experimental observations, the value of an item may change for reasons not captured by the ceDDM. For example, food consumption is likely to change the value of items (e.g., wanting something to drink after eating something salty). The reviewer is correct that the current version of ceDDM could not account for these changes in value. However, we see no fundamental limitation to extending the ceDDM to account for them.

      We discuss these issues in a new subsection in Discussion (“Limitations...” paragraph #3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Summary

      The authors address assumptions of bounded accumulation of evidence for value-based decision-making. They provide convincing evidence that subjects drift in their subjective preferences across time and demonstrate valuable methods to detect these drifts in certain task designs.

      My specific comments are intended to assist the authors with making the paper as clear as possible. My only major concern is with the reporting of the fMRI results.

      Thank you, please see our responses above for a description of the changes we made to the fMRI analyses.

      Specific comments

      - In the intro, I would ask the authors to consider the idea that things like slow drift in vigilance/motivation or faster drifts in spatial attention could also generate serial dependencies in perceptual tasks. I think the argument that these effects are larger in value-based tasks is reasonable, but the authors go a bit too far (in my opinion) arguing that similar effects do not exist *at all* in perceptual decision-making.

      We added a sentence in the Discussion (new section on Limitations, paragraph #1) mentioning some of the literature on sequential dependencies in perceptual tasks and asking whether there might be a common explanation for such dependencies for perceptual and value-based decisions. We tried including this in the Introduction, but we thought it disrupted the flow too much.

      - Figure 1: would it not be more clear to swap the order of panels A and B? Since B comes first in the task?

      We agree, we swapped the order of panels A and B.

      - Figure 2: the label 'simulations' might be better as 'e-value simulations'

      Yes, we changed the label ‘simulations’ to ‘simulations with s-values’ (we changed the term explicit value to static value, following a suggestion by Reviewer #2).

      - For the results related to Figure 2, some citations related to gaps between "stated versus revealed preferences" seem appropriate.

      We added a few relevant citations where we explain the results related to Figure 2.

      - Figure 3: in addition to a decrease in match preferences over the session, it would be nice to look at other features of the task which might have varied over the session. e.g. were earlier trials more likely to be predicted by e-value?

      We do see a trend in this direction, but the effect is not significant. The following figure shows the consistency of the choices with the stated values, as a function of the |∆value|, for the first half (blue) and the second half (red) of the trials. The x-axis discretizes the absolute value of the difference in static value between the left and right items, binned in 17 bins of approximately equal number of trials.

      Author response image 1.

      The slope is shallower for the second half, but a logistic regression model revealed that the difference is not significant:

      ,

      where Ilate is an indicator variable that takes a value of 1 for the second half of the trials and zero otherwise.

      As expected from the figure, β2 was negative (-0.15), but the effect was not significant (p = 0.32, likelihood ratio test).
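
      For reference, a likelihood ratio test of a single added coefficient (such as β2) compares the log-likelihoods of the model with and without that term. The sketch below assumes the two models are nested and differ by exactly one parameter, so the test statistic is referred to a chi-square distribution with one degree of freedom, whose tail probability can be written with the complementary error function. The function name and the log-likelihood values are hypothetical, not values from our fits.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns (statistic, p_value). For one degree of freedom, the chi-square
    survival function equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for illustration only.
stat, p_value = lrt_pvalue_1df(loglik_reduced=-650.0, loglik_full=-649.5)
print(f"LR statistic = {stat:.2f}, p = {p_value:.3f}")
```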

      We feel we do not have much to say about this result, which may be due to lack of statistical power, so we would rather not include this analysis in the revised manuscript.

      It is worth noting that if we repeat the analysis using the dynamic values obtained from Reval instead of the static values, the consistency is overall much greater and little difference is observed between the first and second halves of the experiment:

      Author response image 2.

      - The e-value DDM fit in Figure 1C/D goes through the points pretty well, but the e-value fits in 5A do not because of a mismatch with the axis. The x-axis needs to say whether the value difference is the e-value or the r-value. Also, it seems only fair to plot the DDM for the r-value on a plot with the x-axis being the e-value.

      Thank you for this comment, we have now changed Figure 5A, such that both sets of data points are shown (data grouped by both e-values and by r-values). We agree that the previous version made it seem as if the fits were worse for the DDM fit to the e-values. The fits are indeed worse, as revealed by a new DDM model comparison (Figure 5–Figure supplement 2), but the effect is more subtle than the previous version of the figure implied.

      - How is Figure 5B "model free" empirical support? The fact that the r-value model gives better separation of the RTs on easy and hard trials doesn't seem "model-free" and also it isn't clear how this directly relates to being a better model. It seems that just showing a box-plot of the R2 for the RT of the two models would be better?

      We agree that “model free” may not be the best expression, since the r-values (now d-values) are derived from a model (Reval). Our intention was to make clear that because Reval only depends on the choices, the relationship between RT and ∆vdynamic is a prediction. We no longer use the term, model free, in the caption. We tried to clarify the point in Results, where we explain this figure panel. We have also included a new model comparison (Figure 5–Figure supplement 2), showing that the DDM model fit to the d-values explains choice and RT better than one fit to the s-values.

      This said, we do consider the separation in RTs between easy and hard trials to be a valid metric to compare the accuracy of the static and dynamic values. The key assumption is that there is a monotonically decreasing relationship between value difference, ∆v, and response time. The monotonic relationship does not need to hold for individual trials (due to the noisiness of the RTs) but should hold if one were to average a large enough number of trials for each value of ∆v.

      Under this assumption, the more truthful a value representation is (i.e., the closer the value we infer is to the true subjective value of the item on a given trial, assuming one exists), the greater the difference in RTs between trials judged to be difficult and those considered easy. To illustrate this with an extreme case, if an experimenter’s valuation of the items is very inaccurate (e.g., done randomly), then on average there will be no difference between easy and difficult RTs as determined by this scoring.
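      The reasoning above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the manuscript: the linear RT rule, the noise levels, the median split into "hard" and "easy" trials, and all variable names are hypothetical choices made for illustration.

```python
import random
import statistics


def rt_separation(estimated_dv, rts):
    """Mean RT on 'hard' trials minus mean RT on 'easy' trials.

    Trials are split at the median of the estimated |value difference|:
    at or below the median counts as hard, above it as easy.
    """
    cutoff = statistics.median(estimated_dv)
    hard = [rt for dv, rt in zip(estimated_dv, rts) if dv <= cutoff]
    easy = [rt for dv, rt in zip(estimated_dv, rts) if dv > cutoff]
    return statistics.mean(hard) - statistics.mean(easy)


rng = random.Random(0)
n_trials = 5000

# Hypothetical "true" absolute value difference on each trial.
true_dv = [rng.random() for _ in range(n_trials)]

# Assumed RT rule: mean RT decreases monotonically with |dv|, plus trial noise.
rts = [1.5 - 0.8 * dv + rng.gauss(0, 0.3) for dv in true_dv]

# An accurate valuation (true value plus small error) and a random one.
accurate_dv = [abs(dv + rng.gauss(0, 0.05)) for dv in true_dv]
random_dv = [rng.random() for _ in range(n_trials)]

sep_accurate = rt_separation(accurate_dv, rts)
sep_random = rt_separation(random_dv, rts)
print(f"RT separation, accurate values: {sep_accurate:.3f} s")
print(f"RT separation, random values:   {sep_random:.3f} s")
```

      In this toy setting the accurate valuation yields a clear hard-minus-easy RT difference, while the random valuation yields a difference close to zero, which is the extreme case described above.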

      - Line 189: Are the stats associated with Eq 7, was the model fit subject by subject? Combining subjects? A mixed-effects model? Why not show a scatter plot of the coefficients of Δvₑ and Δvᵣ (1 point/subject).

      The model was not fit separately for each subject. Instead, we concatenated trials from all subjects, allowing each subject to have a different bias term (β0,i ).

      We have now replaced it with the analysis suggested by the reviewer. We fit the logistic regression model independently for each participant. The scatter plot suggested by the reviewer is shown in Figure 5–Figure supplement 1. Error bars indicate the s.e. of the regression coefficients:

      It can be seen that the result is consistent with what we reported before: βd is significantly positive for all participants, while βs is not.

      - I think Figure S1 should be a main figure.

      Thank you for this suggestion, we have now included the former Figure S1 as an additional panel in Figure 5.

      - Fig 9 figure and text (line 259) don't exactly match. In the text it says that the BOLD correlated with vᵣ and not vₑ, but the caption says there were correlations with vᵣ after controlling for vₑ. Is there really nothing in the brain that correlated with vₑ? This seems hard to believe given how correlated the two estimates are. In the methods, 8 regressors are described. A more detailed description of the results is needed.

      Thank you for pointing out the inconsistency in our portrayal of the results in the main text and in the figure caption. We have substantially revised all fMRI methods, re-ran fMRI data preprocessing and implemented new, simpler, and more comprehensive GLM models following Reviewer #2's suggestion. Consequently, we have replaced Figure 9, added Figure 9 — Figure Supplement 1, and uploaded all maps to NeuroVault. These new models and maps allow for a clearer interpretation of our findings. More details about the fMRI analyses in the methods and results are included in the revision. We took care to use similar language in the main text and in the figure captions to convey the results and interpretation. The new analyses strengthen our original conclusion: dynamic values better explain BOLD activity in the ventromedial prefrontal cortex, a region consistently associated with valuation, than static values.

      - It's great that the authors reanalyzed existing datasets (fig 10). I think the ΔRT plots are the least clear way to show that _reval_ is better. Why not a figure like Figure 6a and Figure 7 for the existing datasets?

      We agree with the reviewer. We have replaced Fig. 10 with a more detailed version. For each dataset, we show the ΔRT plots, but we also show figures equivalent to Fig. 6a, Fig. 7a, and the new Fig. 6b (Deviance with and without Reval).

      Reviewer #2 (Recommendations For The Authors):

      I assume that the data and analysis code will be made publicly and openly available once the version of record is established.

      Yes, the data and analysis code are now available at: https://github.com/arielzylberberg/Reval_eLife_2024

      We added a Data Availability statement to the manuscript.

    2. eLife Assessment

      This important study addresses key assumptions underlying current models of the formation of value-based decisions. The authors provide convincing evidence that the subjective values human participants assign to items change across sequences of multiple decisions. They establish methods to detect these changes in frequently used behavioral task designs.

    3. Reviewer #1 (Public review):

      Summary:

      There is a long-standing idea that choices influence evaluation: options we choose are re-evaluated to be better than they were before the choice. There has been some debate about this finding, and the authors developed several novel methods for detecting these re-evaluations in task designs where options are repeatedly presented against several alternatives. Using these novel methods the authors clearly demonstrate this re-evaluation phenomenon in several existing datasets and show that estimations of dynamic valuation correlate with neural activity in prefrontal cortex.

      Strengths:

      The paper is well-written and figures are clear. The authors provided evidence for the behavioural effect using several techniques and generated surrogate data (where the ground truth is known) to demonstrate the robustness of their methods. The authors avoid over-selling the work, with a lucid description of limitations, and potential for further exploration of the work, in the discussion.

      Comments on revisions:

      The authors did a good job responding to the comments.

    4. Reviewer #2 (Public review):

      Zylberberg and colleagues show that food choice outcomes and BOLD signal in the vmPFC are better explained by algorithms that update subjective values during the sequence of choices compared to algorithms based on static values acquired before the decision phase. This study presents a valuable means of reducing the apparent stochasticity of choices in common laboratory experiment designs. The evidence supporting the claims of the authors is solid, although currently limited to choices between food items because no other goods were examined. The work will be of interest to researchers examining decision making across various social and biological sciences.

      Comments on revisions:

      We thank the authors for carefully addressing our concerns about the first version of the manuscript. The manuscript text and contributions are now much more clear and convincing.

    1. Reviewer #1 (Public review):

      Summary:

      The authors investigated if/how distractor suppression derived from statistical learning may be implemented in early visual cortex. While in a scanner, participants conducted a standard additional singleton task in which one location more frequently contained a salient distractor. The results showed that activity in EVC was suppressed for the location of the salient distractor as well as for neighbouring neutral locations. This suppression was not stimulus specific - meaning it occurred equally for distractors, targets and neutral items - and it was even present in trials in which the search display was omitted. Generally, the paper was clear, the experiment was well-designed, and the data are interesting. Nevertheless, I do have several concerns mostly regarding the interpretation of the results.

      (1) My biggest concern with the study is regarding the interpretation of some of the results. Specifically, regarding the dynamics of the suppression. I appreciate that there are some limitations with what you might be able to say here given the method but I do feel as if you have committed to a single interpretation where others might still be at play. Below I've listed a few alternatives to consider.

      (a) Sustained Suppression. I was wondering if there is anything in your results that would speak for or against the suppression being task specific. That is, is it possible that people are just suppressing the HPDL throughout the entire experiment (i.e., also through ITI, breaks, etc., rather than just before and during the search). Since the suppression does not seem volitional, I wonder if participants might apply a blanket suppression to HPDL until they learn otherwise. Since your localiser comes after the task you might be able to see hints of sustained suppression in the HPDL during these trials.

      (b) Enhancement followed by suppression. Another alternative that wasn't discussed would be an initial transient enhancement of the HPDL which might be brought on by the placeholders followed by more sustained suppression through the search task. Of course, on the whole this would look like suppression, but this still seems like it would hold different implications compared to simply "proactive suppression". This would be something like search and destroy however could be on the location level before the actual onset of the search display.

      (2) I was also considering whether your effects might be at least partially attributable to priming type effects. This would be on the spatial (not feature) level as it is clear that the distractors are switching colours. Basically, is it possible that on trial n participants see the HPDL with the distractor in it and then on trial n+1 they suppress that location. This would be something distinct from the statistical learning framework and from the repetition suppression discussion you have already included. To test for this, you could look at the trials that follow omission or trials. If there is no suppression or less suppression on these trials it would seem fair to conclude that the suppression is at least in part due to the previous trial.

    2. eLife Assessment

      This well-written report uses functional neuroimaging in human observers to provide convincing evidence that activity in the early visual cortex is suppressed at locations that are frequently occupied by a task-irrelevant but salient item. This suppression appears to be general to any kind of stimulus, and also occurs in advance of any item actually appearing. The work in its present form will be valuable to those examining attention, perception, learning and prediction, but with a few additional analyses could more informatively rule out potential alternative hypotheses. Further discussion of the mechanistic implications could clarify further the broad extent of its significance.

    3. Reviewer #2 (Public review):

      The authors of this work set out to test ideas about how observers learn to ignore irrelevant visual information. Specifically, they used fMRI to scan participants who performed a visual search task. The task was designed in such a way that highly salient but irrelevant search items were more likely to appear at a given spatial location. With a region-of-interest approach, the authors found that activity in visual cortex that selectively responds to that location was generally suppressed, in response to all stimuli (search targets, salient distractors, or neutral items), as well as in the absence of an anticipated stimulus.

      Strengths of the study include: A well-written and well-argued manuscript; clever application of a region of interest approach to fMRI design, which allows articulating clear tests of different hypotheses; careful application of follow-up analyses to rule out alternative, strategy-based accounts of the findings; tests of the robustness of the findings to detailed analysis parameters such as ROI size; and exclusion of the role of regional baseline differences in BOLD responses.

      The report might be enhanced by analyses (perhaps in a surface space) that distinguish amongst the multiple "early" retinotopic visual areas that are analysed in the aggregate here. Furthermore, the study could benefit from an analysis that tests the correlation over observers between the magnitude of their behavioural effects and their neural responses.

      The study provides an advance over previous studies, which identified enhancement or suppression in visual cortex as a function of search target/distractor predictability, but in less spatially-specific way. It also speaks to open questions about whether such suppression/enhancement is observed only in response to the arrival of visual information, or instead is preparatory, favouring the latter view. The theoretical advance is moderate, in that it is largely congruent with previous frameworks, rather than strongly excluding an opposing view or providing a major step change in our understanding of how distractor suppression unfolds.

    4. Author response:

      We thank the editor and the reviewers for the positive evaluation of our manuscript and the thoughtful comments. Below we provide a provisional reply to the reviewers’ comments, which we will address in more detail in the revised manuscript.

      Reviewer 1 highlights three important alternative interpretations of our results: (1) sustained suppression, (2) enhancement followed by suppression, and (3) priming. We believe that these alternatives need to be addressed to improve the conclusions we can draw from the available data.

      (1) Sustained suppression: As outlined by R1, it is possible that participants suppressed the HPDL throughout the entire experiment, instead of proactively instantiating suppression on each trial. While possible, we believe that this account is unlikely to explain the present results, given the utilized analysis approach, a voxel-wise GLM fit to the BOLD data per run (see Materials and Methods for details). Specifically, we derived parameter estimates from this GLM per location to estimate the relative suppression. Sustained suppression would modulate BOLD responses throughout the run, i.e. also during the implicit baseline period used to estimate the contrast parameter estimates. Hence, a sustained suppression should not result in a differential modulation between locations, as the BOLD response at the HPDL during the baseline period would be equally suppressed as during the trial. We will discuss this important aspect in the revised manuscript.

      (2) Enhancement followed by suppression: R1 correctly points out that BOLD data, given the poor temporal resolution, do not allow for the detection of potential transient enhancements at the HPDL followed by a later and more pronounced suppression (akin to “search and destroy”). We agree with this assessment. However, we would also argue that a transient enhancement followed by sustained suppression before search onset constitutes proactive suppression in line with our interpretation, because suppression would still arise proactively (i.e., before search and hence distractor onset). Whether brief enhancement precedes suppression cannot be elucidated by our data, but we believe that it constitutes an interesting avenue for future studies using time-resolved and spatially specific recording methods. We will address this important addition in the updated manuscript.

      (3) Priming: It is possible that participants particularly suppress locations which on previous trials contained a distractor. This account constitutes a different perspective than statistical learning integrating across many trials. We believe that it is likely that both accounts contribute to the observed effect to some degree, as both the distant (but often repeated) and the most recent past should inform our priors. Indeed, arguably recent trials should be particularly informative for our predictions as natural environments vary across time, and hence the statistical learning system should remain sensitive to potential changes in the environment. In short, we agree with R1 that the n-1 trial may impact suppression, and therefore charting the potential contributions of this type of priming compared to statistical learning is a relevant addition to the manuscript. We will perform the suggested analysis; however, we also note that dividing trials based on the n-1 trial will significantly reduce the reliability of the parameter estimates (e.g. only ~1/3 of trials follow omissions).
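      The argument made under point (1) can be illustrated with a toy regression. This is a minimal sketch, not the voxel-wise GLM used in the study: it assumes a single boxcar task regressor without HRF convolution, a purely additive and constant suppression offset, and made-up signal values. All names are hypothetical.

```python
import random


def fit_intercept_and_slope(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rng = random.Random(1)
n_volumes = 200

# Boxcar regressor: 5 "task" volumes followed by 15 implicit-baseline volumes.
task = [1.0 if (t % 20) < 5 else 0.0 for t in range(n_volumes)]

# Simulated signal: baseline of 100, task response of 2, plus noise.
bold = [100.0 + 2.0 * x + rng.gauss(0, 0.5) for x in task]

# Sustained suppression modelled as a constant offset over the whole run,
# i.e. it lowers the baseline and the task periods by the same amount.
offset = 3.0
bold_suppressed = [v - offset for v in bold]

b0, b1 = fit_intercept_and_slope(task, bold)
b0_s, b1_s = fit_intercept_and_slope(task, bold_suppressed)
print(f"task estimate without offset: {b1:.4f}, with offset: {b1_s:.4f}")
print(f"baseline estimate without offset: {b0:.4f}, with offset: {b0_s:.4f}")
```

      Under these assumptions the constant offset is absorbed entirely by the baseline (intercept) term, and the task parameter estimate is unchanged. A suppression that is present only around the trial would instead change the task estimate.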

      Reviewer 2 had two valuable suggestions to advance the inferences we can draw from the available data. In particular, R2 proposed two additional analyses, which we will consider during revision.

      First, R2 suggests separating the utilized early visual cortex (EVC) ROI mask into the three retinotopic areas comprising EVC (V1, V2, V3) and to perform the key analyses in surface space for each ROI separately. We agree that exploring distractor suppression across V1, V2 and V3 separately is an interesting extension to our results. Our reasoning to combine early visual areas into one mask was two-fold: First, we did not have an a priori reason to expect distinct neural suppression between these early ROIs. Therefore, we did not acquire retinotopy data to reliably separate V1, V2 and V3, instead opting to increase the number of search task trials. The lack of retinotopy data naturally limits the reliability of the resulting cortical segmentation. However, we believe that separating EVC into its constituent areas using anatomical data is nonetheless a promising addition to our primary analyses. Therefore, during revision we will explore the main suppression analyses split into V1, V2, and V3.

      Second, R2 highlights that behavioral facilitation and neural suppression could be correlated across participants. The rationale is that should neural suppression in EVC relate to the facilitation of behavioral responses, we may expect a positive relationship between neural suppression at the HPDL and RTs across participants. We agree with R2’s suggestion and will perform the analysis accordingly. However, we note that any results should be interpreted with caution, as the present sample size of n=28 is small for an across participant correlation analysis involving neural and behavioral difference scores.

      In summary, we believe that addressing the reviewers' suggestions will substantially improve our manuscript, particularly regarding the interpretation and scope of our findings.

    1. eLife Assessment

      The study describes a useful tool for assessing microglia morphology in a variety of experimental conditions. The MorphoCellSorter provides a solid platform for ranking microglia to reflect their morphology continuum and may offer new insight into changes in morphology associated with injury or disease. While the study provides an alternative approach to existing methods for measuring microglia morphology, the functional significance of measured morphological changes remains unclear.

    2. Joint Public Review:

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem. Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a) L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      c) L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      d) L75: Is morphology truly "easy" to obtain?

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      (3) Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease).

      Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability? Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia?

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others. The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others? On a note, Matlab is not open-access.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      e) Parameter choices:

      L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes? Differences between circularity and roundness factors are not coming across and require further clarification. One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references. Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      L323: As before, it's not given that the first two components hold all the information.

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      h) Minor aspects:

      - % notation requires to include (weight/volume) annotation.

      - Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      - L125: The length of the single housing should be specified to ensure no variability in this context.

      - L673: Typo to the reference to the figure.
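      The distinction raised for L348 (Spearman's coefficient measures monotonic association, Pearson's measures linear association) can be made concrete with a short example. This is a minimal sketch using only the Python standard library; the data are made up, and the rank helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# A perfectly monotonic but strongly non-linear relationship.
x = [float(i) for i in range(1, 11)]
y = [math.exp(xi) for xi in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

      Here the Spearman coefficient equals 1 because the ordering of the two variables is identical, while the Pearson coefficient is clearly below 1 because the relationship is not linear.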

    3. Author response:

      Joint Public Review:

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem. 

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.
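      For readers unfamiliar with Andrews curves, the following sketch shows the standard textbook construction and why distances between curves are meaningful. It is not the MorphoCellSorter implementation, and it does not reproduce how the Andrew's score is computed in the manuscript; the feature vectors and function names are hypothetical. Each cell's feature vector (x1, x2, x3, ...) is mapped to the curve f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., and the integrated squared difference between two curves over [-pi, pi] equals pi times the squared Euclidean distance between the feature vectors.

```python
import math


def andrews_curve(features, t):
    """Value of the Andrews curve of a feature vector at angle t."""
    value = features[0] / math.sqrt(2)
    for k, x in enumerate(features[1:], start=1):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


def curve_distance_sq(a, b, n_steps=2000):
    """Integral over [-pi, pi] of the squared difference between two curves."""
    dt = 2 * math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        t = -math.pi + i * dt
        total += (andrews_curve(a, t) - andrews_curve(b, t)) ** 2
    return total * dt


# Two hypothetical cells described by five morphological parameters each.
cell_a = [0.2, 0.5, 0.1, 0.9, 0.4]
cell_b = [0.7, 0.3, 0.6, 0.2, 0.8]

euclid_sq = sum((p - q) ** 2 for p, q in zip(cell_a, cell_b))
dist_sq = curve_distance_sq(cell_a, cell_b)
print(f"integrated squared curve distance: {dist_sq:.6f}")
print(f"pi * squared Euclidean distance:   {math.pi * euclid_sq:.6f}")
```

      The two printed quantities agree, which is why curve distances inherit the scale of the input features: they are relative to how the parameters were normalized within a given dataset.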

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. Alternative methods were already discussed in the Discussion (L586-598), but in response to the reviewers' request we have revised the introduction and discussion sections to address the limitations of current methods more clearly and to discuss the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a) L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed "evenly" so that this introductory sentence is more inclusive of the various morphologies of microglial cells.

      c) L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment. We have clarified that we were referring to the metabolic challenge triggered by ischemia, and we have added a reference.

      d) L75: Is morphology truly "easy" to obtain? 

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the reviewer's point and have rephrased the sentence. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for the confusing wording. We have reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not say that the contralateral hemisphere is non-pathological, but that its microglial cells have a non-pathological morphology, which is slightly different. The contralateral side is classically used as a control in ischemic experiments (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere of tMCAO mice (Filippenkov et al 2022, https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls; however, no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretations.

      g) Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
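
The effect of the inversion described above can be illustrated with a minimal sketch. It assumes that "inversion" means taking the reciprocal, which the response does not specify; the parameter values and the name `ramification_index` are hypothetical and are not taken from MorphoCellSorter.

```python
# Hypothetical illustration: the values below are invented, and the reciprocal
# is only one possible reading of the "inversion operation".

def ranks(values):
    """Return 1-based ranks of the values (values are assumed to be distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

# Four toy cells, listed from least to most ramified.
ramification_index = [1.2, 2.5, 4.0, 7.5]    # increases with ramification
roundness_factor = [0.90, 0.60, 0.35, 0.10]  # decreases with ramification

# Before inversion, the two parameters rank the cells in opposite orders.
assert ranks(ramification_index) == [1, 2, 3, 4]
assert ranks(roundness_factor) == [4, 3, 2, 1]

# Assumed inversion: the reciprocal. Any strictly decreasing transform
# (for example negation) would reverse the ranking in the same way.
inverted_roundness = [1.0 / v for v in roundness_factor]

print(ranks(inverted_roundness))  # [1, 2, 3, 4]
```

After the transform both parameters order the cells from least to most ramified, which is the homogenization the quoted passage describes.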

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability? 

      We could not find any range in the text; we usually used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although section thickness was identical for all sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of planes acquired varies depending on how the cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia? 

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the necessary structural information to accurately reflect morphological differences thus impairing the detection of existing morphological differences.

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.

      On a note, Matlab is not open-access, 

      This is correct. We are currently translating this Matlab script into Python; it will be available soon on GitHub.

      https://github.com/Pascuallab/MorphCellSorter.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained earlier, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation and accuracy of the analysis. Although we could feed MorphoCellSorter with all these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to binarize the cells automatically, but these approaches often resulted in imperfect, non-optimized segmentation of individual cells. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee’s comment, but the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglia morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglia morphologies). In any case, whatever the value of k, and whether it is determined manually or mathematically, the clusters remain artificial and their boundaries arbitrary.
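
The point about the missing elbow can be sketched on toy data. The example below is an illustration under stated assumptions, not the analysis performed in the study: it uses a small one-dimensional k-means written for this purpose (the function names `kmeans_inertia_1d` and `drop_ratios` are hypothetical), and the two datasets are synthetic.

```python
def kmeans_inertia_1d(points, k, max_iter=100):
    """Run Lloyd's k-means on 1D data and return the within-cluster sum of squares."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic initialisation: one centre in the middle of each of k equal-sized slices.
    centres = [pts[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        # Keep the previous centre if a cluster happens to be empty.
        updated = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return sum(min((p - c) ** 2 for c in centres) for p in pts)


def drop_ratios(points, k_max=5):
    """Ratio inertia(k) / inertia(k + 1) for k = 1 .. k_max - 1."""
    inertias = [kmeans_inertia_1d(points, k) for k in range(1, k_max + 1)]
    return [inertias[i] / inertias[i + 1] for i in range(k_max - 1)]


# Synthetic data: an evenly spread continuum versus two well-separated groups.
continuum = [i / 199 for i in range(200)]
two_groups = [i / 990 for i in range(100)] + [0.9 + i / 990 for i in range(100)]

print("continuum :", [round(r, 2) for r in drop_ratios(continuum)])
print("two groups:", [round(r, 2) for r in drop_ratios(two_groups)])
```

For the two separated groups the inertia collapses between k = 1 and k = 2 and then changes little, which is the elbow. For the continuum the ratios shrink gradually with no single k standing out, consistent with the difficulty reported above.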

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.

      Differences between circularity and roundness factors are not coming across and require further clarification. 

      These are two distinct ways of characterizing morphological complexity, and we borrowed these parameters and kept the name from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concerns the cell body by itself; the cell body is always expressed relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we will add a graphic representation of the measurements they provide in the revised version of the manuscript.

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for this comment and propose the following change, which should resolve this potential issue.

      “PC_1 is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment. We have modified this statement as follows: “The first two components represent most of the information (about 70%); hence we can consider the plane (PC_1, PC_2) as the principal plane, reducing the dataset to a two-dimensional space”
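
For readers who want the "about 70%" figure in symbols, the following is a hedged reading that assumes "information" means the usual PCA explained-variance ratio. Here $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are the eigenvalues of the covariance (or correlation) matrix of the $p$ morphological parameters; the symbols are introduced for illustration only and are not taken from the manuscript.

```latex
\[
  \frac{\lambda_1 + \lambda_2}{\sum_{j=1}^{p} \lambda_j} \approx 0.70
\]
```

Under this reading, roughly 30% of the total variance lies outside the (PC_1, PC_2) plane, which is why the revised wording says "most of" rather than "all" the information.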

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      Sorry for the misunderstanding. We did use Spearman correlation, which measures monotonic association; we have therefore replaced “linear” with “monotonic” in the text. Thank you for the careful reading.
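
The distinction raised by the reviewer can be shown with a short sketch on synthetic data (the values are invented and unrelated to the study). Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, which holds when there are no ties.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient: measures linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Return 1-based ranks of the values (values are assumed to be distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (no ties assumed)."""
    return pearson(ranks(x), ranks(y))


# A strictly increasing but strongly non-linear relationship.
x = list(range(1, 11))
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson  = {r_pearson:.3f}")   # noticeably below 1: the relation is not linear
print(f"Spearman = {r_spearman:.3f}")  # 1.000: the relation is perfectly monotonic
```

Because the comparison between expert rankings and MorphoCellSorter rankings concerns order rather than linearity, "monotonic" is the accurate description of what Spearman's coefficient captures.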

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of using MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it assigns each cell a value on an axis (the Andrews score), which a human expert cannot do. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1). 

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript.

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

    1. eLife Assessment

      This important work advances our understanding of CHMP5's role in regulating osteogenesis through its impact on cellular senescence. The evidence supporting the conclusion is mostly convincing, although including additional experiments and discussions would further strengthen the study. This paper holds potential interest for skeletal biologists who study the pathogenesis of age-associated skeletal disorders.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a significant and rigorous investigation into the role of CHMP5 in regulating bone formation and cellular senescence. The study provides compelling evidence that CHMP5 is essential for maintaining endolysosomal function and controlling mitochondrial ROS levels, thereby preventing the senescence of skeletal progenitor cells.

      Strengths:

      The authors demonstrate that the deletion of Chmp5 results in endolysosomal dysfunction, elevated mitochondrial ROS, and ultimately enhanced bone formation through both autonomous and paracrine mechanisms. The innovative use of senolytic drugs to ameliorate musculoskeletal abnormalities in Chmp5-deficient mice is a novel and critical finding, suggesting potential therapeutic strategies for musculoskeletal disorders linked to endolysosomal dysfunction.

      Weaknesses:

      The manuscript requires a deeper discussion or exploration of CHMP5's roles and a more refined analysis of senolytic drug specificity and effects. This would greatly enhance the comprehensiveness and clarity of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors try to show the importance of CHMP5 for skeletal development.

      Strengths:

      The findings of this manuscript are interesting. The mouse phenotypes are well done and are of interest to a broader (bone) field.

      Weaknesses:

      The mechanistic insights are mediocre, and the cellular senescence aspect poor.

      In total, it has not been shown that there are actual senescent cells that are reduced after D+Q-treatment. These statements need to be scaled back substantially.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Zhang et al. reported that CHMP5 restricts bone formation by controlling endolysosome-mitochondrion-mediated cell senescence. The effects of CHMP5 on osteoclastic bone resorption and bone turnover have been reported previously (PMID: 26195726); in that study, the aberrant bone phenotype was observed in the CHMP5-ctsk-CKO mouse model. Using the same mouse model, Zhang et al. report a novel role of CHMP5 in osteogenesis through its effect on cell senescence. Overall, it is an interesting study and provides new insights in the field of cell senescence and bone.

      Strengths:

      Analyzed the bone phenotype of CHMP5-periskeletal progenitor-CKO mouse model and found the novel role of senescent cells on osteogenesis and migration.

      Weaknesses:

      (1) There are a lot of papers that have reported that senescence impairs osteogenesis of skeletal stem cells. In this study, the author claimed that Chmp5 deficiency induces skeletal progenitor cell senescence and enhanced osteogenesis. Can the authors explain the controversial results?

      (2) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cells in response to Chmp5-KO-induced senescent cells. In addition, the co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      (3) Many EVs were secreted from Chmp5-deleted periskeletal progenitors, compared to the rarely detected EVs around WT cells. Since EVs of BMSCs or osteoprogenitors show strong effects of promoting osteogenesis, did the EVs contribute to the enhanced osteogenesis induced by Chmp5 deficiency?

      (4) EVs secreted from senescent cells propagate senescence and impair osteogenesis. Why do EVs secreted from senescent cells induced by Chmp5 deficiency have opposite effects on osteogenesis?

      (5) The Chmp5-ctsk mice show accelerated aging-related phenotypes, such as hair loss and joint stiffness. Did Ctsk also label cells in hair follicles or joint tissue?

      (6) Fifteen proteins were found to increase and five proteins to decrease in the cell supernatant of Chmp5Ctsk periskeletal progenitors. How about SASP factors in the secretory profile?

      (7) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Are the effects of D+Q on bone overgrowth due to the inhibition of bone resorption?

      (8) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis by affecting cell senescence.

      (9) Cell senescence with markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo.

      (10) ATDC5 cells, as an osteochondroma cell line, are not a good cell model of periskeletal progenitors. Primary periskeletal progenitor cells may be a better choice.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript presents a significant and rigorous investigation into the role of CHMP5 in regulating bone formation and cellular senescence. The study provides compelling evidence that CHMP5 is essential for maintaining endolysosomal function and controlling mitochondrial ROS levels, thereby preventing the senescence of skeletal progenitor cells. 

      Strengths: 

      The authors demonstrate that the deletion of Chmp5 results in endolysosomal dysfunction, elevated mitochondrial ROS, and ultimately enhanced bone formation through both autonomous and paracrine mechanisms. The innovative use of senolytic drugs to ameliorate musculoskeletal abnormalities in Chmp5-deficient mice is a novel and critical finding, suggesting potential therapeutic strategies for musculoskeletal disorders linked to endolysosomal dysfunction. 

      Weaknesses: 

      The manuscript requires a deeper discussion or exploration of CHMP5's roles and a more refined analysis of senolytic drug specificity and effects. This would greatly enhance the comprehensiveness and clarity of the manuscript. 

      We thank the reviewer for these insightful comments. The tissue-specific roles of CHMP5 and the specificity of quercetin and dasatinib treatments in Chmp5-deficient mice will be further discussed and clarified in the revised manuscript. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors try to show the importance of CHMP5 for skeletal development. 

      Strengths: 

      The findings of this manuscript are interesting. The mouse phenotypes are well done and are of interest to a broader (bone) field. 

      Weaknesses: 

      The mechanistic insights are mediocre, and the cellular senescence aspect poor. 

      In total, it has not been shown that there are actual senescent cells that are reduced after D+Q-treatment. These statements need to be scaled back substantially. 

      We thank the reviewer for these helpful comments. Although multiple hallmarks of cell senescence were shown in CHMP5-deficient skeletal progenitors, we will examine additional markers of cell senescence and add them to the revised manuscript. 

      In addition, the effects and specificity of the D+Q treatment will be further discussed and clarified in the revision.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Zhang et al. reported that CHMP5 restricts bone formation by controlling endolysosome-mitochondrion-mediated cell senescence. The effects of CHMP5 on osteoclastic bone resorption and bone turnover have been reported previously (PMID: 26195726); in that study, the aberrant bone phenotype was observed in the CHMP5ctsk-CKO mouse model. Using the same mouse model, Zhang et al. report a novel role of CHMP5 in osteogenesis through its effect on cell senescence. Overall, it is an interesting study and provides new insights in the field of cell senescence and bone. 

      Strengths: 

      Analyzed the bone phenotype of CHMP5-periskeletal progenitor-CKO mouse model and found the novel role of senescent cells on osteogenesis and migration. 

      Weaknesses: 

      (1) There are a lot of papers that have reported that senescence impairs osteogenesis of skeletal stem cells. In this study, the author claimed that Chmp5 deficiency induces skeletal progenitor cell senescence and enhanced osteogenesis. Can the authors explain the controversial results? 

      Different skeletal stem cell populations in time and space have been identified and reported. This study shows that Chmp5 deficiency in periskeletal and endosteal skeletal progenitors causes cell senescence and aberrant bone formation. Although cell senescence during aging can impair osteogenesis of certain skeletal stem cells, which contributes to diseases with low bone mass such as osteoporosis, aging can also increase heterotopic mineralization/calcification in musculoskeletal soft tissues such as ligaments and tendons, which is consistent with our results in this study. These reflect out-of-order musculoskeletal mineralization during aging. We will expand the discussion and clarify the results of CHMP5-regulated cell senescence in osteogenesis in the revised manuscript.

      (2) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cells in response to Chmp5-KO-induced senescent cells. In addition, the co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      Increased osteogenesis of WT skeletal progenitors in the periskeletal lesion was shown to be a paracrine mechanism of abnormal bone formation in Chmp5Ctsk mice. The coculture experiment will help confirm the effect of Chmp5-deficient skeletal progenitors on the osteogenesis of neighboring WT skeletal progenitors.

      Notably, the cause and outcome of cell senescence are highly heterogeneous, and different causes of cell senescence can cause significantly different outcomes. Although the coculture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would be very interesting, these are beyond the scope of the current study.

      (3) Many EVs were secreted from Chmp5-deleted periskeletal progenitors, compared to the rarely detected EVs around WT cells. Since EVs of BMSCs or osteoprogenitors show strong effects of promoting osteogenesis, did the EVs contribute to the enhanced osteogenesis induced by Chmp5 deficiency? 

      The WT skeletal progenitor cells from Chmp5Ctsk mice have an increased capacity of osteogenesis compared to the corresponding cells from control animals, suggesting that the EVs of the Chmp5-deleted periskeletal progenitors could promote osteogenesis of the WT skeletal progenitors, which represents a paracrine mechanism of abnormal bone formation in Chmp5 deficient animals. We will discuss and clarify these results in the revised manuscript.

      (4) EVs secreted from senescent cells propagate senescence and impair osteogenesis. Why do EVs secreted from senescent cells induced by Chmp5 deficiency have opposite effects on osteogenesis? 

      The question is similar to comment #1. The functional heterogeneity of cellular senescence will be discussed in further detail and clarified in the revised manuscript.

      (5) The Chmp5-ctsk mice show accelerated aging-related phenotypes, such as hair loss and joint stiffness. Did Ctsk also label cells in hair follicles or joint tissue? 

      This is an interesting question. Although we did not check the expression of CHMP5 in hair follicles, which is outside the scope of the present study, the result in Fig. 1E showed the expression of CHMP5 in joint ligaments. Notably, abnormal periskeletal bone formation occurs predominantly at the joint ligament insertion site in Chmp5Ctsk mice, which will be elucidated and discussed in the revised manuscript.

      (6) Fifteen proteins were found to increase and five proteins to decrease in the cell supernatant of Chmp5Ctsk periskeletal progenitors. How about SASP factors in the secretory profile? 

      As mentioned above, the SASP phenotype and related factors of senescent cells could be highly heterogeneous depending on inducers, cell types, and timing of senescence. Most of the proteins we identified in the secretome analysis have previously been reported in the secretory profile of osteoblasts. Although we were also interested in the change of some common SASP factors, such as inflammatory cytokines, the experiment did not detect these factors because of their small molecular weights and the technical limitations of mass spec analysis. 

      (7) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Are the effects of D+Q on bone overgrowth due to the inhibition of bone resorption? 

      Although in Chmp5Ctsk mice we cannot exclude the effect of D+Q on osteoclasts, the effect of D+Q on osteoblast lineage cells, which is the focus of the current study, was verified in Chmp5Dmp1 mice. We will expand the discussion and make these results clearer with the revision.

      (8) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis by affecting cell senescence. 

      We agree that additional experiments examining the role of VPS4A in cell senescence will provide more mechanistic insights. The focus of the current study is to report that CHMP5 restricts abnormal bone formation by preventing endolysosome-mitochondrion-mediated cell senescence. The roles of VPS4A in cell senescence and skeletal biology will be explored in separate studies.

      (9) Cell senescence with markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo. 

      We will examine additional markers of cell senescence, as the reviewers suggest, in the revised manuscript.

      (10) ATDC5 cells, as an osteochondroma cell line, are not a good cell model of periskeletal progenitors. Primary periskeletal progenitor cells may be a better choice.

      We were aware that ATDC5 cells are typically used as a chondrocyte progenitor cell line. However, our previous study showed that ATDC5 cells could also be used as a reasonable cell model for periskeletal progenitors. Furthermore, the corresponding results from primary periskeletal progenitors were shown. We will further clarify this in the revision.

      In general, the reviewers' comments will help clarify our results and further strengthen our conclusions. We will address these comments and questions point by point in more detail in the revised manuscript.

    1. eLife Assessment

      In this potentially valuable computational study, the authors conducted atomistic and coarse-grained simulations to probe the temperature-dependent phase behaviors of ELF3, a disordered component of the evening complex in plants. The results aim to highlight the role of polyQ tracts in modulating the temperature sensitivity. The level of evidence is considered incomplete, due to the lack of systematic calibration of the coarse-grained model and limited statistical uncertainty analysis, especially considering the relatively subtle nature of the differences due to temperature change.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the role of the Evening Complex (EC), specifically focusing on ELF3, a disordered protein component of the EC, and its temperature-dependent phase behavior. The study highlights the role of polyQ tracts in modulating temperature-sensitive condensate formation and provides a combination of computational approaches, including REST2 simulations and coarse-grained Martini simulations, to investigate how polyQ tract length and sequence context influence this behavior.

      Strengths:

      The study addresses a key question in plant biology - how temperature influences circadian clock-mediated growth regulation through protein phase behavior. The manuscript introduces the novel finding that polyQ tract length modulates the temperature-dependent formation of helices and condensates.

      Weaknesses:

      (1) Coarse-Grained Simulation Results Not Supported by Data:
      The results presented in Figure 6A of the manuscript do not seem to show a clear trend in the number of clusters formed as a function of polyQ tract length. This is particularly evident in the comparison between 0Q and 7Q polyQ lengths, which display statistically similar values in terms of the number of clusters. The lack of distinction between these values raises questions about the sensitivity of the coarse-grained simulations to polyQ tract length, which the authors claim as a key modulator of condensate formation. This discrepancy weakens the argument that polyQ length directly impacts the clustering behavior in the simulations.
      Suggested Analysis:
      - A more detailed statistical analysis should be performed to assess whether the observed differences between polyQ lengths are significant. This could involve hypothesis testing or the use of error bars in the graphs to better communicate the variability in the data.
      - Additionally, the authors should examine whether there are other features, such as cluster shape or internal structure, that might differentiate between different polyQ lengths, even if the total number of clusters is similar.
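
      The hypothesis testing suggested above could be sketched as follows. This is a minimal illustration, not the authors' analysis: the per-replica cluster counts are hypothetical placeholder values, the function names `permutation_test` and `bootstrap_ci` are introduced here, and the test assumes independent samples (for example separate replicas or block averages), since consecutive simulation frames are correlated.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of sample means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(x, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (error bars)."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(x, k=len(x))) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical cluster counts from independent replicas (placeholders only).
clusters_0q = [5, 6, 5, 7, 6]
clusters_7q = [6, 5, 6, 6, 7]

print("p-value:", permutation_test(clusters_0q, clusters_7q))
print("95% CI for 0Q mean:", bootstrap_ci(clusters_0q))
print("95% CI for 7Q mean:", bootstrap_ci(clusters_7q))
```

      A permutation test is used here because it makes no distributional assumption about cluster counts, which matters when only a handful of replicas are available.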

      (2) Inconsistency in Cluster Size Across Temperatures (Figure 6B):
      The results in Figure 6B show a striking difference in the size of the largest cluster between temperatures of 290K and 300K. This abrupt shift in behavior lacks a clear mechanistic explanation. Typically, phase transitions driven by temperature are more gradual, unless there is some underlying structural or chemical shift that the authors have not accounted for. Without a clear explanation, this sudden change in behavior reduces confidence in the simulation results.
      Suggested Analysis:
      - The authors should explore possible explanations for the dramatic difference in cluster size between 290K and 300K. For example, they could investigate whether specific interactions (such as the breaking or formation of hydrogen bonds or hydrophobic contacts) might explain the behavior at higher temperatures.
      - It is important to check whether the coarse-grained simulation model has been adequately parameterized and scaled for accurate temperature dependence. Atomistic simulations of monomers and dimers with varying polyQ tract lengths could be used to fine-tune the coarse-grained model, ensuring it accurately reflects molecular behavior. The gross estimate of a 10% scaling factor might be insufficient and could lead to inaccurate representations of cluster formation.

      (3) Scaling of Coarse-Grained Model with Atomistic Simulations:
      As mentioned, the coarse-grained model used in the study may not have been properly scaled against atomistic data. A simple scaling factor of 10% may not be appropriate for accurately capturing the behavior of polyQ tracts across different lengths, especially considering their sensitivity to subtle changes in temperature. Without rigorous validation against atomistic simulations, the coarse-grained model's predictions could be skewed.
      Suggested Analysis:

      (4) To address this, the authors should compare the coarse-grained model with atomistic simulations of monomeric and dimeric forms of ELF3 with different polyQ tract lengths. By comparing key structural parameters (e.g., radius of gyration, contact maps, and clustering propensity), the authors could adjust the coarse-grained model to more accurately reflect the atomistic behavior. The authors have a wealth of atomistic simulation data that could support such benchmarking and the identification of a scaling factor.
      - Additionally, the authors should investigate whether the assumed scaling factor of 10% is appropriate for each polyQ length or whether it needs to be refined based on specific properties, such as the number of hydrophobic interactions or secondary structure stability.

      (5) Lack of Analysis for Liquid-Like Behavior in Phase Separation:
      The simulations presented in the manuscript do not analyze the liquid-like behavior of ELF3 condensates, which is a key characteristic of liquid-liquid phase separation (LLPS). In LLPS systems, condensates are often dynamic, with chains exchanging between clusters, indicating liquid-like rather than solid-like behavior. The authors fail to probe this crucial aspect, which is necessary to support the claim that ELF3 undergoes phase separation.
      Suggested Analysis:
      - The authors should conduct additional analyses to probe the liquid-like nature of the clusters formed by ELF3. One approach would be to analyze the dynamics of chain exchange between clusters, measuring how frequently chains leave one cluster and join another over time. This analysis would reveal whether the condensates behave as liquid-like, dynamic structures or more static, solid-like aggregates.
      - Additionally, the temperature dependence of these exchange dynamics should be investigated. In true liquid-liquid phase separation, the rate of chain exchange is often sensitive to temperature. Observing how this rate changes between 290K and 300K, for instance, could help explain the abrupt shift in cluster size seen in Figure 6B.
      - The authors should also analyze whether the internal structures of the condensates are consistent with a liquid-like phase. For example, radial distribution functions and contact lifetimes could be calculated to reveal whether the clusters exhibit liquid-like organization.
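
      One way to quantify the chain exchange described above is a pair-survival function: the fraction of chain pairs sharing a cluster at one frame that still share a cluster some number of frames later. The sketch below is illustrative only. It assumes a hypothetical input format in which each frame is a list giving one cluster label per chain, with unclustered chains carrying unique labels; the names `coclustered_pairs` and `pair_survival` are introduced here and do not come from the manuscript. Because only co-membership is compared, the result does not depend on how clusters are labeled from frame to frame.

```python
from itertools import combinations


def coclustered_pairs(labels):
    """Return the set of chain pairs (i, j), i < j, sharing a cluster in one frame."""
    by_cluster = {}
    for chain, cluster_id in enumerate(labels):
        by_cluster.setdefault(cluster_id, []).append(chain)
    pairs = set()
    for members in by_cluster.values():
        pairs.update(combinations(members, 2))
    return pairs


def pair_survival(frames, lag):
    """Average fraction of co-clustered pairs still co-clustered `lag` frames later."""
    fractions = []
    for t in range(len(frames) - lag):
        start = coclustered_pairs(frames[t])
        if not start:
            continue  # no clustered pairs at this frame, nothing to track
        later = coclustered_pairs(frames[t + lag])
        fractions.append(len(start & later) / len(start))
    return sum(fractions) / len(fractions) if fractions else float("nan")


# Hypothetical trajectory: 4 chains, 3 frames (placeholder labels only).
frames = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
for lag in range(3):
    print(lag, pair_survival(frames, lag))
```

      A survival curve that decays quickly with lag would be consistent with liquid-like exchange, while a curve that stays near one would point to more static aggregates. Computing it at 290K and 300K would give the temperature comparison suggested above.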

      (6) Lack of justification of polydispersity of polyQ:
      The authors don't provide any rationale for the choice of the different polyQ copy numbers used in their chain-growth simulation studies. The choice would be better motivated by reference to prior experimental observations.

      (7) Lack of initiative to connect to Experiments:
      While the computational models and simulations provide robust theoretical insights, the absence of direct experimental validation weakens the overall impact of the manuscript. For example, experimental data on how specific mutations in the polyQ tract influence ELF3 behavior in vivo would significantly bolster the authors' claims. The manuscript would benefit from either citing existing experimental studies that corroborate these findings or from suggesting future experimental directions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore how a key protein in the circadian clock of plants, ELF3, responds to temperature changes by forming molecular condensates. They focused on understanding the role of a specific region of the protein, a polyQ tract, in promoting temperature-sensitive structural changes and regulating the formation of condensates. Through a series of computational simulations, they sought to uncover the molecular basis for ELF3's temperature responsiveness and its broader implications for plant growth and adaptation to environmental conditions.

      Strengths:

      The study's strength lies in its focus on an important biological question: how plants sense and respond to temperature changes at the molecular level. The authors employed a variety of computational techniques, including coarse-grained simulations, to explore the role of specific molecular features in this process. These methods provide a multi-scale view of protein behavior and offer valuable insights into how molecular structures may influence biological function.

      Weaknesses:

      However, there are notable weaknesses in the evidence provided. While the authors present trends in molecular changes, such as shifts in helical propensity and the formation of condensates, these results seem subtle and are not strongly substantiated by statistical analysis. The lack of error bars in the figures makes it difficult to distinguish between meaningful signals and potential noise in the data. Furthermore, the temperature-sensitive behavior appears to be influenced more by chain length than by sequence-specific effects of the polyQ region, raising questions about whether the findings truly capture the molecular mechanisms responsible for temperature sensing. Additionally, some simulations, particularly those related to the formation of condensates, do not appear fully converged, which casts further doubt on the robustness of the results.

      Additional Context for Readers:

      Readers should interpret the results with caution, especially regarding the molecular mechanisms proposed for temperature sensing. While the study presents interesting trends, the evidence is not definitive, and the findings may be more reflective of general protein behavior (such as the effect of chain length on condensate formation) than specific sequence-driven responses to temperature. Further experimental studies and more converged simulations will be necessary to fully understand the role of ELF3 in temperature regulation.

    4. Author response:

      We sincerely thank the reviewers for their constructive feedback and the editor for facilitating this thorough review. We found the suggestions insightful and valuable for refining our manuscript. We would like to clarify a few points in an initial response before presenting the fully updated manuscript. First of all, we would like to emphasize the multi-scale nature of our approach, in which we derived insights from both atomistic and coarse-grained simulations. The reviewers focused mostly on the coarse-grained simulations; we are aware of their drawbacks, which were a strong motivation for starting with the atomistic approach. Reviewer 1 mentioned the lack of a proposed mechanism for the increased condensate-forming propensity at 300K vs. 290K. We feel we had clearly pointed to aromatic contacts as the mechanism, but we will make sure to clarify this further in the revision. Reviewer 1 was also critical of our use of the 10% adjustment to Martini protein-water interactions, which has previously been thoroughly presented and assessed in the literature (see for example Tesei et al JCTC 2022). Furthermore, for our specific system we were encouraged by the favorable comparison of our Martini simulations to the atomistic simulations, e.g. for radius of gyration, contact propensity, and solvent accessibility. We will make sure to emphasize this more clearly in the revision. Finally, we are grateful for the feedback from both reviewers and will use their comments as a guide to incorporate additional analyses and extended simulations to strengthen our conclusions in an upcoming revision.

    1. eLife Assessment

      This important study identifies species- and sex-specific neuronal cell types and gene expression in the medial preoptic area (MPOA) to help understand the evolutionary divergence of social behaviors. The evidence from single-nucleus RNA sequencing and immunostaining is convincing and suggests that cellular differences in the MPOA may contribute to behavioral variations such as mating and parental care that are apparent in two closely related deer mouse species. These rich observations provide an entry point for future hypothesis-driven experiments to demonstrate a causal role for these populations in sex- or species-variable behaviors in vertebrates. These data will be a resource that is of value to behavioral neuroscientists.

    2. Reviewer #1 (Public review):

      (1) Summary of the Paper:

      This paper by Chen et al. examines the cellular composition and gene expression of the hypothalamic medial preoptic area (MPOA) in two closely related deer mouse species (P. maniculatus and P. polionotus) that exhibit distinct social behaviors. Through single-nucleus RNA sequencing (snRNA-seq), Chen et al. identify sex- and species-specific neuronal cell types that likely contribute to differences in mating and parental care. By comparing monogamous and promiscuous species, the study provides insights into how neuronal diversity and gene expression changes in the MPOA might underlie the evolution of social behaviors.

      (2) Strengths of the Paper:

      The paper excels in several areas. First, the data presentation is clear and well-organized, making the complex findings easy to follow. The writing is straightforward and highly accessible, which enhances the overall readability. The experimental design is innovative, particularly in how they combined samples from different species into the same dataset and then used post-hoc identification to distinguish cell types by species. This dramatically controls for potential batch effects in my opinion. Additionally, the authors contextualize their findings within the framework of previously published studies on Mus musculus, providing a strong comparative analysis that enhances the significance of their work.

      (3) Weaknesses of the Paper:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

    3. Reviewer #2 (Public review):

      Summary:

      The authors report several interesting species and sex differences in cell type expression that may relate to species differences in behavior. The differential cell type abundance findings build on previously observed species/sex differences in behavior and brain anatomy. These data will be a valuable resource for behavioral neuroscientists. These findings are important but the manuscript goes too far in attributing causal influences to differences in behavior. A second important problem is that dissections used for the sequencing data include other neuropeptide-rich areas of the hypothalamus like the PVN. Although histology is included, the results in the main manuscript often do not include the mPOA making it hard to know if species/sex differences are consistent across different hypothalamic regions. The manuscript would benefit from more precise language.

      Strengths:

      The data are novel because cell-type atlases are available for only a few species.

      The authors have clearly defined appropriate steps taken to obtain trustworthy estimations of cell type abundance. Furthermore, the criteria for each cell type assignment were described in a way for readers to easily replicate. The rigor in comparing cell abundance provides convincing evidence that these species have differences in MPOA cellular composition.

      The authors have a good explanation for why 19 of the 53 neuron clusters were not classified (possible Mus/Peromyscus anatomical differences, some cell types don't have well-defined transcriptional profiles).

      The findings were validated with histology.

      Weaknesses:

      Some methodology could be further explained, like the decision of a 15% cutoff value for cell type assignment per cluster, or the necessity of a multi-step analysis pipeline for gene enrichment studies.

      The authors should exercise strong caution in making inferences about these differences being the basis of parental behavior. It is possible, given connections to relevant research, but without direct intervention, direct claims should be avoided. There should be clear distinctions of what to conclude and what to propose as possibilities for future research.

      Histology is not performed on all regions included in the sequencing analysis.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed snRNA-seq in the pre-optic area (POA), a heterogeneous brain region implicated in multiple innate behaviors, comparing two species of Peromyscus mice that possess strikingly different parenting behaviors. P. polionotus shows high levels of parental care from both sexes of parent, and P. maniculatus shows lower levels of care, predominantly displayed by dams rather than sires. The overall goal of understanding the genomic basis of behavioral variation is significant and of broad interest and comparative studies in POA in these two species is an excellent approach to tackle this question. The authors correctly point out that existing studies largely compare species that are highly divergent, such as mice and humans, which confounds the association of specific neuronal populations or gene expression patterns with distinct behaviors. They identify neuronal populations with differential abundance between species and sexes and additionally report sex and species differences in gene expression within each transcriptomic cell type. Their cell type classification is aided by mapping their Peromyscus cells onto a previously existing POA single-cell dataset generated in lab mice. However, a significant fraction of the cells cannot be assigned to Mus types, which confounds their analysis. The detection and validation of previously observed sex differences in the Gal/Moxd1 cell type and species differences in Avp expression provide additional support that their data are solid. This study provides an important resource for comparative single-cell studies in the brain.

      Strengths:

      This is a pioneering comparative snRNA-seq study that provides a roadmap for similar approaches in non-traditional model organisms.

      The authors have identified populations that may underlie sex- and species- differences in parenting behavior in rodents.

      A significant strength of the manuscript is the histological validation of their most robust marker genes.

      Weaknesses:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the region dissected. The Supplementary table reports lower UMIs/genes per cell than is typically seen as well. Perhaps additional information could be obtained from the data by not restricting the analyses to cells that can be assigned to Mus types. A direct comparison of the two Peromyscus species could be valuable as would a more complete Peromyscus POA atlas.

      In Supplement 7, it appears that most neurons can be assigned as excitatory or inhibitory, but then so many of these cells remain in the unassigned "gray blob" seen in panel 1E. Clustering of excitatory and inhibitory neurons separately, as in prior cited work in Mus POA (refs 31 and 57) may boost statistical power to detect sex and species differences in cell types. Perhaps the cells that cannot be assigned to Mus contain too few reads to be useful, in which case they should be filtered out in the QC. The technical challenges of a comparative single-cell approach are considerable, so it benefits the scientific community to provide transparency about them.

      The Calb1 dimorphism, as observed by immunostaining, appears much more extensive in P. maniculatus compared to P. polionotus (Figures 3 E and F). This finding is not reflected in the counts of the i20:Gal/Moxd1 cluster. The use of Calb1 staining as a proxy for the Gal/Moxd1 cluster would be strengthened if the number of POA Calb1+ neurons that are found in each cluster was apparent. There may be additional Calb1+ neurons in the cells that are not annotated to a Mus cluster. This clarification would add support to the overall conclusion that there is reduced sexual dimorphism in P. polionotus.

      The relationship between the sex steroid receptor expression and the sex bias in gene expression would be improved if the sex bias in sex steroid receptor expression was included in Supplementary Figure 10.

      There is no explanation for the finding that there is a female bias in gene expression across all cell types in P. polionotus.

    5. Author response:

      We thank the reviewers for their thoughtful comments. 

      Based on their suggestions we will: 

      (1) Use more accurate language to describe the hypothalamic regions under investigation in this study. While we aimed primarily to investigate the medial preoptic area (MPOA), our dissections and sequencing data in fact capture several regions of the anterior hypothalamus, including the anteroventral periventricular (AVPV), paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, among others. We will revise the language in our manuscript to reflect that our study investigates the cellular evolution of the anterior hypothalamus across behaviorally divergent deer mice.

      (2) Revise our language to clarify that while our study provides a rich dataset for generating hypotheses about which cell types may contribute to behavioral differences, it does not provide any evidence of causal relationships. We hope to investigate this further in future work.

      (3) Clarify specific methodological choices for which reviewers had questions, especially about the hypothalamic regions for which we did histology to validate cell abundance differences and methodological choices related to mapping our cell clusters to Mus cell types.

      Our responses to each reviewer’s specific comments are below.

      Reviewer #1:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

      Yes, we agree the study lacks functional experiments. We hope that the dataset is of value for generating hypotheses about how hypothalamic neuronal cell types may govern species-specific social behaviors, and for these hypotheses to be functionally tested by us and others in future work.

      Reviewer #2:

      Some methodology could be further explained, like the decision of a 15% cutoff value for cell type assignment per cluster, or the necessity of a multi-step analysis pipeline for gene enrichment studies.

      A 15% cutoff value for cell type assignment was chosen to include all known homology correspondences between our dataset and the Mus atlas. For example, i14:Avp/Cck cells from the Mus atlas represent Avp cells from the suprachiasmatic nuclei (SCN). Though only 17.3% of cluster 15 maps to i14:Avp/Cck, we know these two clusters correspond based on the expression of Avp and additional SCN marker genes in cluster 15 (Supp Fig 6). We will further explain this cutoff in the revised manuscript.
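
      A cutoff rule of this kind could be written as below. This is a hedged sketch under stated assumptions rather than our actual pipeline: it assumes each cell in a cluster carries one mapped reference label (or `None` when unmapped), the function name `assign_homology` is hypothetical, and the cell counts in the example are placeholders chosen only to reproduce the 17.3% fraction mentioned above.

```python
from collections import Counter


def assign_homology(mapped_types, cutoff=0.15):
    """Assign a cluster to its most frequent reference type if enough cells map to it.

    mapped_types: one reference label per cell in the cluster (None if unmapped).
    Returns (assigned_type_or_None, fraction_of_cells_mapping_to_top_type).
    """
    n_cells = len(mapped_types)
    counts = Counter(t for t in mapped_types if t is not None)
    if n_cells == 0 or not counts:
        return None, 0.0
    top_type, top_count = counts.most_common(1)[0]
    fraction = top_count / n_cells
    return (top_type if fraction >= cutoff else None), fraction


# Placeholder cluster: 173 of 1000 cells map to one reference type.
cluster = ["i14:Avp/Cck"] * 173 + [None] * 827
print(assign_homology(cluster))
```

      With a cutoff of 0.15 the placeholder cluster is assigned, whereas a stricter cutoff such as 0.20 would leave it unassigned despite the supporting marker gene expression.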

      Our gene enrichment study includes a multi-step analysis pipeline because we wanted to control for confounders that may be introduced because of gene expression level. Genes that are more highly expressed are more accurately quantified and thus more likely to be identified as differentially expressed. Therefore, we wanted to test for gene enrichments in our set of DE genes against a background of genes with similar expression levels. We will clarify this motivation in the revised manuscript.
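
      The idea of testing against an expression-matched background can be illustrated with a short sketch. It is a simplified stand-in, not our actual pipeline: genes are ranked by a hypothetical mean expression value, split into equal-sized bins, and for every differentially expressed (DE) gene one non-DE gene is drawn from the same bin. The function name `matched_background`, the number of bins, and the toy expression values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def matched_background(expr, de_genes, n_bins=10, seed=0):
    """Sample non-DE genes so their expression-bin distribution matches the DE set.

    expr: dict mapping gene name -> mean expression level.
    de_genes: iterable of DE gene names (names absent from expr are ignored).
    """
    rng = random.Random(seed)
    ranked = sorted(expr, key=expr.get)  # genes ordered from low to high expression
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    de = set(de_genes)
    pool = defaultdict(list)  # non-DE genes available in each bin
    need = defaultdict(int)   # number of DE genes found in each bin
    for gene in ranked:
        if gene in de:
            need[bin_of[gene]] += 1
        else:
            pool[bin_of[gene]].append(gene)
    background = []
    for b, k in need.items():
        # Draw without replacement; capped if a bin has too few non-DE genes.
        background.extend(rng.sample(pool[b], min(k, len(pool[b]))))
    return background


# Toy data: 100 genes with increasing expression, 3 highly expressed DE genes.
expr = {f"g{i}": float(i) for i in range(100)}
de_genes = ["g90", "g91", "g92"]
print(matched_background(expr, de_genes))
```

      Enrichment of a gene category among the DE genes would then be compared with its frequency in this matched background rather than in all expressed genes, so that highly expressed categories are not flagged simply because they are easier to quantify.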

      The authors should exercise strong caution in making inferences about these differences being the basis of parental behavior. It is possible, given connections to relevant research, but without direct intervention, direct claims should be avoided. There should be clear distinctions of what to conclude and what to propose as possibilities for future research.

      Yes, we agree that we are unable to make direct claims about neuronal differences being the basis of parental behavior. We will revise our language to be clearer about which relationships we are hypothesizing and what we propose as possibilities for future research.

      Histology is not performed on all regions included in the sequencing analysis.

      We apologize that our language describing the hypothalamic regions included in the sequencing analysis and those included in the histology is unclear. We aimed to dissect the medial preoptic region for the sequencing analysis, but additionally captured parts of the anterior hypothalamus, including the paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, among others. Our histology was performed across the entire hypothalamus and includes all regions included in the sequencing data. We will revise the manuscript to more accurately describe the hypothalamic regions we investigated.

      Reviewer #3:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the region dissected. The Supplementary table reports lower UMIs/genes per cell than is typically seen as well. Perhaps additional information could be obtained from the data by not restricting the analyses to cells that can be assigned to Mus types. A direct comparison of the two Peromyscus species could be valuable as would a more complete Peromyscus POA atlas.

      Our dataset reports ~1,500 genes and ~1,000 UMIs per nucleus, which is indeed lower than is typically reported in other single-nucleus datasets. Some of this discrepancy is due to a lower quality genome and annotated transcriptome available for Peromyscus compared to Mus musculus, which results in a lower mapping rate than is typically reported in Mus studies. However, our dataset was sufficient to identify known peptidergic cell types (Supp Fig 6) and to map homology to Mus cell types for 34 (64%) of our 53 clusters. Additionally, although some of our clusters contain small numbers of cells, our differential abundance analysis accounts for the variance in cell numbers observed across samples and should be robust against any increase in variance due to small numbers. In fact, even differential abundance of very small cell clusters such as oxytocin neurons (cell type 40) was validated by histology.

      We would like to clarify that all analyses were performed on all cell clusters, regardless of whether or not they could be assigned homology to a Mus cell type. All the cell types that we identified as differentially abundant or as containing significant sex differences happened to be cell types for which homology to a Mus cell type could be defined. This may arise for a relatively uninteresting reason: cell types that have more distinct transcriptional signatures will be more accurately clustered, leading to more accurate identification of homology as well as more accurate measurements of differential abundance / expression. We will revise the language to make this clearer in our manuscript.

      In Supplement 7, it appears that most neurons can be assigned as excitatory or inhibitory, but then so many of these cells remain in the unassigned "gray blob" seen in panel 1E. Clustering of excitatory and inhibitory neurons separately, as in prior cited work in Mus POA (refs 31 and 57) may boost statistical power to detect sex and species differences in cell types. Perhaps the cells that cannot be assigned to Mus contain too few reads to be useful, in which case they should be filtered out in the QC. The technical challenges of a comparative single-cell approach are considerable, so it benefits the scientific community to provide transparency about them.

      We are not certain why we are unable to cluster and assign homology to many of our cells (i.e. cells in the unassigned “gray blob”). However, we note that even in the Mus atlas, many cells did not belong to obvious clusters by UMAP visualization and that several clusters lacked notable marker genes and were designated simply as “Gaba” and “Glut” clusters. Therefore, it is unsurprising that our own dataset also contains cells that lack the transcriptional signatures needed to be clustered and/or mapped to Mus cell types. We do know, however, that the median number of reads per nucleus is uniform across cell clusters and does not explain why some clusters could not be assigned to Mus. We will add this information to our revised manuscript.

      We do not think that a two-stage clustering (i.e. clustering first by excitatory vs. inhibitory neurons) is expected to gain power to resolve cell types in this case. Excitatory vs. inhibitory neurons are clearly separable on our UMAP (Supp Fig 7) so that information is already being used by our clustering procedure. However, we will explore this further in our revised manuscript to see if doing so will boost statistical power.

      The Calb1 dimorphism, as observed by immunostaining, appears much more extensive in P. maniculatus than in P. polionotus (Figures 3E and F). This finding is not reflected in the counts of the i20:Gal/Moxd1 cluster. The use of Calb1 staining as a proxy for the Gal/Moxd1 cluster would be strengthened if the number of POA Calb1+ neurons that are found in each cluster was apparent. There may be additional Calb1+ neurons among the cells that are not annotated to a Mus cluster. This clarification would add support to the overall conclusion that there is reduced sexual dimorphism in P. polionotus.

      From the Mus MPOA atlas (which includes both single-cell sequencing data and imaging-based spatial information), it is known that the i20:Gal/Moxd1 cluster comprises sexually dimorphic cells that make up both the BNST and the SDN-POA. These sexually dimorphic cells are well-studied and known to be marked by Calb1, which we used in immunostaining as a proxy for i20:Gal/Moxd1. 

      However, we would like to clarify that in our study, the immunostaining of Calb1+ neurons and the sequencing counts of the i20:Gal/Moxd1 cluster are not completely reflective of each other because our sequencing dataset only captured the ventral portion of the BNST. Therefore, our i20:Gal/Moxd1 counts contain a combination of some Calb1+ BNST cells and likely all Calb1+ SDN-POA cells and are difficult to interpret on their own. Our histology, however, covers the entire hypothalamus and is more reliable for identifying sex and species differences in each region. We will clarify this in the revised manuscript.

      The relationship between the sex steroid receptor expression and the sex bias in gene expression would be improved if the sex bias in sex steroid receptor expression was included in Supplementary Figure 10.

      We will include this in the revised manuscript. 

      There is no explanation for the finding that there is a female bias in gene expression across all cell types in P. polionotus.

      We also find this observation interesting but don’t have a good explanation for why at this point. We plan to follow this up in future work.

    1. eLife Assessment

      This valuable study investigates the brain representations of Braille letters in blind participants and provides evidence using EEG and fMRI that the decoding of letter identity across the reading hand takes place in the visual cortex. The evidence supporting the claims of the authors is convincing and the work will be of interest to neuroscientists working on brain plasticity.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers examined how individuals who were born blind or lost their vision early in life process information, specifically focusing on the decoding of Braille characters. They explored the transition of Braille character information from tactile sensory inputs, based on which hand was used for reading, to perceptual representations that are not dependent on the reading hand.

      They identified tactile sensory representations in areas responsible for touch processing and perceptual representations in brain regions typically involved in visual reading, with the lateral occipital complex serving as a pivotal "hinge" region between them.

      In terms of temporal information processing, they discovered that tactile sensory representations occur prior to cognitive perceptual representations. The researchers suggest that this pattern indicates that even in situations of significant brain adaptability, there is a consistent chronological progression from sensory to cognitive processing.

      Strengths:

      By combining fMRI and EEG, and focusing on the diagnostic case of Braille reading, the paper provides an integrated view of the transformation processing from sensation to perception in the visually deprived brain. Such a multimodal approach is still rare in the study of human brain plasticity and allows us to discern the nature of information processing in blind people's early visual cortex, as well as the time course of information processing in a situation of significant brain adaptability.

      Weaknesses:

      ROI and searchlight analyses are not completely overlapping, although this might be due to the specific limits and strengths of each approach. Moreover, the conclusions regarding the behavioral relevance of the sensory and perceptual representations in the putatively reorganized brain, although important, are limited due to the behavioral measurements adopted.

    3. Reviewer #2 (Public review):

      Summary:

      Haupt and colleagues performed a well-designed study to test the spatial and temporal gradient of perceiving braille letters in blind individuals. Using cross-hand decoding of the read letters, and comparing it to the decoding of the read letter for each hand, they defined perceptual and sensory responses. Then they compared where (using fMRI) and when (using EEG) these were decodable. Using fMRI, they showed that low-level tactile responses specific to each hand are decodable from the primary and secondary somatosensory cortex as well as from IPS subregions, the insula and LOC. In contrast, more abstract representations of the braille letter independent from the reading hand were decodable from several visual ROIs, LOC, VWFA and surprisingly also EVC. Using a parallel EEG design, they showed that sensory hand-specific responses emerge in time before perceptual braille letter representations. Last, they used RSA to show that the behavioral similarity of the letter pairs correlates to the neural signal of both fMRI (for the perceptual decoding, in visual and ventral ROIs) and EEG (for both sensory and perceptual decoding).

      Strengths:

      This is a very well-designed study and it is analyzed well. The writing clearly describes the analyses and results. Overall, the study provides convincing evidence from EEG and fMRI that the decoding of letter identity across the reading hand occurs in the visual cortex in blindness. Further, it addresses important questions about the visual cortex hierarchy in blindness (whether it parallels that of the sighted brain or is inverted) and its link to braille reading.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      We thank Reviewer #1 for the relevant and insightful comments on our paper. Please find our detailed answers below in the Recommendations to the Authors section.

      Summary: 

      The researchers examined how individuals who were born blind or lost their vision early in life process information, specifically focusing on the decoding of Braille characters. They explored the transition of Braille character information from tactile sensory inputs, based on which hand was used for reading, to perceptual representations that are not dependent on the reading hand. 

      They identified tactile sensory representations in areas responsible for touch processing and perceptual representations in brain regions typically involved in visual reading, with the lateral occipital complex serving as a pivotal "hinge" region between them.

      In terms of temporal information processing, they discovered that tactile sensory representations occur prior to cognitive-perceptual representations. The researchers suggest that this pattern indicates that even in situations of significant brain adaptability, there is a consistent chronological progression from sensory to cognitive processing. 

      Strengths: 

      By combining fMRI and EEG, and focusing on the diagnostic case of Braille reading, the paper provides an integrated view of the transformation processing from sensation to perception in the visually deprived brain. Such a multimodal approach is still rare in the study of human brain plasticity and allows us to discern the nature of information processing in blind people's early visual cortex, as well as the time course of information processing in a situation of significant brain adaptability. 

      Weaknesses: 

      The lack of a sighted control group limits the interpretations of the results in terms of profound cortical reorganization, or simple unmasking of the architectural potentials already present in the normally developing brain. 

      We thank the reviewer for raising this important point! We acknowledge that our claims regarding the unmasking of architectural potentials in both the normally developing and visually deprived brain are limited by the study design we employed. However, we note that defining an appropriate control group and assessing non-visual reading in sighted participants is far from straightforward. We discuss these issues in our response to the Public Review of Reviewer 2.

      Moreover, the conclusions regarding the behavioral relevance of the sensory and perceptual representations in the putatively reorganized brain are limited due to the behavioral measurements adopted.

      We agree with the reviewer that the relation between behavior and neural representations, as established via perceived similarity judgments, is task-dependent, and that a richer assessment of behavior would be valuable. Please note, however, that this limitation pertains to any experimental task used to assess behavior in the laboratory. Our major goal was to assess whether the identified neural representations are suitably formatted to be used by the brain for at least one behavior rather than being epiphenomenal. We found that the representations are suitably formatted for similarity judgments, thus establishing that they are relevant for at least this behavior. We also argue that judging similarity is a complex task that may underlie many other relevant behaviors. We discuss this point further in response to the Recommendations to the Authors.

      Reviewer #2 (Public Review): 

      We thank the reviewer for the considerate and thoughtful suggestions. Please find a detailed description of the implemented changes below.

      Summary: 

      Haupt and colleagues performed a well-designed study to test the spatial and temporal gradient of perceiving braille letters in blind individuals. Using cross-hand decoding of the read letters, and comparing it to the decoding of the read letter for each hand, they defined perceptual and sensory responses. Then they compared where (using fMRI) and when (using EEG) these were decodable. Using fMRI, they showed that low-level tactile responses specific to each hand are decodable from the primary and secondary somatosensory cortex as well as from IPS subregions, the insula, and LOC. In contrast, more abstract representations of the braille letter independent from the reading hand were decodable from several visual ROIs, LOC, VWFA, and surprisingly also EVC. Using a parallel EEG design, they showed that sensory hand-specific responses emerge in time before perceptual braille letter representations. Last, they used RSA to show that the behavioral similarity of the letter pairs correlates to the neural signal of both fMRI (for the perceptual decoding, in visual and ventral ROIs) and EEG (for both sensory and perceptual decoding). 

      Strengths: 

      This is a very well-designed study and it is analyzed well. The writing clearly describes the analyses and results. Overall, the study provides convincing evidence from EEG and fMRI that the decoding of letter identity across the reading hand occurs in the visual cortex in blindness. Further, it addresses important questions about the visual cortex hierarchy in blindness (whether it parallels that of the sighted brain or is inverted) and its link to braille reading. 

      Weaknesses: 

      Although I have some comments and requests for clarification about the details of the methods, my main comment is that the manuscript could benefit from expanding its discussion. Specifically, I'd appreciate the authors drawing clearer theoretical conclusions about what this data suggests about the direction of information flow in the reorganized visual system in blindness, the role VWFA plays in blindness (revised from the original sighted role or similar to it?), how information arrives to the visual cortex, and what the authors' predictions would be if a parallel experiment would be carried out in sighted people (is this a multisensory recruitment or reorganization?). The data has the potential to speak to a lot of questions about the scope of brain plasticity, and that would interest broad audiences. 

      We thank the reviewer for the opportunity to provide clearer theoretical conclusions from our data. We elaborate on each of the points raised by the reviewer in the discussion section.

      Concerning the direction of information flow in the reorganized visual system in blindness, we focus on information arrival to EVC and information flow beyond EVC.

      p. 11, ll. 376-386, Discussion 4.1:

      “Overall, identifying braille letter representations in widespread brain areas raises the question of how information flow is organized in the visually deprived brain. Functional connectivity studies report deprivation-driven changes of thalamo-cortical connections which could explain both arrival of information to and further flow of information beyond EVC. First, the coexistence of early thalamic connections to both S1 and V1 (Müller et al., 2019) would enable EVC to receive from different sources and at different timepoints. Second, potentially overlapping connections from both sensory cortices to other visual or parietal areas (Ioannides et al., 2013) could enable the visually deprived brain to process information in a widespread and interconnected array of brain areas. In such a network architecture, several brain areas receive and forward information at the same time. In contrast to information discretely traveling from one processing unit to the next in the sighted brain’s processing cascade, we can rather picture information flowing in a spatially and functionally more distributed and overlapping fashion.”

      Regarding the role of VWFA, we propose that the functional organization of VWFA is modality-independent.

      p. 10, ll. 346-348, Discussion 4.1:

      “Second, we found that VWFA contains perceptual but not sensory braille letter representations. By clarifying the representational format of language representations in VWFA, our results support previous findings of the VWFA being functionally selective for letter and word stimuli in the visually deprived brain (Reich et al., 2011; Striem-Amit et al., 2012; Liu et al., 2023). Together, these findings suggest that the functional organization of the VWFA is modality-independent (Reich et al., 2011), depicting an important contribution to the ongoing debate on how visual experience shapes representations along the ventral stream (Bedny et al., 2021).”

      Lastly, we would like to share our thoughts about carrying out a parallel experiment in sighted people.

      In general, we agree that it seems insightful to conduct a parallel, analogous experiment in sighted participants with the aim to disentangle whether the effects seen in blind participants are due to multisensory recruitment or reorganization. However, before making predictions regarding the outcome, we would have to define an analogous experiment in sighted participants that taps into the same mechanisms. This, however, is difficult to do as it is unclear what counts as analogous. For example, if we compare braille reading to reading visually presented braille dot arrays or Roman letters, we will assess visual object processing, a different mechanism from that involved in braille reading. Alternatively, if we compare braille reading to sighted participants reading embossed Roman letters haptically or ideally even reading Braille after extensive training, we still face the inherent problem that sighted participants have visual experiences and could use visual imagery strategies in these nonvisual tasks. As we cannot experimentally ensure that sighted participants do not use visual strategies to solve a task, this would always complicate drawing conclusions about the underlying processes. More specifically, we could never pinpoint whether differences between sighted and blind participants are due to measuring different mechanisms or measuring the same mechanism and unravelling underlying changes (i.e., multisensory recruitment or reorganization). Finally, apart from potential confounds due to visual imagery, considering populations of sighted readers and Braille readers as only differing with regard to their input modality and otherwise being comparable is problematic: In general, blind populations are more heterogenous than most typical samples due to various factors such as aetiologies, onset and severity (Merabet & Pascual-Leone, 2010). 

      Even when carrying out studies in highly specific population subsamples, such as in congenitally blind braille readers, vast within-group differences remain, e.g., the quality and quantity of their braille education, as well as across braille and print readers, e.g., different passive exposure to braille versus written letters during childhood (Englebretson et al., 2023). Hence, to fully match the groups in terms of learning experience we would, for example, have to teach sighted infants braille reading in childhood and follow them up until a comparable age. This approach does not seem feasible.

      p. 10, ll. 328-341, Discussion 4.1:

      “We note that our findings contribute additional evidence but cannot conclusively distinguish between the competing hypotheses that visually deprived brains dynamically adjust to the environmental constraints versus that they undergo a profound cortical reorganization. Resolving this debate would require an analogous experiment in sighted people which taps into the same mechanisms as the present study. Defining a suitable control experiment is, however, difficult. Any other type of reading would likely tap into different mechanisms than braille reading. Further, whenever sighted participants are asked to perform a haptic reading task, outcomes can be confounded by visual imagery driving visual cortex (Dijkstra et al., 2019). Thus, the results would remain ambiguous as to whether observed differences between the groups index different mechanisms or plastic changes in the same mechanisms. Last, matching groups of sighted readers and braille readers such that they only differ with regard to their input modality seems practically unfeasible: There are vast differences within the blind population in general, e.g., aetiologies, onset and severity, and the subsample of congenitally blind braille readers more specifically, e.g., the quality and quantity of their braille education, as well as across braille and print readers, e.g., different passive exposure to braille versus written letters during childhood (Englebretson et al., 2023; Merabet & Pascual-Leone, 2010).”

      While we appreciate that the conclusions we can draw from our results are limited by our sample and defining an appropriate parallel experiment in sighted participants is difficult for the reasons discussed above, we would still like to share our speculations regarding the process underlying our result pattern. We think that our results, taken together with results of previous studies, suggest that EVC does not undergo fundamental reorganization in the case of visual deprivation. Rather, it can flexibly adjust to given processing requirements. This flexibility is not infinite; adjustments are limited by the area’s architectural and computational capacity. Importantly, we think that this claim refers to an unmasking of preexisting potential rather than multisensory recruitment.

      To aid in drawing even more concrete conclusions about the flow of information, I suggest that the authors also add at least another early visual ROI to plot more clearly whether EVC's response to braille letters arrives there through an inverted cortical hierarchy, intermediate stages from VWFA, or directly, as found in the sighted brain for spoken language. 

      We thank the reviewer for this comment. However, EVC here consists of V1 to V3, and we already also assess V4, LOC, VWFA and LFA. Thus, we assess regions at all levels of processing, from low- through mid- to high-level, and cannot add a further intermediate ROI. Our results using this ROI set do not allow us to arbitrate between the hypotheses raised by the reviewer.

      Similarly, it may be informative to look specifically at the occipital electrodes' time differences between decoding for the different parameters and their correlation to behavior.

      We thank the reviewer for this suggestion. However, the spatial resolution of EEG measurements is limited, and we cannot convincingly determine the neural source of signals recorded from specific electrodes, e.g., occipital ones. When we reduce the number of electrodes before analysis, we primarily see comparable qualitative trends in the data, albeit with a reduction in signal-to-noise ratio.

      To illustrate, we repeated the EEG time decoding and the EEG-behavior RSA with only occipital and parieto-occipital electrodes (n=8) instead of all electrodes (n=63) and added the results to the Supplementary Material (see Supplementary Figures 3 and 4). Overall, we observe a reduction in signal-to-noise ratio. This is not surprising given that the EEG searchlight decoding results (Figure 3b) reveal that the sources of the decoding signals extend beyond occipital and parieto-occipital electrodes.

      In the EEG time decoding analysis, we see a comparable trend to the whole brain EEG analysis but do not find a significant difference in onsets of sensory and perceptual representation. 

      In the behavior-EEG RSA, we do find that the correlations between behavior and sensory representations emerge significantly earlier than correlations between behavior and perceptual representations (N = 11, 1,000 bootstraps, one-tailed bootstrap test against zero, P < 0.001). This result is in line with the whole brain EEG analysis.

      Regarding the methods, further detail on the ability to read with both hands equally and any residual vision of the participants would be helpful.

      We thank the reviewer for raising this point. We assessed participants’ letter reading capabilities in a short screening task prior to the experiment. Participants read letters with both hands separately and we used the same presentation time as in the experiment. The results showed that average performance for recognizing letters with the left hand (89%) and right hand (88%) was comparable. We did not measure continuous reading in the present study, and we did not collect further information about participants’ ability to read equally well with both hands.

      While the information about the screening task was previously included in Methods section 5.3.2 EEG experiment, we now moved it into a separate section 5.3.3 Braille screening task to make the information better accessible. 

      p. 14, ll. 529-533, Methods 5.3.3:

      “Prior to the experiment, participants completed a short screening task during which each letter of the alphabet was presented for 500ms to each hand in random order. Participants were asked to verbally report the letter they had perceived to assess their reading capabilities with both hands using the same presentation time as in the experiment. The average performance for the left hand was 89% correct (SD = 10) and for the right hand it was 88% correct (SD = 13).”

      We thank the reviewer for the suggestion to include information regarding participants’ residual vision. We have now added information about participants’ residual light perception to Supplementary Table 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) ROI vs Searchlight Results: Figures 2 b and c do not seem to match. The ROI results (b) should be somehow consistent with the whole brain results (c), but "perceptual" decoding in the searchlight (in green) seems localized in sensorimotor areas while for the same classification, no sensorimotor ROI is significant. Can the authors clarify this difference?

      Similarly, perceptual decoding does not emerge in EVC with the searchlight analysis, whereas it is quite strong in the ROI analysis.

      We agree that the results of the ROI and searchlight decoding do not show a direct match. We think that this difference is due to methodological reasons. For example, ROI decoding can be more sensitive when ROIs follow functionally relevant boundaries in the brain, in comparison to spheres used in searchlight decoding that do not. In turn, searchlight decoding may be more sensitive when information is distributed across functional boundaries that would be captured in different ROIs rather than combined, or when ROI definition is difficult (such as here in the visual system of blind participants).

      However, we point out that the primary goal of our searchlight decoding was to show that no other areas beyond our hypothesized ROIs contained braille letter representations, rather than reproducing the ROI results.

      Decoding accuracies are tested against chance (50% for pairwise classifications) according to the methods. In the case of "sensory and perceptual" and "perceptual" classification, this is straightforward. In the analysis that isolates "sensory" representations, though, the difference is computed between "sensory and perceptual" and "perceptual" decoding accuracies; the values resulting from this difference should thus be centered around 0.

      Are the accuracies tested against 0 in this case? This is not specified in the methods. Furthermore, the data reported in Figure 2 and Figure 3 seem to have 0% as a baseline and the label states "decoding accuracy". Can the authors clarify whether the reported data are the difference in accuracy with an estimated empirical baseline or an expected baseline of 50%?

      The reviewer is correct in stating that we tested “sensory and perceptual” and “perceptual” against chance level and the difference score “sensory” against 0 and that this information was missing in the methods section.

      We now specify in the methods that we are testing the accuracies for the “sensory” analysis against 0.

      p. 16, ll. 625-627, Methods 5.6:

      “We conducted subject-specific braille letter classification in two ways. First, we classified between letter pairs presented to one reading hand, i.e., we trained and tested a classifier on brain data recorded during the presentation of braille stimuli to the same hand (either the right or the left hand). This yields a measure of hand-dependent braille letter information in neural measurements. We refer to this analysis as within-hand classification. Second, we classified between letter pairs presented to different hands in that we trained a classifier on brain data recorded during the presentation of stimuli to one hand (e.g., right), and tested it on data related to the other hand (e.g., left). This yields a measure of hand-independent braille letter information in neural measurements. We refer to this analysis as across-hand classification. We tested both within-hand and across-hand pairwise classification accuracies against a chance level of 50%. We also calculated a within-across hand classification score which we compared against 0.”
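
      To make the three scores concrete, the following is a minimal sketch and not the analysis code used in the study. It assumes a simple nearest-centroid classifier as a stand-in for the actual classifier, and the function names (`pairwise_accuracy`, `decoding_scores`) and the toy one-dimensional patterns are hypothetical.

```python
def centroid(patterns):
    """Mean pattern across trials (each pattern is a list of features)."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pairwise_accuracy(train, test):
    """Train on `train`, test on `test`; both map two letter labels to trial patterns."""
    centroids = {label: centroid(p) for label, p in train.items()}
    correct = 0
    total = 0
    for label, patterns in test.items():
        for pattern in patterns:
            predicted = min(centroids, key=lambda k: squared_distance(centroids[k], pattern))
            correct += predicted == label
            total += 1
    return correct / total


def decoding_scores(train_hand, test_same_hand, test_other_hand, chance=0.5):
    """Return the three scores, each expressed so that 0 means 'no information'."""
    within = pairwise_accuracy(train_hand, test_same_hand)   # sensory and perceptual
    across = pairwise_accuracy(train_hand, test_other_hand)  # perceptual
    return {
        "sensory_and_perceptual": within - chance,  # tested against chance (50%)
        "perceptual": across - chance,              # tested against chance (50%)
        "sensory": within - across,                 # difference score, tested against 0
    }


# Hypothetical toy data: the letter code is hand-specific and does not transfer.
right_train = {"A": [[1.0], [1.2]], "B": [[-1.0], [-1.2]]}
right_test = {"A": [[0.9]], "B": [[-0.8]]}
left_test = {"A": [[2.0]], "B": [[3.0]]}

scores = decoding_scores(right_train, right_test, left_test)
print(scores)
```

      In this toy case the across-hand accuracy sits at chance, so the whole within-hand effect appears in the “sensory” difference score.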

      Regarding Figures 2 and 3, we plot the results as decoding accuracies minus chance level to standardize the y-axes for all three analyses, i.e., compare them to 0. We have corrected the y-axis labels accordingly. 

      In our analyses, we assumed an expected baseline of 50%. But in the response below we provide evidence that our results remain stable whether using an expected or empirical baseline.

      If my understanding is correct, a potential problem persists. The different analyses may not be comparable, because in the "sensory" analysis the baseline is empirically defined, being the classification accuracies of the "perceptual" decoding, while in the other two analyses, the baseline is set at 50%. There are suggestions in the literature to derive empirically defined baselines by randomly shuffling the trial labels and recomputing the classification accuracies [Grootswagers 2017]. In the context of the present work, its use would make the different statistical analyses more comparable. I would thus suggest the authors define the baseline empirically for all their analyses or, given the high computational demand of this analysis, provide evidence that the results are not affected by this difference in the baseline.

      We thank the reviewer for raising this point. As the reviewer correctly stated, the “sensory” analysis has an empirically defined baseline because it is a difference score while in the other two analyses the baseline is set at 50%.

      To provide evidence that our results are not affected by this difference in baseline, we re-ran the EEG time decoding. We derived null distributions from the empirical data for all three analyses, following the guidelines from Grootswagers 2017 (page 688, section “Evaluation of Classifier Performance and Group Level Statistical Testing”):

      “Another popular alternative is the permutation test, which entails repeatedly shuffling the data and recomputing classifier performance on the shuffled data to obtain a null distribution, which is then compared against observed classifier performance on the original set to assess statistical significance (see, e.g., Kaiser et al., 2016; Cichy et al., 2014; Isik et al., 2014). Permutation tests are especially useful when no assumptions about the null distribution can be made (e.g., in the case of biased classifiers or unbalanced data), but they take much longer to run (e.g., repeating the analysis 10,000 times).”

      Running a sign permutation test with 10,000 repetitions, we show that the results are comparable to the previously reported results based on one-sided Wilcoxon signed rank tests. We are, therefore, confident that our reported results are not affected by this difference in baseline. We now added this control analysis to the results section and supplementary material (see Supplementary Figure 5).
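
      For readers unfamiliar with the procedure, a sign permutation test at a single time point can be sketched as follows. This is an illustration under stated assumptions, not the code used for the reported analysis: the function name `sign_permutation_p`, the fixed seed, and the one-sided test on the group mean are our choices for the example, and any correction for multiple comparisons across time points is omitted.

```python
import random


def sign_permutation_p(values, n_perm=10000, seed=0):
    """One-sided sign permutation test of the group mean against 0.

    `values` holds one score per participant with the expected null value
    already subtracted (e.g., accuracy minus 50%, or a difference score).
    Under the null hypothesis the sign of each score is exchangeable.
    """
    rng = random.Random(seed)
    n = len(values)
    observed = sum(values) / n
    at_least_as_large = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each participant's score.
        permuted = sum(v if rng.random() < 0.5 else -v for v in values) / n
        if permuted >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so that p is never exactly 0.
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical chance-subtracted accuracies for 11 participants.
example = [0.04, 0.06, 0.03, 0.05, 0.07, 0.02, 0.05, 0.04, 0.06, 0.03, 0.05]
print(sign_permutation_p(example))
```

      Because the null distribution is built from the data themselves, the same function applies equally to chance-subtracted accuracies and to the within-minus-across difference score.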

      p. 7-8, ll. 213-215, Results 3.2: 

      “Importantly, the temporal dynamics of sensory and perceptual representations differed significantly. Compared to sensory representations, the significance onset of perceptual representations was delayed by 107ms (21-167ms) (N = 11, 1,000 bootstraps, one-tailed bootstrap test against zero, P = 0.012). This result pattern was consistent when defining the analysis baseline empirically (see Supplementary Figure 5).”
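
      The logic of a bootstrap test on an onset difference can be sketched as below. This is a simplified illustration, not the procedure used in the manuscript: it assumes that the onset is the first time point at which the group mean exceeds a fixed threshold (the reported analysis uses significance onsets), and all names (`first_crossing`, `bootstrap_onset_difference`) and the toy data are hypothetical.

```python
import random


def first_crossing(timecourse, threshold):
    """Index of the first sample above threshold, or None if there is none."""
    for i, value in enumerate(timecourse):
        if value > threshold:
            return i
    return None


def bootstrap_onset_difference(early, late, times, threshold, n_boot=1000, seed=0):
    """Resample participants with replacement and compare group-level onsets.

    `early` and `late` are lists of per-participant time courses.
    Returns (lower, upper, p): a 95% percentile interval for
    onset(late) - onset(early) and a one-tailed p-value against zero.
    """
    rng = random.Random(seed)
    n_subjects = len(early)
    n_times = len(times)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.randrange(n_subjects) for _ in range(n_subjects)]
        mean_early = [sum(early[s][t] for s in sample) / n_subjects for t in range(n_times)]
        mean_late = [sum(late[s][t] for s in sample) / n_subjects for t in range(n_times)]
        i_early = first_crossing(mean_early, threshold)
        i_late = first_crossing(mean_late, threshold)
        if i_early is None or i_late is None:
            continue  # no onset in this resample; skip it
        diffs.append(times[i_late] - times[i_early])
    if not diffs:
        raise ValueError("no bootstrap sample produced an onset in both conditions")
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[min(len(diffs) - 1, int(0.975 * len(diffs)))]
    p = (sum(d <= 0 for d in diffs) + 1) / (len(diffs) + 1)
    return lower, upper, p


# Hypothetical toy data: five identical participants, time in ms.
times = [0, 50, 100, 150, 200]
sensory = [[0, 0, 1, 1, 1] for _ in range(5)]
perceptual = [[0, 0, 0, 0, 1] for _ in range(5)]
lower, upper, p = bootstrap_onset_difference(sensory, perceptual, times, threshold=0.5)
print(lower, upper, p)
```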

      (2) According to the authors, perceptual rather than sensory braille letter representations identified in space are suitably formatted to guide behavior. However, they acknowledge that this finding is likely to be task-dependent because it is based on subject similarity ratings.

      Maybe they could use a more objective measurement of Braille letter similarity?

      For instance, they can compare letters using Jaccard similarity (See for instance: Bottini et al. 2022). 

      We thank the reviewer for the opportunity to clarify. We acknowledge that our findings regarding the behavioral relevance of the identified neural representations are task-dependent. But, importantly, this is not because we use perceived similarity ratings as a measurement, but because we only use one measurement while there are infinitely many other potential tasks to assess behavior. This means that the same limitation holds when using another similarity measure like Jaccard similarity. We now clarify this in the Discussion section: 

      p. 12, ll. 419-420, Discussion 4.3:

      “Our results clarified that perceptual rather than sensory braille letter representations identified in space are suitably formatted to guide behavior. However, we only use one specific task to assess behavior and, therefore, acknowledge that this finding is task-dependent.”

      Nevertheless, we calculated Jaccard similarity based on the definition used in Bottini et al. There are no significant correlations for the EEG-behavior or fMRI-behavior RSA when we use the Jaccard matrix and subject-specific EEG or fMRI RDMs (see Supplementary Figure 6).

      This demonstrates that braille letter similarity ratings are significantly correlated with neural representations in space and time but Jaccard similarity of braille dot overlaps is not. 
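For readers unfamiliar with the measure, the Jaccard similarity of two braille letters can be sketched as the number of raised dots they share divided by the number of dots raised in either letter. The sketch below is a generic illustration under that assumption; the exact definition in Bottini et al. may differ in detail, and the letters shown (a to d, in standard six-dot braille) are chosen only for illustration and are not necessarily the stimuli used in the study.

```python
def jaccard_similarity(dots_a, dots_b):
    """Jaccard similarity: shared dots / dots raised in either letter."""
    a, b = set(dots_a), set(dots_b)
    union = a | b
    if not union:
        # Two empty cells are treated as identical (a convention of this sketch).
        return 1.0
    return len(a & b) / len(union)


# Standard six-dot braille cells, dots numbered 1-6 (illustrative subset).
BRAILLE_DOTS = {
    "a": {1},
    "b": {1, 2},
    "c": {1, 4},
    "d": {1, 4, 5},
}


def jaccard_matrix(letters):
    """Pairwise Jaccard similarities as a nested dict keyed by letter."""
    return {
        x: {y: jaccard_similarity(BRAILLE_DOTS[x], BRAILLE_DOTS[y]) for y in letters}
        for x in letters
    }


similarity = jaccard_matrix(["a", "b", "c", "d"])
```

To compare such a matrix with neural RDMs, one would typically convert similarity to dissimilarity (1 minus similarity) and correlate the off-diagonal entries.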

      (3) If the primacy of perceptual similarity holds also with more objective measures of letter similarity, I think the authors should spend a few more words characterizing the results in fMRI and EEG that are rather divergent (concerning this analysis). Indeed, EEG analysis shows a significant correlation between similarity ratings and within-hand classification accuracy, although this correlation does not emerge in the "sensory" ROIs. I think these findings can be put together, hypothesizing that sensory-based similarity correlates with behavior but only in perceptual ROIs. However, why so? Can the authors provide a more mechanistic explanation? Am I missing something? 

      We thank the reviewer for this intriguing idea. We now speculate about how we could harmonize the results from the behavior-EEG and behavior-fMRI RSAs in the discussion section. 

      p. 12, ll. 438-442, Discussion 4.3:

      “Similarity ratings and sensory representations as captured by EEG are correlated, and so are similarity ratings and representations in perceptual ROIs, but not sensory ROIs. This might be interpreted as suggesting a link between the sensory representations captured in EEG and the representations in perceptual ROIs. However, we do not have any evidence towards this idea. Differing signal-to-noise ratios for the different ROIs and sensory versus perceptual analysis could be an alternative explanation.”

      (4) In the methods they state that EEG decoding is tested against chance at each time point but these results are not reported, only latency analysis is reported. Can the authors report the significant time points of the EEG time series decoding?  

      We thank the reviewer for catching this inconsistency! We have now added this information to Figure 3a.

      (5) In fMRI ROI definition procedure, the top 321 voxels of each anatomical ROI that had the highest functional activation were selected. The number of voxels is based on the smaller ROI, which to my understanding means that for this ROI all the voxels were selected potentially introducing noise and impacting the comparison between ROIs. Can the authors clarify which ROI was the smallest? 

      Thank you for the question! The smallest ROI was V4. This indeed means that for this ROI all voxels were selected. This could have led to our results being noisy in V4 but should not influence the results in other ROIs. We now added this information to the methods section.

      p. 15, ll. 592, Methods 5.4.4:

      “The smallest mask was V4 which included 321 voxels.”
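The ROI definition discussed here (keeping the 321 most strongly activated voxels inside each anatomical mask) can be sketched as below. This is an illustration only, not the authors' pipeline: the function name `select_top_voxels` and the use of a single activation map per participant are assumptions. It also makes the reviewer's point concrete: when a mask contains exactly `n_voxels` voxels, as for V4, the selection returns the whole mask.

```python
import numpy as np


def select_top_voxels(activation, roi_mask, n_voxels=321):
    """Boolean mask of the n_voxels most active voxels inside roi_mask.

    `activation` is a map of functional activation values (e.g. t-values)
    and `roi_mask` a boolean anatomical mask of the same shape.
    """
    activation = np.asarray(activation, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    if activation.shape != roi_mask.shape:
        raise ValueError("activation and roi_mask must have the same shape")
    roi_idx = np.flatnonzero(roi_mask)
    if roi_idx.size < n_voxels:
        raise ValueError("ROI contains fewer voxels than requested")
    roi_values = activation.ravel()[roi_idx]
    # argpartition moves the n_voxels largest values to the end
    # without sorting the whole array.
    top = roi_idx[np.argpartition(roi_values, -n_voxels)[-n_voxels:]]
    selected = np.zeros(roi_mask.size, dtype=bool)
    selected[top] = True
    return selected.reshape(roi_mask.shape)
```

Fixing the voxel count across ROIs keeps the dimensionality of the classifier input equal, at the cost of including weakly responsive voxels in the smallest ROI.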

      (6) Finally, the author suggests that: "Importantly, higher-level computations are not limited to the EVC in visually deprived brains. Natural sound representations 41 and language activations 53 are also located in EVC of sighted participants. This suggests that EVC, in general, has the capacity to process higher-level information 54. Thus, EVC in the visually deprived brain might not be undergoing fundamental changes in brain organization 53. This promotes a view of brain plasticity in which the cortex is capable of dynamic adjustments within pre-existing computational capacity limits 4,53-55." - The presence of a sighted control group would have strengthened this claim. 

      We agree with the reviewer and now discuss the limitations of our approach in the discussion section (see response to weaknesses raised by Reviewer 2 in the Public Review above).

      Reviewer #2 (Recommendations For The Authors): 

      (1) Can the authors comment on the reaction time of the two reading hands? Completely ambidextrous reading is not necessarily common, so any differences in ability or response time across the hands may affect the EEG results. Alternatively, do the authors have any additional behavioral data about the participants' ability to read well with both hands? 

      We thank the reviewer for these questions! We did not assess reaction times and acknowledge this as a limitation. We did, however, measure accuracies and would have expected to see a speed-accuracy trade-off if reaction times differed between hands, i.e., we would have expected lower accuracy for the hand with higher RTs. But this was not the case: our participants had comparable accuracy values when reading letters with both hands (see methods section 5.3.3 and answer to Public Review above). This measure indicated that participants recognized Braille letters presented for 500ms equally well with both index fingers.

      (2) Please add information about any residual sight in the blind participants (or are they all without light perception?)

      We have now added information about residual light perception in Supplementary Table 1 (see above in response to Public Review).

      (3) Is active tactile exploration involved, or are the participants not moving their fingers at all over the piezo-actuators? Can the authors elaborate more on how the participants used this passive input?

      We thank the reviewer for the opportunity to clarify. Our experimental setup does not involve tactile exploration or sliding motions. Instead, participants rest their index fingers on the piezo-actuators and feel the static sensation of dots pushing up against their fingertips. We assume that participants used the passive input of specific dot stimulation location on fingers to perceive a dot array which, in turn, led to the percept of a braille letter.

      We now specify this information in the methods section.

      p. 13, ll. 474-475, Methods 5.2:

      “The modules were taped to the clothes of a participant for the fMRI experiment and on the table for the EEG and behavioral experiment. This way, participants could read in a comfortable position with their index fingers resting on the braille cells to avoid motion confounds. Importantly, our experimental setup did not involve tactile exploration or sliding motions. We instructed participants to read letters regardless of whether the pins passively stimulated their immobile right or left index finger.”

      (4) I appreciated the RSA analysis, but remain curious about what the ratings were based on.

      Do the authors know what parameters participants used to rate for? Were these consistent across participants? That would aid in interpreting the results.

      We thank the reviewer for the interest in our representational similarity analyses linking the neural representations to behavior. 

      We do not know which parameters participants explicitly used to rate the similarity between letters. We instructed participants to freely compare the similarity of pairs of braille letters without specifying which parameters they should use for the similarity assessment. We speculate that participants used a mixture of low-level features such as stimulation location on fingers and higher-level features such as linguistic similarity between letters. We now clarify the free comparison of braille letter pairs in the methods section:

      p. 14, ll. 538-539, Methods 5.3.4:

      “Each pair of letters was presented once, and participants compared them with the same finger. We instructed participants to freely compare the similarity of pairs of Braille letters without specifying which parameters they should use for the similarity assessment. The rating was without time constraints, meaning participants decided when they rated the stimuli. Participants were asked to verbally rate the similarity of each pair of braille letters on a scale from 1 = very similar to 7 = very different and the experimenter noted down their responses.”

      (5) Can the authors provide confusion matrices for the decoding analyses in the supplementary materials? This could be informative in understanding what pairs of letters are most discernable and where. 

      We have added confusion matrices for within- and between-hand decoding for all ROIs and for the time points 100ms, 200ms, 300ms and 400ms to the Supplementary Material (see Supplementary Figures 7-10).

      (6) Was slice time correction done for the fMRI data? This is not reported. 

      We now added this information to the methods section - our fMRI preprocessing pipeline did not include slice timing correction.  

      p. 14, ll. 554, Methods 5.4.2:

      “We did not apply high or low-pass temporal filters and did not perform slice time correction.”

    1. eLife Assessment

      This study presents a useful finding on the ferroptosis-mediated tumor microenvironment (TME) in triple-negative breast cancer (TNBC) using public single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing data. The evidence supporting the claims of the authors is somewhat incomplete and some data are rather questionable; the authors should clarify the relations between ferroptosis-related genes in immune cells and those genes applied in a risk factor analysis in tumor cells. Moreover, the authors should provide experimental validation for the risk score model based on ferroptosis-related genes. The work will be of interest to scientists or clinical scientists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      Triple-negative breast cancer (TNBC) accounts for approximately 15-20% of all breast cancers. Compared to other types of breast cancer, TNBC exhibits highly aggressive clinical characteristics, a greater likelihood of metastasis, poorer clinical outcomes, and lower survival rates. Immunotherapy is an important treatment option for TNBC, but there is significant heterogeneity in treatment response. Therefore, it is crucial to accurately identify immunosuppressive patients before treatment and actively seek more effective therapeutic approaches for TNBC patients.

      Strengths:

      In this work, the authors collected and integrated single-cell and bulk RNA sequencing (RNA-seq) data to analyze the TME landscape mediated by ferroptosis-related genes. On this basis, a model predicting prognosis and treatment response was constructed for 131 patients using a machine learning algorithm, which could help provide individualized and precise treatment guidance for breast cancer patients.

      Weaknesses:

      However, there are still some issues that need to be clarified:

      (1) The description of the research background is too brief and concise, and it is necessary to add some information about the limitations of existing methods and the differences and advantages of this study compared with other published relevant studies, so as to better highlight the necessity and research value of this study.

      (2) This study is a retrospective analysis of a public data set and lacks experimental validation and prospective experiments to support the results of bioinformatics analysis. This should be added to the acknowledgment of limitations in the study.

    3. Reviewer #2 (Public review):

      Summary:

      This study aims to explore the ferroptosis-related immune landscape of TNBC through the integration of single-cell and bulk RNA sequencing data, followed by the development of a risk prediction model for prognosis and drug response. The authors identified key subpopulations of immune cells within the TME, particularly focusing on T cells and macrophages. Using machine learning algorithms, the authors constructed a ferroptosis-related gene risk score that accurately predicts survival and the potential response to specific drugs in TNBC patients.

      Strengths:

      The study identifies distinct subpopulations of T cells and macrophages with differential expression of ferroptosis-related genes. The clustering of these subpopulations and their correlation with patient prognosis is highly insightful, especially the identification of the TREM2+ and FOLR2+ macrophage subtypes, which are linked to either favorable or poor prognoses. The risk model thus holds potential not only for prognosis but also for guiding treatment selection in personalized oncology.

      Weaknesses:

      The study has a relatively small sample size, with only 9 samples analyzed by scRNA-seq. Given the typically high heterogeneity of the tumor microenvironment (TME) in cancer patients, this may affect the accuracy of the conclusions. The scRNA-seq analysis focuses on the expression of ferroptosis-related genes in various cells within the TME. In contrast, bulk RNA sequencing uses data from tumor samples, and the results between the two analyses are not consistent. The bulk RNA sequencing results may not accurately capture the changes happening in the microenvironment.

    1. eLife Assessment

      This fundamental study substantially advances our understanding of how habitat fragmentation and climate change jointly influence bird community thermophilization in a fragmented island system. The authors provide convincing evidence using appropriate and validated methodologies to examine how island area and isolation affect the colonization of warm-adapted species and the extinction of cold-adapted species. This study is of high interest to ecologists and conservation biologists, as it provides insight into how ecosystems and communities respond to climate change.

    2. Reviewer #3 (Public review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test for a general increase in the occurrence and abundance of warm-dwelling species, and vice versa for cold-dwelling species, using the widely used Community Temperature Index (CTI), as well as the relationship between island fragmentation (in terms of island area and isolation from the mainland) and the extinction and colonization rates of cold- and warm-adapted species. They found that thermophilization was indeed happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clear for the occurrence-based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling species and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling species decreased more strongly on smaller islands, which is, according to the authors, due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regard to island isolation, they also show that both thermophilization processes (increase of warm-adapted and decrease of cold-adapted species) were stronger on islands closer to the mainland, due to the proximity of mainland source populations of either group, as compared to limited dispersal (i.e. range shift potential) on more isolated islands.
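As background for the two CTI variants mentioned in this summary, the index is commonly computed as the mean species temperature index (STI) of the species in a community, either weighted by abundance or counting each present species once. The sketch below follows that common definition; it is not taken from the manuscript, and the function name and the example values are hypothetical.

```python
import numpy as np


def community_temperature_index(sti, abundance, use_abundance=True):
    """Community Temperature Index for one community (e.g. one island-year).

    `sti` holds the species temperature index of each species and
    `abundance` the number of individuals recorded per species.
    With use_abundance=False, each present species gets weight 1
    (occurrence-based CTI).
    """
    sti = np.asarray(sti, dtype=float)
    abundance = np.asarray(abundance, dtype=float)
    weights = abundance if use_abundance else (abundance > 0).astype(float)
    total = weights.sum()
    if total == 0:
        raise ValueError("community contains no species")
    return float(np.sum(weights * sti) / total)


# Hypothetical community: a cold-adapted and a warm-adapted species.
cti_abundance = community_temperature_index([10.0, 20.0], [1, 3])
cti_occurrence = community_temperature_index([10.0, 20.0], [1, 3], use_abundance=False)
```

In this toy case the abundance-based value (17.5) exceeds the occurrence-based value (15.0) because the warm-adapted species is more numerous, which illustrates why the two metrics can show trends of different strength.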

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well balanced method of simplifying this to the most important factors in question (CTI change, extinction, colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      Weaknesses:

      The metric of island isolation based on distance to the mainland seems somewhat oversimplified, as in real life the study system rather represents an island network where islands of different sizes lie at varying distances from each other, such that smaller islands can potentially draw from the species pools of nearby larger islands too, rather than just from the mainland. Although the authors do explain the reason for this metric, backed up by earlier research, a network approach could be worth exploring in future research in this system. The fact that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint at a more complex pattern in real life than was assumed for this study.

      Comments on revisions:

      I'm happy with the revisions made by the authors.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #3 (Public review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test for a general increase in the occurrence and abundance of warm-dwelling species, and vice versa for cold-dwelling species, using the widely used Community Temperature Index (CTI), as well as the relationship between island fragmentation (in terms of island area and isolation from the mainland) and the extinction and colonization rates of cold- and warm-adapted species. They found that thermophilization was indeed happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clear for the occurrence-based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling species and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling species decreased more strongly on smaller islands, which is, according to the authors, due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regard to island isolation, they also show that both thermophilization processes (increase of warm-adapted and decrease of cold-adapted species) were stronger on islands closer to the mainland, due to the proximity of mainland source populations of either group, as compared to limited dispersal (i.e. range shift potential) on more isolated islands.

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well balanced method of simplifying this to the most important factors in question (CTI change, extinction, colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      Weaknesses:

      The metric of island isolation based on distance to the mainland seems somewhat oversimplified, as in real life the study system rather represents an island network where islands of different sizes lie at varying distances from each other, such that smaller islands can potentially draw from the species pools of nearby larger islands too, rather than just from the mainland. Although the authors do explain the reason for this metric, backed up by earlier research, a network approach could be worth exploring in future research in this system. The fact that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint at a more complex pattern in real life than was assumed for this study.

      Thank you again for this suggestion. Building on the previous revision, we now discuss in more detail the importance of considering the island network in future research. The paragraph is now on Lines 294-304:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections and island size could hint on a more complex pattern going on in real-life than was assumed for this study, thus reveal additional insights on fragmentation effects. For instance, smaller islands may also potentially utilize species pools from nearby larger islands, rather than being limited solely to those from the mainland. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should use a network approach to take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Great job on the revision! The new version reads well and in my opinion all comments were addressed appropriately. A few additional comments are as follows:

      Thank you very much for your further review and recognition. We have carefully modified the manuscript according to all recommendations.

      (1) L 62: replace shifts with process

      Done. We also added the word “transforming” to match this revision. The new sentence is now on Lines 61-63:

      “Habitat fragmentation, usually defined as the process of transforming continuous habitat into spatially isolated and small patches”

      (2) L 363: Your metric for habitat fragmentation is isolation and habitat area and I think this could be introduced already in the introduction, where you somewhat define fragmentation (although it could be clearer still). You could also discuss this in the discussion more, that other measures of fragmentation may be interesting to look at.

      Thank you for this suggestion. We have now introduced the metrics of habitat fragmentation in the Introduction, directly after habitat fragmentation is defined. The sentence is now on Lines 64-66:

      “Among the various ways in which habitat fragmentation is conceptualized and measured, patch area and isolation are two of the most used measures (Fahrig, 2003).”

      (3) L 384: replace for with because of

      Done.

      (4) L 388: "Following this filtering, 60 ...."

      Done.

      (5) Figure 1: In panels b-d you use different terms (fragmented, small, isolated) but aiming to describe the same thing. I would highly recommend to either use fragmented islands or isolated islands for all panels. Although I see that in your study fragmentation includes both, habitat loss and isolation. So make this clear in the figure caption too...

      Thank you very much for this suggestion. It’s important to maintain consistency in using “fragmentation”. We changed “fragmented, small, isolated” to “Fragmented patches” in the caption of b-d. The modified caption is now on Line 771:

      (6) L 783: replace background with habitat (or landscape) and exhibit with exemplify

      Done. The new sentence is now on Lines 782-784:

      “The three distinct patches signify a fragmented landscape and the community in the middle of the three patches was selected to exemplify colonization-extinction dynamics in fragmented habitats.”

      (7) One bigger thing is the definition of fragmentation in your study for which you used habitat area (from habitat loss process) and isolation. This could still be clarified a bit more, especially in the figures. In Fig. 1 the smaller panels b-d could all be titled fragmented islands as this is what the different terms describe in your study (small, isolated) and thus the figure would become even clearer. Otherwise I'm happy with the changes made.

      Thank you for raising this important question. Yes, “habitat fragmentation” in our research includes both habitat loss and fragmentation per se. We have clarified the caption of b-d in Figure 1 as suggested by Recommendation (5). We believe this can make it clearer to the readers.

    1. eLife Assessment

      Leveraging state-of-the-art experimental and analytical approaches, this important study characterizes the recruitment and activation of large populations of human motor units during slow isometric contractions in two lower limb muscles. Evidence for the main claims is solid and advances our understanding of how humans generate and control voluntary force.

    2. Reviewer #1 (Public review):

      Summary:

      Avrillon et al. explore the neural control of muscle by decomposing the firing activity of constituent motor units from grids of surface electromyography (EMG) in the Tibialis Anterior (TA) and Vastus Lateralis (VL) during isometric contractions. The study involves extensive samples of motor units across a broad range of voluntary contraction intensities, up to 80% of MVC. The authors examine rate coding of the population of motor units, which describes the instantaneous firing rate of each motor unit as a function of muscle force. This relationship is characterized by a natural logarithm function that delineates two distinct phases: an initial phase with a steep acceleration in firing rate, particularly pronounced in low-threshold motor units, and a subsequent modest linear increase in firing rate, more significant in high-threshold motor units.

      Strengths:

      The study makes a significant contribution to the field of neuromuscular physiology by providing a detailed analysis of motor unit behavior during muscle contractions in a few ways.

      (1) The significance lies in its comprehensive framework of motor unit activity during isometric contractions across a broad range of intensities, providing insights into the non-linear relationship between the firing rate and the muscle force. The extensive sample of motor units across the pool confirms the observation from animal studies that the discharge of spinal motoneurons in response to synaptic currents consists of distinct phases, under the influence of persistent inward currents. As such, it is now reasonable to state that human motor units across the pool are also under the control of gain modulation via neuromodulatory effects, in addition to synaptic inputs arising from ionotropic effects.

      (2) The firing scheme across the entire motoneuron pool revealed in this study reconciles a debated discrepancy in firing organization, i.e., whether it is 'onion skin'-like or not (Heckman and Enoka 2012). The onion skin-like model states that low-threshold motor units discharge at higher rates than high-threshold motor units. It has been held for a long time because firing behaviors were examined over only a partial range of contraction forces due to technical limitations. This reconciliation is crucial because it is fundamental to modelling the organization of motor unit recruitment and rate coding needed to generate a desired force, and thus to advancing our understanding of motor control.

      (3) The extensive data collection, using a novel blind source separation algorithm on an expanded number of surface EMG channels, provides a robust dataset that enhances the reliability and validity of the findings, setting a new standard for empirical studies in the field.

      Collectively, this study fills several knowledge gaps in the field and advances our understanding of the mechanisms underlying isometric force generation.

    3. Reviewer #2 (Public review):

      Avrillon et al. provide a comprehensive assessment of firing rate parameters from a large percentage of the motor unit pool, in two muscles, during voluntary isometric contractions. The authors have used new quantitative methods to extract more unique motor units across contractions than prior studies. This was achieved by recording muscle fibre action potentials from four high density surface electromyogram (HDsEMG) arrays, quantifying residual EMG by comparing the recorded and data-based simulated signals (Fig. 1A-B), and developing a metric to compare the spatial identification for each motor unit (Fig. 1D-E). From the identified motor units, the authors have provided a detailed characterization of recruitment and firing rate responses during slow voluntary isometric contractions in the vastus lateralis and tibialis anterior muscles up to 75-80% of maximum intensity. In the lower limb it is interesting how lower threshold motor units have firing rate responses that saturate, whereas higher threshold units that presumably produce higher muscle contractile forces continue to increase their firing rate. Conceptually, the authors rightly focus on the literature on intrinsic motoneurone properties, but in vivo, other possibilities (that are difficult to measure in awake human participants) are that the form of descending supraspinal drive, spinal network dynamics and afferent inputs may have different effects across motor unit sizes, muscles and types of contractions. These results, from single-trial contractions and with a larger sample of motor units, support the summary rate coding profiles of motor units in the extensor digitorum communis muscle (Monster and Chan, 1977).

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting manuscript which uses state of the art experimental and simulation approaches to quantify motor unit discharge patterns in the human TA and VL. The non-linear profiles of motor unit discharge were calculated and found to have an initial acceleration phase followed by an attenuation phase. Lower threshold motor units had a larger gain of the initial acceleration whereas the higher threshold motor unit had a higher gain in the attenuation phase. These data represent a technical feat and are important for understanding how humans generate and control voluntary force.

      Strengths:

      The authors used rigorous, state-of-the-art analyses to decompose and validate their motor unit data during a wide range of voluntary efforts.

      Analyses are clearly presented, applied, and visualized.

      The supplemental data provides important transparency.

      Weaknesses:

      Number of participants and muscles tested are relatively small - particularly given the constraints on yield. It is unclear if this will translate to other motor pools. The justification for TA and VL should be provided.

      While an impressive effort was made to identify and track motor units across a range of contractions, it appears that a substantial portion of muscle force was not identified. Though high intensity contractions are challenging to decompose, the authors are commended for their technical ability in recording population motor unit discharge times with recruitment thresholds up to 75% of a participant's maximal voluntary contraction. However, previous groups have seen substantial recruitment of motor units above 80% and even 90% maximum activation in the soleus. Given the innervation ratios of higher threshold motor units, if recruitment continued to 100%, the top quartile would likely represent a substantial portion of the traditional fast-fatigable motor units. It would be highly interesting to understand the recruitment and rate coding of the highest threshold motor units; at a minimum, I would suggest using terms other than "entire range" or "full spectrum of recruitment thresholds".

      The quantification of hysteresis using torque appears to make self-evident the observation that lower threshold motor units demonstrate less hysteresis with respect to torque - If there was motor unit discharge there will be force. I believe this limitation goes beyond the floor effects discussed in the manuscript. Traditionally individuals have used the discharge of a lower threshold unit as the measure on which to apply hysteresis analyses to infer ion channel function in human spinal motoneurons.

      The main findings are not entirely novel. See Monster and Chan 1977 and Kanosue et al 1979

      Comments on revisions:

      I thank the authors for their thoughtful revision.

      Just to confirm, the ranges for motor unit yield are for a single contraction. So, for example, in a participant there were 71 unique and concurrently active VL motor units able to be decomposed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This study explores the neural control of muscle by decomposing the firing activity of constituent motor units from the grid of surface electromyography (EMG) in the Tibialis Anterior (TA) and Vastus Lateralis (VL) during isometric contractions. The study involves extensive samples of motor units across the broadest range of voluntary contraction intensities up to 80% of MVC. The authors examine the rate coding of the population of motor units, which describes the instantaneous firing rate of each motor unit as a function of muscle force. This relationship is characterized by a natural logarithm function that delineates two distinct phases: an initial phase with a steep acceleration in firing rate, particularly pronounced in low-threshold motor units, and a subsequent modest linear increase in firing rate, more significant in high-threshold motor units. 

      Strengths: 

      The study makes a significant contribution to the field of neuromuscular physiology by providing a detailed analysis of motor unit behavior during muscle contractions in a few ways.

      (1) The significance lies in its comprehensive framework of motor unit activity during isometric contractions in a broad range of intensities, providing insights into the non-linear relationship between the firing rate and the muscle force. The extensive sample of motor units across the pool confirms the observation in animal studies in which the spinal motoneuron exhibits a discharge consisting of distinct phases in response to synaptic currents, under the influence of persistent inward currents. As such, it is now reasonable to state the human motor units across the pool are also under the control of gain modulation via some neuromodulatory effects in addition to synaptic inputs arising from ionotropic effects.

      (2) The firing scheme across the entire motoneuron pool revealed in this study reconciles the discrepancy in firing organization under debate; i.e., whether it is 'onion skin' like or not (Heckman and Enoka 2012). The onion-skin-like model states that low threshold motor units discharge at higher rates than high threshold motor units; it has been held for a long time because firing behaviors were examined over only a partial range of contraction forces due to technical limitations. This reconciliation is crucial because it is fundamental to modelling the organization of motor unit recruitment and rate coding to achieve a desired force generation to advance our understanding of motor control.

      (3) The extensive data collection with a novel blind source separation algorithm on the expanded number of channels of surface EMG signal provides a robust dataset that enhances the reliability and validity of findings, setting a new standard for empirical studies in the field. 

      Collectively, this study fills several knowledge gaps in the field and advances our understanding of the mechanism underlying the isometric force generation.

      We thank the reviewer for their positive appreciation of our work.

      Weaknesses: 

      Although the findings and claims based on them are mostly well aligned, some accounts of the methods and claims need to be clarified.

      (1) The authors examine the input-output function of a motor unit by constructing models, using force as an input and discharge rate as an output. It sounds circular, or the other way around to use the muscle force as an input variable, because the muscle force is the result of motor unit discharges, not the cause that elicits the discharges. More specifically, as a result of non-linear interactions of synchronous and/or asynchronous discharges of a population of a given motoneuron pool that give rise to transient increase/maintenance in twitch force, the gross muscle force is attained. I acknowledge that it is extremely challenging experimentally to measure synaptic currents impinging upon the spinal motoneurons in human subjects and the author has an assumption that the force could be used as a proxy of synaptic currents. However, it is necessary to explicitly provide the caveats and rationale behind that. Force could be used as the input variable for modelling.

      Force is indeed used in this study as a proxy of the common excitatory synaptic currents as their direct measurement is not possible in vivo in humans. It is worth noting that this approach has been extensively used in the past by many groups to study rate coding (e.g., Monster & Chan, De Luca’s, Heckman’s, and Fuglevand’s groups). Heckman’s, Gorassini’s, Fuglevand’s groups and others have considered the non-linearities in the relation between motor unit firing rates and muscle force in humans as an indicator of the impact of neuromodulation on motor unit behaviour and changes of the intrinsic properties of motoneurons.

      One could also use the cumulative spike train as a more direct estimate of common excitatory inputs, assuming that it is possible to identify a group of motor units not influenced by PICs, as done when selecting a reference low-threshold motor neuron in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020). However, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did to generate force/firing rate relations on the widest range of force.

      We added a sentence in the discussion to highlight this limitation (P19, L470):

      ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020)’.

      (2) The authors examine the firing organizations in TA and VL in this study without explicit purposes and rationale for choosing these muscles. The lack of accounts makes it hard for the readers to interpret the data presented, particularly in terms of comparing the results from the different muscles.

      We wanted to compare the rate coding of pools of motor units from proximal (VL) and distal (TA) muscles within the lower limb. Indeed, distal and proximal muscles exhibit differences in rate coding and spatial recruitments (De Luca et al., 1982, J Physiol), potentially due to different levels of recurrent inhibition (Cullheim & Kellerth, 1978, J Physiol; Rossi & Mazzocchio, 1991, Exp Brain Res; Edgley et al., 2021, J Neurosci) or different levels of neuromodulation depending on their involvement (or not) in postural control (Hounsgaard et al., 1988, J Physiol; Kim et al., 2020, J Neurophysiol).

      We added a paragraph at the beginning of the result section to support our muscle choice (P6; L137): ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      Another factor that guided our muscle choice was the low risk of crosstalk. For this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume (Farina et al., 2003, IEEE Trans Biomed Eng)

      (3) In the methods, the author described the manual curation process after applying the blind source separation algorithm. For the readers to understand the whole process of decomposition and to secure rigor and robustness of the analyses, it would be necessary to provide details on what exact curation is performed with what criteria. 

      The manual curation of EMG decomposition with blind source separation is different from what is classically done with intramuscular EMG and template-matching algorithms. 

      In short, our decomposition algorithm uses fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a set of weights, i.e., a separation vector, for each motor unit. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and only a few samples close to one (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units).
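
      The last step of this description (peak detection followed by a two-class k-means on the peak heights) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fastICA stage is omitted, and the function names (`detect_peaks`, `two_class_kmeans`, `estimate_discharge_times`), the `min_distance` parameter, and the toy pulse train are assumptions made only for illustration.

```python
def detect_peaks(signal, min_distance):
    """Return indices of local maxima at least min_distance samples apart."""
    candidates = [i for i in range(1, len(signal) - 1)
                  if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]
    kept = []
    # Visit the highest candidates first and drop neighbours that are too close.
    for i in sorted(candidates, key=lambda idx: signal[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def two_class_kmeans(values, n_iter=100):
    """1-D k-means with two classes; returns (low_centroid, high_centroid)."""
    low, high = min(values), max(values)  # initialise centroids at the extremes
    for _ in range(n_iter):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # assignments are stable
        low, high = new_low, new_high
    return low, high


def estimate_discharge_times(pulse_train, min_distance=2):
    """Keep only the peaks assigned to the high-amplitude class (spikes)."""
    peaks = detect_peaks(pulse_train, min_distance)
    heights = [pulse_train[i] for i in peaks]
    low, high = two_class_kmeans(heights)
    # If every peak has the same height, no peak is labelled as a spike.
    return [i for i, h in zip(peaks, heights) if abs(h - high) < abs(h - low)]


# Toy pulse train (hypothetical values): three spikes near 1, two low noise peaks.
pulse_train = [0.0] * 30
for index, height in [(5, 1.0), (10, 0.2), (15, 0.9), (20, 0.15), (25, 0.95)]:
    pulse_train[index] = height
discharge_times = estimate_discharge_times(pulse_train)
```

      In this toy case the two low peaks are assigned to the noise class and only the three high peaks are retained as discharge times. A real pulse train would need a sampling-rate-dependent `min_distance` and the quality checks described in the following paragraphs.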

      The manual curation consists of inspecting the automatic detection of the peaks of the motor unit pulse train and manually adding missed peaks (missed discharge times) or removing wrongly detected peaks. Then, the separation vector is updated using the corrected discharge times and the motor unit pulse train is recalculated. This procedure generally improves the distance between the discharge times and the noise, which confirms the accuracy of the manual curation. If that is not the case, the motor unit is discarded from the analyses.

      We added a section on manual editing in the methods (P23, L615):

      ‘At the end of these automatic steps, all the motor unit pulse trains and identified discharge times were visually inspected, and manual editing was performed to correct the false identification of artifacts or the missed discharge times (Del Vecchio et al., 2020; Hug et al., 2021; Avrillon et al., 2023). The manual editing consisted of i) removing the spikes causing erroneous discharge rates (outliers), ii) adding the discharge times clearly separated from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the entire EMG signals, and v) repeating this procedure until the selection of all the discharge times is achieved. The manual editing of potential missed discharge times and falsely identified discharge times was never immediately accepted. Instead, the procedure was consistently followed by the application of the updated motor unit separation vector on the entire EMG signals to generate a new motor unit pulse train. Then, the manual editing was only accepted when the silhouette value increased or stayed well above the threshold of 0.9 quantified with the silhouette value (Negro et al., 2016b). Only these motor units were retained for further analysis.’

      (4) In Figure 3, the early recruited units tend to become untraceable in the higher range of contraction. This is more pronounced in the VL muscle. This limitation would obscure the whole firing curve along the force axis; the limitation and its applicability to the different muscles therefore need to be discussed. 

      The loss of low threshold motor units in the higher range of contractions was caused either by the decrease in signal-to-noise ratio for small motor units when many larger ones are recruited, or by the cancellation of the surface action potentials of the small units in the interference electromyographic signal, or by the recruitment of a motor unit with a very similar spatio-temporal filter (an example is shown in the figure below). In the latter case, the motor unit pulse train contains peaks that represent the discharge times of both motor units (green and red dots in the simulated example below), making them indistinguishable by the operator during manual editing.

      Author response image 1.

      This was discussed in the results (P7; L190):

      ‘On average, we tracked 67.1 ± 10.0% (25th–75th percentile: 53.9 – 80.1%) of the motor units between consecutive contraction levels (10% increments, e.g., between 10% and 20% MVC) for TA and 57.2 ± 5.1% (25th–75th percentile: 46.6 – 68.3%) of the motor units for VL (Figure S2). There are two explanations for the inability to track all motor units across consecutive contraction levels. First, some motor units are recruited at higher targets only. Second, it is challenging to track small motor units beyond a few contraction levels due to a lower signal-to-noise ratio for the small motor units when larger motor units are recruited, or signal cancellation (Keenan et al., 2005; Farina et al., 2014a).’

      However, we believe that it had a limited impact on the output of the paper, as the non-linear portion of the rate coding/force relation due to the persistent inward currents occurs during the first seconds after recruitment, before plateauing (for a review see Binder et al., 2020, Physiology).

      (5) It is unclear how commonly the notion "the long-held belief that rate coding is similar across motor units from the same pool" is held among the community without a reference. Different firing organizations have been modelled and discussed in the seminal paper by Fuglevand et al. (1993) and as far as I understand, the debate has not converged to a specific consensus. As such, any reference would be required to support the claim the notion is widely recognized.

      In the paper of Fuglevand et al., (1993, J Neurophysiol), all the motor units had the same rate coding pattern relative to the excitatory input, though they changed the slope of the relations and the saturation threshold of motor units between simulations. This is similar to the paper of De Luca & Contessa (2012, J Neurophysiol), where the equation used to simulate the rate coding was non-linear, but consistent across motor units.  

      We added these citations to the text:

      ‘Overall, we found that motor units within a pool exhibit distinct rate coding with changes in force level (Figure 2 and 3), which contrasts with the long-held belief that rate coding is similar across motor units from the same pool (Fuglevand et al., 1993; De Luca and Contessa, 2012).’

      (6) The authors claim that the firing behavior as a function of force is well characterized by a natural logarithmic function, which consists of initial steep acceleration followed by a modest increase in firing rate. Arguably the gain modulation in firing rate could be attributed to a neuromodulatory effect on the spinal motoneuron, which has been suggested by a number of animal studies. However, the complexity of the interactions between ionotropic and neuromodulatory inputs to motoneurons may require further elucidation to fully understand the mechanisms of neural control; it is possible to consider the differential acceleration among different threshold motor units as a differential combinatory effect of ionotropic and neuromodulatory inputs, but it is not trivially determined how differentially or systematically the inputs are organized. Likewise, the authors make an account for the difference in firing rate between TA and VL in terms of different amounts or balances of excitatory and inhibitory inputs to the motoneuron pool, but again this could be explained by other factors, such as a different extent of neuromodulatory effects. To determine the complexity of the interactions, further studies will be warranted.

      We appreciate the reviewer’s view on this point, as we indeed only indirectly inferred the combination of neuromodulatory and ionotropic inputs to motoneurons in this study. A more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required in the future to directly highlight the mechanisms responsible for these variations in rate coding within pools. However, it is also worth noting that the acceleration in firing rate, the increase in firing rate during the ramp up, and the hysteresis between ramps up and down have been used to infer the distribution of ionotropic and neuromodulatory inputs from the firing rate/force relations (Johnson et al., 2017; Beauchamp et al., 2023; Chardon et al., 2023). This approach has been validated with hundreds of thousands of simulations using a biophysical model of motor neurons (Chardon et al., 2023). There is also a series of studies in humans showing how the absence of neuromodulation, induced via inhibitory inputs (Revill & Fuglevand, 2017) or medication blocking serotonin receptors (Goodlich et al., 2023), impacts the non-linearity of the firing rate/force relation. Therefore, we are confident that the differences observed within and between pools are linked to different distributions of excitatory/inhibitory inputs and neuromodulation.

      We added a sentence in the discussion to highlight this point (P18; L435):

      ‘Taken together, these results show how ionotropic and neuromodulatory inputs to motoneurons uniquely combine to generate distinct rate coding across the pool, even if a more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required to directly estimate their interactions.’

      (7) It is unclear with the account " ... the bandwidth of muscle force is < 10Hz during isometric contraction" in the manuscript alone, and therefore, it is difficult to understand the following claim. It appears very interesting and crucial for motor unit discharge and force generation and maintenance because it would pose a question of why the discharge rate of most motor units is higher than 10Hz, despite the bandwidth being so limited, but needs to be elaborated.

      We described the slow fluctuations in smoothed firing rates associated with the variations in force observed during isometric contractions. The bandwidth of muscle force is lower than 10 Hz due to the contractile properties of muscle tissues (Baldissera et al., 1998, J Physiol). Having an average firing rate higher than this bandwidth enables the pool of motor neurons to effectively transmit the common inputs (the main determinant of muscle force) over this bandwidth without distortion (Farina et al., 2014, J Physiol). Increasing the firing rate beyond the muscle bandwidth also increases the power of the spike train at the direct current frequency (frequency equal to 0), since this power is related to the number of spikes per second. Thus, increasing the firing rate well beyond the muscle bandwidth still has a clear effect on force. To illustrate this point, note that electrical stimuli delivered at 100 Hz can lead to an increase in muscle force.
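
      The statement about the direct current component can be checked numerically with a small Python sketch: for a binary spike train, the discrete Fourier transform at frequency 0 is simply the spike count, so a higher firing rate over the same window gives a larger zero-frequency amplitude. The perfectly regular spike train, the 1000 Hz sampling rate, the 1 s duration, and the function names are illustrative assumptions; real motor units discharge irregularly.

```python
import cmath


def regular_spike_train(rate_hz, fs=1000, duration_s=1.0):
    """Binary spike train with one spike every fs / rate_hz samples (idealised)."""
    n_samples = int(fs * duration_s)
    period = int(round(fs / rate_hz))
    return [1 if i % period == 0 else 0 for i in range(n_samples)]


def dft_bin(signal, k):
    """Discrete Fourier transform of the signal evaluated at bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, x in enumerate(signal))


def dc_amplitude(signal):
    """Magnitude of the zero-frequency (direct current) bin: the spike count."""
    return abs(dft_bin(signal, 0))


slow = regular_spike_train(10)   # 10 spikes in 1 s
fast = regular_spike_train(20)   # 20 spikes in 1 s
dc_slow = dc_amplitude(slow)
dc_fast = dc_amplitude(fast)
```

      Doubling the rate from 10 to 20 spikes per second doubles the zero-frequency amplitude in this sketch, even though both rates already sit at or above the roughly 10 Hz bandwidth mentioned above.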

      Reviewer #2 (Public Review):  

      Summary: 

      The motivation for this study is to provide a comprehensive assessment of motor unit firing rate responses of entire pools during isometric contractions. The authors have used new quantitative methods to extract more unique motor units across contractions than prior studies. This was achieved by recording muscle fibre action potentials from four high-density surface electromyogram (HDsEMG) arrays (Caillet et al., 2023), quantifying residual EMG comparing the recorded and data-based simulation (Figure 1A-B), and developing a metric to compare the spatial identification for each motor unit (Figure 1D-E). From identified motor units, the authors have provided a detailed characterization of recruitment and firing rate responses during slow voluntary isometric contractions in the vastus lateralis and tibialis anterior muscles up to 80% of maximum intensity. In the lower limb, it is interesting how lower threshold motor units have firing rate responses that saturate, whereas higher threshold units that presumably produce higher muscle contractile forces continue to increase their firing rate. In many ways, these results agree with the rate coding of motor units in the extensor digitorum communis muscle (Monster and Chan, 1977). The paper is detailed, and the analyses are well explained. However, there are several points that I think should be addressed to strengthen the paper.

      We thank the reviewer for their positive appreciation of our work.

      General comments: 

      (1) The authors claim they have measured the complete rate coding profiles of motor units in the vastus lateralis and tibialis anterior muscles. However, this study quantified rate coding during slow and prolonged voluntary isometric contractions whereas the function of rate coding during movements (Grimby and Hannerz, 1977) or more complex isometric contractions (Cutsem and Duchateau, 2005; Marshall et al., 2022) remains unexplored. For example, supraspinal inputs may not scale the same way across low and higher threshold motor units, or between muscles (Devanne et al., 1997), making the response of firing rates to increasing isometric contraction force less clear. 

      We agree with the reviewer that rate coding strategies may vary with the velocity and the type of contractions (Duchateau & Enoka, 2008, J Physiol). It is thus likely that the firing rate would increase during the first milliseconds of fast contractions, with the occurrence of doublets (Cutsem and Duchateau, 2005, J Physiol; Del Vecchio et al., 2019, J Physiol), or that motor unit firing rate may be lower during lengthening than shortening contractions (Duchateau & Enoka, J Physiol). 

      However, the decomposition of EMG signals in non-stationary conditions remains challenging, and is still limited to slow varying patterns of force (Chen et al., 2000, Oliveira & Negro, 2021, Mendez Guerra et al., 2024, Yeung et al., 2024). Future methodological developments will be required to expand our findings to other patterns of force.

      Conceptually, the authors focus on the literature on intrinsic motoneurone properties, but in vivo, other possibilities are that descending supraspinal drive, spinal network dynamics, and afferent inputs have different effects across motor unit sizes, muscles, and types of contractions. Also, the influence from local muscles that act as synergists (e.g., vastii muscles for the vastus lateralis, and peroneal muscles that evert the foot for the tibialis anterior) or antagonists (coactivation during higher contraction intensities would stiffen the joint) may provide differential forms of proprioceptive feedback across motor pools. 

      The reviewer is right that differences in spinal network dynamics and afferent inputs may explain the differences in rate coding observed between the two muscles. Indeed, computational models have shown how the pattern of inhibitory inputs may affect the increase in firing rate during a linear increase in force (Powers & Heckman, 2017, J Neurophysiol; Chardon et al., 2023, eLife). Specifically, the difference observed between proportional inhibitory inputs and a push-pull pattern mirrors the differences observed here between the TA (push-pull-like pattern) and the VL (proportional pattern). This difference may reflect the impact of various pathways of inhibition, such as reciprocal inhibition or recurrent inhibition from homonymous motor units or motor units from synergistic muscles. 

      These points have been further discussed in the manuscript (P19; L475):

      ‘The increase in firing rate was also significantly greater for TA motor units than for those in VL. This difference may reflect a varying balance between excitatory/inhibitory synaptic inputs and neuromodulation due to multiple spinal circuits (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Specifically, the strength of recurrent and reciprocal inhibitory inputs to motoneurons innervating VL and TA, and their proportional or inverse covariation with excitatory inputs, respectively, may explain the differences in rate limiting and maximal firing rates (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Thus, the motor units from the VL may receive more recurrent inhibition than those of distal muscles, though direct evidence of these differences remains to be found in humans (Windhorst, 1996). Interestingly, similar differences in rate coding were previously observed between proximal and distal muscles of the upper limb (De Luca et al., 1982). However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’

      (2) The evidence that the entire motor unit pool was recorded per muscle is not clear. There appears to be substantial residual EMG (Figure 1B), signal cancellation of smaller motor units (lines 172-176), some participants had fewer than 20 identified motor units, and contractions never went above 80% of MVC. Also, to my understanding, there remains no gold-standard in awake humans to estimate the total motor unit number in order to determine if the entire pool was decomposed. 

      The reviewer is right that we did not decode the full pool of motor units. As indicated in the initial version of the manuscript (e.g. title, introduction), we considered that we identified an extensive sample of motor units representative of the dynamic of the pool. This claim was supported by the identification of motor units with recruitment thresholds ranging from 0 to 75% of the maximal force. 

      This statement was in the introduction (P4; L109): ‘We were able to identify up to ~200 unique active motor units per muscle and per participant in two human muscles in vivo, yielding extensive samples of motor units that are representative of the entire motoneuron pools (Caillet et al., 2023a).’

      Furthermore, using four HDsEMG arrays also raises questions about how some channels were placed over non-target muscles, and if motor units were decomposed from surrounding synergists.

      A factor that guided our muscle choice was the low risk of crosstalk. For this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume.

      (3) The authors claim (Abstract L51; Discussion L376) that a commonly held view in the field is that rate coding is similar across motor units from the same pool. Perhaps this is in reference to some studies that have carefully assessed lower threshold motor units during lower force ramp contractions (e.g., Fuglevand et al., 2015; Revill and Fuglevand, 2017). However, a more complete integration of the literature exploring motor unit firing rate responses during rapid isometric contractions, comparing different muscles and contraction intensities would be helpful. From Figure 3, the range of rate coding in the tibialis anterior (~7-40 Hz) is greater than the vastus lateralis (~5-22 Hz) muscle across contraction levels. In agreement with other studies, the range of rate coding within some muscles is different than others (Kirk et al., 2021) and during maximal intensity (Bellemare et al., 1983) or rapid contractions (Desmedt and Godaux, 1978). Likewise, within a motor pool, there is a diversity of firing rate responses across motor units of different sizes as a function of isometric force (Monster and Chan, 1977; Desmedt and Godaux, 1977; Kukula and Clamann, 1981; Del Vecchio et al., 2019; Marshall et al., 2022). A strength of this paper is how firing rate responses are quantified across a wide range of motor unit recruitment thresholds and between two muscles. I suggest improving clarity for the general reader, especially in the motivation for testing two lower limb muscles, and elaborating on some of the functional implications.

      We thank the reviewer for his input on this question. We have added references to these works and lines of research in the discussion:

      (P18; L449): ‘In addition, rate coding patterns should also vary with the pattern of contractions, with fast contractions lowering the range of recruitment thresholds within motoneuron pools (Desmedt and Godaux, 1977b, 1979; van Bolhuis et al., 1997). The variability in rate coding observed here between motor units from the same pool could lead to small deviations from the size principle sometimes observed between pairs of units during isometric contractions with various patterns of force (Desmedt and Godaux, 1979; Marshall et al., 2022) or during the derecruitment phase (Bracklein et al., 2022).’

      (P19; L487): ‘However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’

      In addition to the responses above, we have added a section at the beginning of the results to motivate the choice of the muscles (P6; L137):

      ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      Reviewer #3 (Public Review): 

      Summary: 

      This is an interesting manuscript that uses state-of-the-art experimental and simulation approaches to quantify motor unit discharge patterns in the human TA and VL. The non-linear profiles of motor unit discharge were calculated and found to have an initial acceleration phase followed by an attenuation phase. Lower threshold motor units had a larger gain of the initial acceleration whereas the higher threshold motor unit had a higher gain in the attenuation phase. These data represent a technical feat and are important for understanding how humans generate and control voluntary force. 

      Strengths: 

      The authors used rigorous, state-of-the-art analyses to decompose and validate their motor unit data during a wide range of voluntary efforts.

      The analyses are clearly presented, applied, and visualized. 

      The supplemental data provides important transparency. 

      We thank the reviewer for their positive appreciation of our work.

      Weaknesses: 

      The number of participants and muscles tested are quite small - particularly given the constraints on yield. It is unclear if this will translate to other motor pools. The justification for TA and VL should be provided.

      One strength of our study is to provide relations between key parameters of rate coding (acceleration in firing rate, increase in firing rate, hysteresis) and the recruitment thresholds of motor units within two different pools, and for each individual participant. These relations were consistent across all the participants (Figures 2 to 4), making us confident that increasing the sample size would not change the conclusions of the study.

      It is likely that the differences observed here between the VL and TA will also appear between other muscles of the leg, due to differences in the arrays of excitatory and inhibitory inputs they receive, the pattern of inhibitory inputs during increases in force (recurrent/reciprocal inhibition), and different levels of neuromodulation (Johnson et al., 2017, J Neurophysiol; Beauchamp et al., 2023, J Neural Eng). We have added a paragraph in the results to motivate our choice of muscles (P6; L137):

      ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      While an impressive effort was made to identify and track motor units across a range of contractions, it appears that a substantial portion of muscle force was not identified. Though high-intensity contractions are challenging to decompose, the authors are commended for their technical ability to record population motor unit discharge times with recruitment thresholds up to 75% of a participant's maximal voluntary contractions. However, previous groups have seen substantial recruitment of motor units above 80% and even 90% maximum activation in the soleus. Given the innervation ratios of higher threshold motor units, if recruitment continued to 100%, the top quartile would likely represent a substantial portion of the traditional fast-fatigable motor units. It would be highly interesting to understand the recruitment and rate coding of the highest threshold motor units. At a minimum, I would suggest using terms other than "entire range" or "full spectrum of recruitment thresholds".

      Motor units were indeed identified between 0 and 80% of the maximal force in this study. This is due to the requirements of the decomposition algorithm that needs sustained and stable contraction to converge toward a set of separation vectors that generate sparse spike trains. Thus, it was not possible for our participants to sustain contractions above 80%MVC without generating fatigue.

      However, it is important to note that only a few motor units are recruited above 80% of the maximal force in the TA (Van Cutsem et al., 1998, J Physiol), as well as in other muscles of the lower limb (Oya et al., 2009, J Physiol; Aeles et al., 2020, J Neurophysiol). Thus, we may have only missed a few motor units recruited above 80% of the maximal force. Nevertheless, we removed the terms ‘full spectrum of recruitment thresholds’ and ‘entire range’ from the manuscript to now read ‘most of the spectrum of recruitment thresholds observed in humans.’.

      The quantification of hysteresis using torque appears to make self-evident the observation that lower threshold motor units demonstrate less hysteresis with respect to torque. If there is motor unit discharge there will be force. I believe this limitation goes beyond the floor effects discussed in the manuscript. Traditionally, individuals have used the discharge of a lower threshold unit as the measure on which to apply hysteresis analyses to infer ion channel function in human spinal motoneurons.

      We agree with the reviewer that the hysteresis is classically estimated using the firing rate of a ‘reporter unit’ with the delta F method (introduced in humans by Gorassini et al.), or, more recently, with advances in motor unit identification, using the cumulative spike train of the identified motor units. Researchers use these data as a proxy of the synaptic drive and compare their values at the recruitment and derecruitment thresholds of the ‘test unit’.

      As mentioned above in response to reviewer 1, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force. This limitation is now highlighted in the discussion section (P19; L470): ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’.

      The main findings are not entirely novel. See Monster and Chan 1977 and Kanosue et al 1979. 

      We agree with the reviewer that the results of the paper are remarkably aligned with previous experimental findings in humans, in animals, or with in vitro and in silico models. However, we believe that our study shows in humans the incredible variety of rate coding patterns within a pool of motor units that span most of the spectrum of recruitment thresholds observed in humans. It also highlights the variability of rate coding patterns between motor neurons that have a similar recruitment threshold. Finally, we observe differences between pools of motor neurons innervating two different muscles in the lower limb, mirroring what has been done in the past in upper limb muscles.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      The wording 'decode' across the manuscript may sound somewhat unsuitable for the context, because 'decode' would involve interpreting the signals and activities to understand how they relate to specific variables or proxies of behavior. Here in this study it does not necessarily involve the interpretation, but sounds to be used for decomposing the signal into the constituent motor units. As such, it might be appropriate to use other words such as decompose, read out, or extract.

      ‘Decode’ was removed from the manuscript, which now reads motor unit ‘identification’.

      Reviewer #2 (Recommendations For The Authors): 

      Figures 1 and 2 are informative and interesting. Figures 3 and 4 are harder to interpret. For example, in Figure 4, data plotted along the diagonal is overplotted and not as informative.

      For the sake of clarity, we separated the lines of the fits and the scatter plots in the right panels in Figure 3. In Figure 4, we removed the scatter plots and only reported the lines of the fits for each participant.

      Do you think the different durations of the isometric plateau across contraction intensities influenced motor unit derecruitment? Longer duration in lower threshold motor units would have resulted in a larger effect of PICs?

      We did not find an effect of the duration of the plateau on the derecruitment threshold. Notably, a computational study found that the duration of the plateau may impact the delta F, due to the combination of PICs, spike threshold accommodation and spike frequency adaptation (Revill & Fuglevand, 2011, J Neurophysiol). However, we did not use the delta F value here to estimate the effect of PICs on the hysteresis. 

      L703. For the measure of firing rate hysteresis the difference between recruitment and derecruitment was calculated, but why not use the delta-F method? This is more commonly used to assess hysteresis as a rough estimate of intrinsic dynamics.

      As further discussed above, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force.

      This was mentioned in the discussion (P19; L470):

      ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’

      L144. The standard deviation seems high. Some participants had fewer than 20 motor units and your number of participants per muscle was eight, could you state the complete range?

      A table was added in the results section to indicate the yields of the decomposition per contraction.

      If other studies are able to randomly sample motor units with intramuscular electrodes does this also represent an estimate of rate coding from the 'entire' pool? One criticism of HDsEMG arrays is that they are biased towards decomposing superficial larger motor units and in the male sex. 

      The decomposition of EMG signals recorded with arrays of surface electrodes is indeed biased toward the identification of motor units with the larger action potentials in the signal (large and superficial; Farina & Holobar, 2016, Proceedings of IEEE). We took advantage of the latter limitation by performing successive contractions at different levels of force with the objective to identify the last recruited motor units (larger units according to the size principle), while tracking the smaller ones. In that way, we were able to sequentially identify motor units recruited from 0% to 75% of the maximal force. A similar approach could be applied to selective intramuscular electrodes. However, because identifying motor units up to maximal force requires a highly selective pair of fine wires or needle electrodes, the procedure described above should be repeated hundreds of times to reach the same samples as those obtained in our study.

      L151-161. The ratio between simulated and decomposed surface EMG reached 55% for the TA and 70% for the VL. How does this provide support that the "entire" MU pool was sampled?

      As said above, we do not identify all the motor units during each contraction, but rather the larger ones with the larger action potentials within the EMG signals. However, we used here a sequential approach to identify new motor units during each trial while tracking smaller units. In that way, we were able to sequentially identify on average 130 motor units per muscle.

      To avoid any confusion, we removed the references to ‘entire’ pools in the manuscript.  

      L266. How is it possible that in some participants no motor units were recruited below 5% of MVC? Do the authors suspect they produced force from synergist muscles or that the decomposition failed to identify these presumably smaller and deeper motor units?

      This mostly results from the limitations of the decomposition algorithm. In these participants, it is likely that the decomposition was biased toward motor units only active during the plateau of force or recruited at the end of the ramp.

      Figure 2B. Do the higher threshold motor units with linear responses receive more inhibitory input (coactivation) or are devoid of large PIC effects?

      Were antagonist muscles recorded? During higher contraction intensities, greater antagonist coactivation in some trials or participants may have linearized the firing rate profiles (e.g., Revill and Fuglevand, 2017).

      L427. This is a neat finding that higher threshold motor units are less likely to have the functional hallmark of a strong PIC effect and may therefore be more representative of extrinsic inputs. Could this be an advantage to increase the precision of stronger contractions or reduce the fatigability of muscle fibres during repeated strong contractions?

      Synaptic contacts with Renshaw cells (Fyffe, 1991, J Neurophysiol) and Ia inhibitory interneurons (Heckman & Binder, 1991, J Neurophysiol) are widespread within pools of motor units, which induces homogeneously distributed inhibitory inputs. However, the amplitude of these inhibitory inputs can increase with muscle force. We found that the EMG amplitude of the soleus and the gastrocnemius medialis recorded with bipolar EMG during the dorsiflexion increased with the force. Therefore, the higher inhibition at higher force may also contribute to the linearisation of the force/firing rate relations observed with high threshold motor neurons, as suggested by Revill and Fuglevand (2017, J Physiol).

      We discussed this point in the new version of the manuscript (P17; L415):

      ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’. 

      In Figure 2B, it makes sense that linear firing rate responses occur later in the ramp contraction when myotendinous slack is lower. Do the authors think contractile dynamics are matched to the firing rate profiles?

      To our knowledge, there are no direct data on the link between the linearity of the force/firing rate relation and the stiffness of the tendon. Recent work from Mazzo et al. (2021, J Physiol) has shown that repeated stretches of calf muscles, which decrease their stiffness, induced an increase in motor unit firing rate at low levels of force. This indicates that the contractile properties of the muscle may also impact the profile of rate coding when considered as a function of force.

      We added this point in the discussion (P20; L512):

      ‘On a different note, the steep increase in firing rate over the first percentages of the ramp-up may also enable the motor units to produce the required level of force despite having a more compliant muscle-tendon unit (Mazzo et al., 2021).’

      L371. It is likely that Marshall et al., 2022, recorded over 100 unique motor units from the same animal.

      The reviewer is right that Marshall et al. may have identified hundreds of motor units across sessions in one non-human primate. However, there is no way to verify this statement, as they used fine wire electrodes inserted in different locations in each session, which made it impossible to verify the uniqueness of each identified unit. Conversely, we verified in our study that all the motor units were unique using the distribution of their surface action potentials across the 256 surface electrodes.

      L378. What do the authors mean by "rate coding is similar"? I find this statement confusing. Is this regarding the absolute firing rate range, response to force increases, hysteresis, or how they scale with contraction intensity?

      This statement was removed from the discussion to avoid any confusion.

      Reviewer #3 (Recommendations For The Authors): 

      The authors may want to consider other mechanisms of the linearization of discharge rates of medium and high threshold motor units. Monica's work may suggest that, over time, there is a subthreshold activation of the PIC, which serves to linearize the eventual suprathreshold activation underlying repetitive discharge. Additionally, Andy has shown that inhibitory drive from cutaneous inputs can linearize the initial acceleration of low threshold motor units - cutaneous inputs, or even Ib inputs, may be greater later in the contraction and serve to linearize discharge rates. 

      We thank the reviewer for their input on the discussion, where we now discuss this point:

      ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’. 

      Lines 433 - intrinsic properties, in particular the afterhyperpolarization, will likely influence maximal discharge rate and provide a ceiling to the change in firing rate.

      This point is now discussed in the draft (P17; L428):

      ‘This difference may be explained by smaller excitatory synaptic inputs onto low- than high-threshold motoneurons (Powers and Binder, 2001; Heckman and Enoka, 2012), lower synaptic driving potential of the dendritic membrane (Powers and Binder, 2000; Cushing et al., 2005; Fuglevand et al., 2015), and longer and larger afterhyperpolarisation phase in low- than high-threshold motoneurons (Bakels and Kernell, 1993; Gardiner, 1993; Deardorff et al., 2013; Caillet et al., 2022).’

      The actual yield per contraction is not entirely clear. Figure S2 is quite nice in this regard, but a table with this and other information on it may be helpful. This would help with the beginning of the abstract and discussion when it is stated that, on average over 100 motor units were identified per person. 

      We added a table in the results to give the number of motor units identified per contraction.

      Are the thin film units represented in S2 and S3?

      Only motor units identified from signals recorded with arrays of surface electrodes are presented in figures S2 and S3.

    1. eLife Assessment

      This study provides valuable advances in our understanding of how inputs from multiple sources can impact the physiology of motor neurons during the process of multisensory integration. Specifically, the authors show how streams of auditory and principally visual information modulate the physiology of Mauthner neurons in goldfish, thus allowing the different senses to influence escape behavior. Supporting evidence is generally convincing, although material reporting the direct control of behavior is less representative of the data.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comments relevant to the revised manuscript:

      Our general assessment (above) stands unchanged from the original version. All of our comments and concerns about the original manuscript have been addressed except for two, one very minor and one quite important:

      Original Comment 1 (Minor):

      "Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome."

      The authors have replied:

      "The reviewer is right in that direct tectal stimulation bypasses all neural processing upstream, not only that produced in the retina, and that the tectum does not exclusively process visual information. The revised version now acknowledges (lines 245-252, revised manuscript) the complexity of the system."

      We think that this is sufficient to address our concern. Some citations may be in order to underpin the new text.

      Original Comment 5 (Major): Figure 4C and lines 398-410.

      "These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data."

      The authors have replied:

      "Our experimental conditions required anesthesia and paralysis, conditions designed to reduce neuronal firing and suppress motor output. We think it is valuable to report that we still see that simultaneous presentation of subthreshold unisensory stimuli can add up to become suprathreshold, paralleling behavioral observations. We do not claim that those examples are representative of our recording conditions, but they are likely to be more representative of the multisensory integration process taking place in freely moving fish. The revised manuscript adds context to these example traces to justify their inclusion (lines 420-426)."

      We do not feel that this important concern has been addressed. The stats are definitively negative. There is no statistical evidence from these data that multisensory integration is occurring in this assay. The anesthesia, paralysis, and low n may provide explanations for this negative result, but it is still a negative result (p=0.5269). To show two examples of multisensory integration for subthreshold stimuli fits the narrative, but this result is not supported. Examples where individual stimuli caused APs (and combined stimuli did not) also occurred, presumably, and at a rate that is statistically indistinguishable from the examples shown in Figure 5. As such, if results from this assay are going to be in the manuscript, acoustic-only and tectum-only examples should be shown as well, although they would not fit the narrative. To be meaningful, this experiment would have to show that multisensory integration is happening in this circuit. Frustrating though it must be, the experiment has given a negative result to that question.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Otero-Coronel et al. address an important question for neuroscience - how does a premotor neuron capable of directly controlling behavior integrate multiple sources of sensory inputs to inform action selection? For this, they focused on the teleost Mauthner cell, long known to be at the core of a fast escape circuit. What is particularly interesting in this work is the naturalistic approach they took. Classically, the M-cell was characterized, both behaviorally and physiologically, using a unimodal sensory space. Here the authors make the effort (substantial!) to study the physiology of the M-cell taking into account both the visual and auditory inputs. They performed well-informed electrophysiological approaches to decipher how the M-cell integrates the information of two sensory modalities depending on the strength and temporal relation between them.

      Strengths:

      The empirical results are convincing and well-supported. The manuscript is well-written and organized. The experimental approaches and the selection of stimulus parameters are clear and informed by the bibliography. The major finding is that multisensory integration increases the certainty of environmental information in an inherently noisy environment.

      Weaknesses:

      Even though the manuscript and figures are well organized, I found myself struggling to understand key points of the figures.

      For example, in Figure 1 it is not clear what the Tonic and Phasic components actually are. The figure would benefit from more details on this matter. Then, in Figure 4, labels for the traces in panel A are needed, since I was not able to pick up that they were coming from different sensory pathways.

      We added an inset to Figure 1 showing how the tonic and phasic components are measured. We now use solid colors instead of transparencies, and the color scheme was modified for consistency. We added labels to the traces used as examples in Figure 4 panel A.

      In line 338 it should be optic tectum and not "optical tectum".

      We replaced two instances of the term “optical tectum” with “optic tectum”.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comment 1 (Minor):

      Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome.

      The reviewer is right that direct tectal stimulation bypasses all upstream neural processing, not only that produced in the retina, and that the tectum does not exclusively process visual information. The revised version now acknowledges the complexity of the system (lines 245-252, revised manuscript).

      Comment 2 (Major): The premise is that stimulation of the tectum is a proxy for a visual stimulus, but the tectum also carries the auditory, lateral line, and vestibular information. This seems like a confound in the interpretation of this preparation as a simple audio-visual paradigm. Minimally, this confound should be noted and addressed. The first heading of the Results should not refer to "visual tectal stimuli".

      We changed the heading of the corresponding Results section as requested and also omitted the term “optic” when we did not specifically refer to tectal circuits that process optic information.

      Comment 3 (Major): Figure 1 and associated text.

      It is unclear and not mentioned in the Methods section how phasic and tonic responses were calculated. It is clear from the example traces that there is a change in tonic responses and the accumulation of subthreshold responses. Depending on how tonic responses were calculated, perhaps the authors could overlay a low-passed filtered trace and/or show calculations based on the filtered trace at each tectal train duration.

      The revised version of the manuscript now includes a description of how the phasic and tonic components were calculated (lines 163-172). We also modified the color scheme and the inset of Figure 1A to clarify how these two components were defined. Since we quantified the response in a 12 ms window, we did not include an overlaid low-pass filtered trace, since it might be confused with the metric used.

      Comment 4 (Minor): Figure 3 and associated text.

      This is a lovely experiment. Although it is not written in text, it provides logic for the next experiment in choosing a 50ms time interval. It would be great if the authors calculated the first timepoint at which the percentage of shunting inhibition is not significantly different from zero. This would provide a convincing basis for picking 50ms for the next experiment. That said, I suspect that this time point would be earlier than 50 ms. This may explain and add further complexity to why the authors found mostly linear or sublinear integration, and perhaps the basis for future experiments to test different stimulus time intervals. Please move calculations to Methods.

      We moved calculations to the Methods section (lines 201-208). We mention the rationale for selecting the 50 ms interval in the next experiment (Figure 4, lines 369-371) and discuss in detail the potential contribution of FFI to the complexity of the integration taking place in the M-cell circuit (Discussion, lines 512-535).

      Comment 5 (Major): Figure 4C and lines 398-410.

      These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data.

      Our experimental conditions required anesthesia and paralysis, conditions designed to reduce neuronal firing and suppress motor output. We think it is valuable to report that we still see that simultaneous presentation of subthreshold unisensory stimuli can add up to become suprathreshold, paralleling behavioral observations. We do not claim that those examples are representative of our recording conditions, but they are likely to be more representative of the multisensory integration process taking place in freely moving fish. The revised manuscript adds context to these example traces to justify their inclusion (lines 420-426).

      Reviewer #2 (Recommendations For The Authors):

      Methods

      The Methods section on "Auditory stimuli" contains a long background on the biophysics of the M-cell and its inputs. This does not belong in Methods. The same is true, to a lesser degree, in the next heading. The argument that direct stimulation of the tectum is necessary to bypass adaptation should be in Results, not Methods.

      Following the reviewer recommendation, we have moved both paragraphs to the Results section.

      Figure 1 and associated text.

      Visually, the use of transparency to differentiate phasic and tonic calculations is difficult to read. Example traces are also cut off at the top and bottom at random sizes.

      We changed the color scheme to avoid the use of transparency and modified the inset of Figure 1A to clarify how the phasic and tonic components were calculated. We also modified the dimensions of the clipping mask used to trim the stimulation artifacts of sample traces to make them more similar while still enabling clear observation of the phasic and tonic components of the response.

      Line 338 "optical tectum" is not correct. "optic tectum" is more common, or better still, just "tectum".

      We apologize for the error. The two instances of “optical tectum” were replaced by the correct term (“optic tectum”).

    1. eLife Assessment

      This important study highlights the use of siderophores as antibacterials, and the authors also discuss the consequences and efficacy of 'siderophore therapy' in more complex communities/environments. The evidence supporting the overall hypotheses is largely convincing. The work will be of broad interest to people working in the fields of evolutionary ecology, microbiology and medical sciences.

    1. eLife Assessment

      This work is an important contribution to the development of a biologically plausible theory of statistical modeling of spiking activity. The authors convincingly implemented the statistical inference of input likelihood in a simple neural circuit, demonstrating the relationship between synaptic homeostasis, neural representations, and computational accuracy. This work will be of interest to neuroscientists, both theoretical and experimental, who are exploring how statistical computation is implemented in neural networks.

    2. Reviewer #1 (Public review):

      Summary

      A novel statistical model of neural population activity called the Random Projection (RP) model has been recently proposed. Not only is this model accurate, efficient, and scalable, but it is also naturally implemented as a shallow neural network. This work proposes a new class of RP model called the reshaped RP model. Inheriting the virtues of the original RP model, the proposed model fits the data more accurately and is more efficient, in the sense of lower firing rates, than the original, and it is compatible with various biological constraints. In particular, the authors have demonstrated that normalizing the total synaptic input in the reshaped model has a homeostatic effect on the firing rates of the neurons, resulting in even more efficient representations with equivalent accuracy. These results suggest that synaptic normalization contributes to synaptic homeostasis as well as efficiency in neural encoding.

      Strength

      This paper demonstrates that the accuracy and efficiency of the random projection models can be improved by extending the model with reshaped projections. Furthermore, it broadens the applicability of the model under biological constraints of synaptic regularization. It also suggests the advantage of the sparse connectivity structure over the fully connected model for modeling spiking statistics. In summary, this work successfully integrates two different elements, statistical modeling of the spikes and synaptic homeostasis in a single biologically plausible neural network model. The authors logically demonstrate their arguments with clear visual presentations and well-structured text, facilitating an unambiguous understanding for readers.

      Discussions

      The authors have clearly responded to most of our questions in the revised manuscript and we are happy to recommend publishing the final version of the article as it is. Below, we would like to present some alternative interpretations of the results. These comments are not mutually exclusive with the claims made in the article; rather, they are intended to enhance readers' understanding by providing additional perspectives.

      As summarized above, the main contribution of the work consists of two parts; (1) the reshaped RP model achieved higher performance in explaining the statistics of the spiking activity of cortical neurons with more efficient representations (=lower firing rate), (2) synaptic homeostatic normalization in the reshaped RP model yields even more efficient representations without impairing the fitting performance.

      For part (1), Suppl. Fig. 1B compares reshaped RP models with RP and RP with pruning and replacement (P&R). The better performance of RP with P&R might imply the advantage of pruning over gradient descent in this setting, reflecting the non-convexities of the problem. Alternatively, it might suggest the benefit of the increased number of parameters, since pruning allows the network to explore the broader parameter space during the learning process. This latter view might partially explain the superiority of the reshaped RP model over the original RP model.

      It is interesting that the backprop model has a higher firing rate than the reshaped model (Fig. 1D). This trend is unchanged when optimization of the neural threshold is also allowed (Supp. Fig. 2A). Since the backprop model slightly but robustly outperforms the reshaped model, high firing rates in the backprop model might be a genuine feature of the spike statistics.

      For part (2), we note that λ regulates the average firing rate, since maximizing the entropy <-ln p(x)> with a regularization term -λ <\sum_i f(x_i)> results in λ_i = λ for all i in the Boltzmann distribution of eq. 2. Suppl. Fig. 2B could be understood as demonstrating this "homeostatic" effect of λ.

      Suppl. Fig. 3 could be interpreted as demonstrating the interaction of two different homeostatic mechanisms: one at the firing-rate level (as regulated by λ) and the other at the synaptic level (as regulated by φ). It shows that the range of synaptic constraints over which the fitting performance is not impaired depends on the value of λ. For example, if λ is small (λ = 0.25), a synaptic constraint can easily degrade the performance; on the other hand, if λ is large (λ = 4), performance remains unchanged under a strict synaptic constraint. Considering that in practice we are most interested in the regime where the model performs best (λ = 2.0), an advantageous feature of the homeostatic model is that the homeostatic constraint is harmless at λ = 2.0 for a wide range of constraints.
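
      The maximum-entropy argument in the comment on part (2) can be written out as a short derivation. This is a sketch under stated assumptions: f_i(x) denotes the output of projection i for population pattern x (the comment writes f(x_i)), μ is a Lagrange multiplier introduced here for normalization, F(x) is shorthand for the summed projection output, and the sign convention of eq. 2 is assumed to be p(x) ∝ exp(-Σ_i λ_i f_i(x)). The last line holds only for a fixed projection set, not for models whose weights are re-optimized for each λ.

```latex
\begin{aligned}
\mathcal{L}[p] &= -\sum_x p(x)\ln p(x) \;-\; \lambda \sum_x p(x)\sum_i f_i(x) \;-\; \mu\Big(\sum_x p(x) - 1\Big),\\
\frac{\partial \mathcal{L}}{\partial p(x)} &= -\ln p(x) - 1 - \lambda \sum_i f_i(x) - \mu = 0,\\
p(x) &= \frac{1}{Z}\exp\Big(-\lambda \sum_i f_i(x)\Big), \qquad Z = \sum_x \exp\Big(-\lambda \sum_i f_i(x)\Big),\\
\frac{\partial \langle F \rangle}{\partial \lambda} &= -\mathrm{Var}[F] \le 0, \qquad F(x) = \sum_i f_i(x).
\end{aligned}
```

      Comparing the third line with the general form p(x) ∝ exp(-Σ_i λ_i f_i(x)) gives λ_i = λ for all i, and the fourth line shows why, under these assumptions, a larger λ lowers the mean summed projection activity.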

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Comments:

      (1) We find it interesting that the reshaped model showed decreased firing rates of the projection neurons. We note that maximizing the entropy <-ln p(x)> with a regularizing term -\lambda <\sum_i f(x_i)>, which reflects the mean firing rate, results in \lambda_i = \lambda for all i in the Boltzmann distribution. In other words, in addition to the homeostatic effect of synaptic normalization which is shown in Figures 3B-D, setting all \lambda_i = 1 itself might have a homeostatic effect on the firing rates. It would be better if the contributions of these two homeostatic effects were separated. One suggestion is to verify the homeostatic effect of synaptic normalization by changing the value of \lambda.

      This is an interesting question, and we therefore explored the effects of different values of $\lambda$ on the performance of unconstrained reshaped RP models and their firing rates. The new supp. Figure 2B presents the results of this exploration: We found that for models with a small set of projections, high values of $\lambda$ give better performance than low values, while for models with a large set of projections we find the opposite relation. The mean firing rates of the projection neurons for models with different values of $\lambda$ show a clear trend, where higher $\lambda$ values result in lower mean firing rates.

      Thus, these results suggest an interplay between the optimal size of the projection set and the value of $\lambda$ one should pick. For the population sizes and projection sets we have used here, $\lambda=1$ is a good choice, but, for different population sizes or data sets a different value of $\lambda$ might be better.

      In addition to supp. Figure 2B, we added the following to the main text:

      “An additional set of parameters that might affect the Reshaped RP models are the coefficients $\lambda$, that weigh each of the projections. Above, we used $\lambda=1$ for all projections, here we investigated the effect of the value of $\lambda$ on the performance of the Reshaped RP models (supp. Figure 2B). We find that for models with a small projection set, high $\lambda$ values result in better performance than models with low values. We find an opposite relation for models with large number projection sets. (We submit that the performance decrease of Reshaped RP models with high value of $\lambda$, as the number of projections grows, is a reflection of the non-convex nature of the Reshaped RP optimization problem).

      The mean firing rates of the projection neurons for models with different values of $\lambda$ show a clear trend, higher $\lambda$ values results in lower mean firing rates. Thus, we conclude that there is an interplay between the number of projections and the value of $\lambda$ we should pick. For the sizes of projection sets we have used here, $\lambda=1$ is a good choice, but, we note that in general, one should probably seek the appropriate value of $\lambda$ for different population sizes or data sets.”

      In addition, we explored the effect of synaptic normalization on models with different values of $\lambda$ (supp. Figure 3). We found that homeostatic Reshaped RP models are superior to the non-homeostatic Reshaped RP models: For low values of $\lambda$, the homeostatic and Reshaped RP models show similar performance in terms of log-likelihood, whereas the homeostatic models are more efficient. For high values of $\lambda_i$ homeostatic models are not only more efficient but also show better performance. These results indicate that the benefit of the homeostatic model is insensitive to the specific choice of $\lambda$.

      In addition to supp. Figure 3, we added the following to the main text:

      “Exploring the effect of synaptic normalization on models with different values of $\lambda$ (supp. Figure 3), we find that homeostatic Reshaped RP models are superior to the non-homeostatic Reshaped RP models: For low values of $\lambda$, the homeostatic and Reshaped RP models show similar performance in terms of log-likelihood, whereas the homeostatic models are more efficient. Importantly, for high values of $\lambda_i$ homeostatic models are not only more efficient but also show better performance. We conclude that the benefit of the homeostatic model is insensitive to the specific choice of $\lambda$.”

      (2) As far as we understand, \theta_i (thresholds of the neurons) are fixed to 1 in the article. Optimizing the neural threshold as well as synaptic weights is a natural procedure (both biologically and engineeringly), and can easily be computed by a similar expression to that of a_ij (equation 3). Do the results still hold when changing \theta _i is allowed as well? For example,

      a. If \theta _i becomes larger, the mean firing rates will decrease. Does the backprop model still have higher firing rates than the reshaped model when \theta _i are also optimized?

      b. Changing \theta _i affects the dynamic range of the projection neurons, thus could modify the effect of synaptic constraints. In particular, does it affect the performance of the bounded model (relative to the homeostatic input models)?

      We followed the referee’s suggestion and extended our analysis by adding threshold optimization to the Reshape and Backpropagation models, which is now shown in supp. Figure 2A. Comparing the performance and properties of these models to ones with fixed thresholds, we found that this addition had a small effect on the performance of the models in terms of their likelihood (supp. Figure 2A). We further find that backpropagation models with tuned thresholds show lower firing rates compared to backpropagation models with fixed threshold, while reshaped RP models with optimized thresholds show higher firing rates compared to models with fixed threshold. These differences are, again, rather small, and both versions of the reshaped RP models show lower firing rates compared to both versions of the backpropagation models.

      In addition to supp. Figure 2A, we added the following to the main text:

      “The projections' threshold $\theta_i$, which is analogous to the spiking threshold of the projection neurons, strongly affects the projections' firing rates. We asked how, in addition to reshaping the coefficients of each projection, we can also change $\theta_i$ to optimize the reshaped RP and backpropagation models.

      We find that this addition has a small effect on the performance of the models in terms of their likelihood (supp. Figure 2A).

      We also find that this has a small effect on the firing rates of the projection neurons: backpropagation models with tuned thresholds show lower firing rates compared to backpropagation models with fixed threshold, whereas reshaped RP models with optimized thresholds show higher firing rates compared to models with fixed threshold. Yet, both versions of the reshaped RP models show lower firing rates compared to both versions of the backpropagation models. Given the small effect of tuning threshold on models' performance and their internal properties, we will, henceforth, focus on Reshaped RP models with fixed thresholds.”

      (3) In Figure 1, the authors claim that the reshaped RP model outperforms the RP model. This improved performance might be partly because the reshaped RP model has more parameters to be optimized than the RP model. Indeed, let the number of projections N and the in-degree of the projections K, then the RP model and the reshaped RP model have N and KN parameters, respectively. Does the reshaped model still outperform the original one when only (randomly chosen) N weights (out of a_ij) are allowed to be optimized and the rest is fixed? (or, does it still outperform the original model with the same number of optimized parameters (i.e. N/K neurons)?)

      Indeed, the number of tuned parameters in the reshaped RP model is much larger than the number of tuned parameters in an RP model with the same projection set size. Yet, we submit that the larger number of tuned parameters is not the reason for the improved performance of the reshaped RP model: Maoz et al [30] have already shown that by optimizing an RP model with a small projection set using the pruning and replacement of projections (P&R), one can reach high accuracy with almost an order of magnitude fewer projections. Thus, we argue that the improved performance stems from the properties of the projections in the model.

      Accordingly, we added supp. Figure 1B, which shows the performance of the P&R sigmoid RP model compared to the RP and reshaped RP models. We added the following to the main text:

      “Because reshaping may change all the existing synapses of each projection, the number of parameters is the number of projections times the projections in-degree. While this is much larger than the number of parameters that we learn for the RP model (one for each projection), we suggest that the performance of the reshaped models is not a naive result of having more parameters. In particular, we have seen that RP models that use a small set of projections can be very accurate when the projections are optimized using the pruning and replacement process [30] (see also supp. Figure 1B). Thus, it is really the nature of the projections that shapes the performance. Indeed, our results here show that a small fixed connectivity projection set with weight tuning is enough for accurate performance which is on par or better than an RP model with more projections.”
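
      The parameter counts discussed in point (3) can be made concrete with a minimal sketch. It assumes, as the reviewer states, that the RP model tunes one coefficient per projection (N in total) and the reshaped RP model tunes every synapse of every projection (K times N in total). The function names are hypothetical and are not taken from the authors' code.

```python
def rp_parameter_count(n_projections: int) -> int:
    """RP model: one tuned coefficient per projection."""
    return n_projections


def reshaped_rp_parameter_count(n_projections: int, in_degree: int) -> int:
    """Reshaped RP model: every synapse of every projection is tuned."""
    return n_projections * in_degree


def matched_reshaped_size(n_projections: int, in_degree: int) -> int:
    """Number of reshaped projections whose tuned-parameter count roughly
    matches an RP model with n_projections (the reviewer's N/K control)."""
    return n_projections // in_degree


if __name__ == "__main__":
    n, k = 1000, 5  # illustrative values only, not from the manuscript
    print(rp_parameter_count(n), reshaped_rp_parameter_count(n, k),
          matched_reshaped_size(n, k))
```

      With these illustrative values, a reshaped model with 200 projections would tune about as many parameters as an RP model with 1000 projections, which is the comparison the reviewer asks about.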

      (4) In Figure 2, the authors have demonstrated that the homeostatic synaptic normalization outperforms the bounded model when the allowed synaptic cost is small. One possible hypothesis for explaining this fact is that the optimal solution lies in the region where only a small number of |a_ij| is large and the rest is near 0. If it is possible to verify this idea by, for example, exhibiting the distribution of a_ij after optimization, it would help the readers to better understand the mechanism behind the superiority of the homeostatic input model.

      We modified supp. Figure 4 and made the following change in the relevant part in the main text to address the reviewer comment about the distribution of the $a_{ij}$ values:

      “Figure 5E shows the mean rotation angle over 100 homeostatic models as a function of synaptic cost -- reflecting that the different forms of homeostatic regulation results in different reshaped projections. We show in Supp. Figure 4C the histogram of the rotation angles of several different homeostatic models, as well as the unconstrained Reshape model.

      Analyzing the distribution of the synaptic weights $a_{ij}$ after learning leads to a similar conclusion (supp. Figure 4D): The peak of the histograms is at $a_{ij} = 0$, implying that during reshaping most synapses are effectively pruned. While the distribution is broader for models with higher synaptic budget, it is asymmetric, showing local maxima at different values of $a_{ij}$.

      The diversity of solutions that the different model classes and parameters show imply a form of redundancy in model choice or learning procedure. This reflects a multiplicity of ways to learn or optimize such networks that biology could use to shape or tune neural population codes.”
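
      The two kinds of synaptic constraint compared in point (4) can be illustrated with a small sketch. It rests on assumptions that are not confirmed by the text above: that homeostatic normalization rescales the incoming weights of one projection so that their summed absolute value equals a fixed budget, and that the bounded model clips each weight to a ceiling. Both function names and the example numbers are hypothetical.

```python
def homeostatic_normalize(weights, budget):
    """Rescale one projection's incoming weights so that sum(|a_ij|) == budget.

    Assumed form of the homeostatic constraint; all-zero weights are returned
    unchanged because they cannot be rescaled.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        return list(weights)
    return [w * budget / total for w in weights]


def bounded_clip(weights, ceiling):
    """Clip each weight independently to the range [-ceiling, +ceiling].

    Assumed form of the bounded constraint.
    """
    return [max(-ceiling, min(ceiling, w)) for w in weights]


if __name__ == "__main__":
    a_i = [2.0, -1.0, 1.0]  # illustrative weights for one projection
    print(homeostatic_normalize(a_i, budget=2.0))
    print(bounded_clip(a_i, ceiling=0.75))
```

      Under these assumptions, normalization preserves the relative sizes of the weights and lets a few synapses stay large while the rest shrink, whereas clipping flattens every large weight to the same ceiling. This is one way to read the reviewer's hypothesis about why the homeostatic model does better at small synaptic cost.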

      (5) In Figures 5D and 5E, the authors present how different reshaping constraints result in different learning processes ("rotation"). We find these results quite intriguing, but it would help the readers understand them if there is more explanation or interpretation. For example,

      a. In the "Reshape - Hom. circuit 4.0" plot (Fig 5D, upper-left), the rotation angle between the two models is almost always the same. This is reasonable since the Homeostatic Circuit model is the least constrained model and could be almost irrelevant to the optimization process. Is there any similar interpretation to the other 3 plots of Figure 5D?

      We added a short discussion of this difference to the main text, but do not have a geometric or other intuitive explanation for the nature of these differences.

      b. In Figure 5E, is there any intuitive explanation for why the three models take minimum rotation angle at similar global synaptic cost (~0.3)?

      We added a discussion of this issue to the main text, and the histogram of the rotation angles in Supp. Figure 4C shows that they are not identical. But we don’t have an intuitive explanation for why the mean values are so similar.

      Recommendations for the authors:

      (1) Some claims on the effect of synaptic normalization on the reshaped model sound a little overstated since the presented evidence does not clearly show the improvement of the computational performance (in comparison to the vanilla reshaped model) in terms of maximizing the likelihood of the inputs. Here are some examples of such claims: "Incorporating more biological features and utilizing synaptic normalization in the learning process, results in even more efficient and accurate models." (in Abstract), "Thus, our new scalable, efficient, and highly accurate population code models are not only biologically-plausible but are actually optimized due to their biological features." (in Abstract), or "in our Reshaped RP models, homeostatic plasticity optimizes the performance of network models" (in Discussion).

      We changed the wording according to the reviewers’ suggestions.

      (2) In equation (1) and the following sentence, \theta _j (threshold) should be \theta _i.

      Fixed

      (3) While the authors mention that "reshaping with normalization or without it drives the projection neurons to converge to similar average firing rate values (Figure 3B)", they also claim that "reshaping with normalization implies lower firing rates as well as... (Figure 3E)". These two claims look a little inconsistent to us. Besides, it is not very clear from Figure 3E that the normalization decreases the firing rate (it is clear from Figure 3B, though). How about just deleting "lower firing rates as well as"?

      We changed the wording according to the reviewers’ suggestion.

      (4) The captions of Figures 4D and 4E should be exchanged.

      Fixed

      (5) Typo in Figure 4F: "normalized in-dgreree".

      Fixed

      (6) In Figure 5D (upper left plot) the choice of "Reshape" and "Bounded3.0" looks a bit odd. Is this a typo for "Hom. circuit 4.0"?

      There is no typo in the figure labels. We discussed the results of figure 5D in our response to point (5) in the public comments list and addressed the upper left panel of figure 5D in the main text.

      (7) In the paper, the letter \theta represents (1) the threshold of the projection neurons (eq. 1), (2) the "ceiling" value of the bounded model, and (3) the rotation angle of projections (Figure 5). We find this notation a bit confusing and recommend using different notations for different entities.

      Thanks for the suggestion, we changed the confusing notations: (1) The threshold of each projection neuron is still $\theta$, following the notation of the original RP model formulation [30]. (2) The notation of the “ceiling” value of the bounded model is now $\omega$. (3) The rotation angle of the projections during reshape is now marked by $\alpha$.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are: 

      (1) We have added additional analysis of running onsets in closed and open loop conditions for audiomotor (Figure 2H) and visuomotor (Figure 3H) coupling.  

      (2) We have also added analysis of running speed and pupil dilation upon mismatch presentation (Figures S2A and S2B, S4A and S4B, and S5A and S5B).

      (3) We have expanded on the discussion of the nature of differences between audiomotor and visuomotor mismatches.

      Reviewer #1:

      The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on the visual cortex. By correlating the mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion. 

      While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons. 

      Strengths: 

      (1) Well-designed study addressing a timely question in the field. 

      (2) Successful transition from previous work focused on the visual cortex to the auditory cortex, demonstrating generic principles in mismatch responses. 

      (3) The correlation between mouse locomotion speed and acoustic feedback levels provides evidence for a prediction signal in the auditory cortex. 

      (4) Coupling of visual and auditory feedback shows putative multimodal integration in the auditory cortex. 

      Weaknesses: 

      (1) Lack of quantification of animal behavior upon mismatches, potentially leading to alternative interpretations of recorded signals. 

      (2) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions. 

      (3) Discrepancies in reported values in a few figure panels raise questions about data consistency and interpretation. 

      (4) Ambiguity regarding the identity of the [AM+VM] MM neurons. 

      The manuscript is a short report following up on a series of papers focusing on mismatch responses between sensory inputs and predicted signals. While previous studies focused on the visual modality, here the authors moved to the auditory modality. By pairing mouse locomotion speed to the sound level of the acoustic feedback, they show that a subpopulation of neurons displays excitatory responses to halts in the (expected) acoustic feedback. These responses were lower in the open-loop state, when the feedback was uncorrelated to the animal locomotion. 

      Overall it is a well-designed study, with a timely and well-posed question. I have several concerns regarding the nature of the MM responses and their interpretations. 

      - One lacks quantification of the animal behavior upon mismatches. Behavioral responses may trigger responses in the mouse auditory cortex, and this would be an alternative explanation to the recorded signals. 

      What is the animal speed following closed-loop halts (we only have these data for the playback condition)? 

      We have quantified the running speed of the mouse following audiomotor and visuomotor mismatches. We found no evidence of a change in running speed. We have added this to Figures S2A and S4A, respectively.

      Is there any pupillometry to quantify possible changes in internal states upon halts (both closed-loop and playback)?

      The term 'internal state' may be somewhat ambiguous in this context. We assume the reviewer is asking whether we have any evidence for possible neuromodulatory changes. We know that there are noradrenergic responses in visual cortex to visuomotor mismatches (Jordan and Keller, 2023), but no cholinergic responses (Yogesh and Keller, 2023). Pupillometry, however, is likely not always sensitive enough to pick up these responses. With very strong neuromodulatory responses (e.g. to air puffs, or other startling stimuli), pupil dilation is of course detected, but this effect is likely at best threshold linear. Looking at changes in pupil size following audiomotor and visuomotor mismatch responses, we found no evidence of a change. We have added this to Figures S2B and S4B, respectively. Note, we suspect this is also strongly experience-dependent. The first audio- or visuomotor mismatch the mouse encounters is likely a more salient stimulus (to the rest of the brain, not necessarily to auditory or visual cortex), than the following ones.  

      These quantifications must be provided for the auditory mismatches but also for the VM or [AM+VM] mismatches.  

      During the presentation of multimodal mismatches [AM + VM], mice did not exhibit significant changes in running speed or pupil diameter. These data have now been added to Figures S5A and S5B.

      - AM MM neurons supposedly receive a (excitatory) locomotion-driven prediction signal. Therefore the magnitude of the excitation should depend on the actual animal velocity. Does the halt-evoked response in a closed loop correlate with the animal speed during the halt? Is the correlation less in the playback condition? 

      This is indeed what one would expect. We fear, however, that we don’t have sufficient data to address this question properly. Moreover, there is an important experimental caveat that makes the interpretation of the results difficult. In addition to the sound we experimentally couple to the locomotion speed of the mouse, the mouse self-generates sound by running (the treadmill rotating, changes to the airflow of the air-supported treadmill, footsteps, etc.). These sources of sound all also correlate in intensity with running speed. Thus, it is not entirely clear how our increase in sound amplitude with increasing running speed relates to the increase in self-generated sounds on the treadmill. This is one of the key reasons we usually do this type of experiment in the visual system where experimental control of visual flow feedback (in a given retinotopic location) is straightforward. 

      Having said that, if we look at how mismatch responses change as a function of locomotion speed across the entire population of neurons, there appears to be no systematic change with running speed (and the effects are highly dependent on the speed bins we choose). However, just looking at the most audiomotor mismatch responsive neurons, we find a trend for increased responses with increasing running speed (Author response image 1). We analyzed the top 5% of cells that showed the strongest response to mismatch (MM) and divided the MM trials into three groups based on running speed: slow (10-20 cm/s), middle (20-30 cm/s), and fast (>30 cm/s). Given that we have on average 14 mismatch events in total per neuron, we don’t have sufficient data to analyze this properly. 

      Author response image 1.

      The average response of strongest AM MM responders to AM mismatches as a function of running speed (data are from 51 cells, 11 fields of view, 6 mice). 
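
      The speed binning described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `mean_response_by_speed_bin`, the representation of trials as (speed, response) pairs, the half-open bin edges, and the example values are all assumptions. Trials slower than 10 cm/s are simply dropped.

```python
from statistics import mean

# Speed bins in cm/s as named in the response: slow (10-20), middle (20-30), fast (>30).
# Treating them as half-open intervals [10, 20), [20, 30), [30, inf) is an assumption.
SPEED_BINS = {
    "slow": (10.0, 20.0),
    "middle": (20.0, 30.0),
    "fast": (30.0, float("inf")),
}


def mean_response_by_speed_bin(trials):
    """Average mismatch responses within each running-speed bin.

    trials: iterable of (running_speed_cm_s, response_dff) pairs (hypothetical format).
    Returns a dict mapping bin name to (mean response or None, number of trials).
    """
    grouped = {name: [] for name in SPEED_BINS}
    for speed, response in trials:
        for name, (low, high) in SPEED_BINS.items():
            if low <= speed < high:
                grouped[name].append(response)
                break  # each trial belongs to at most one bin
    return {
        name: (mean(values) if values else None, len(values))
        for name, values in grouped.items()
    }


# Made-up values, for illustration only; the 5 cm/s trial falls outside all bins.
example_trials = [(12.0, 1.0), (18.0, 3.0), (25.0, 4.0), (35.0, 6.0), (5.0, 9.0)]
summary = mean_response_by_speed_bin(example_trials)
```

      The trial count is returned next to each mean because, with roughly 14 mismatch events per neuron split across three bins, each bin holds only a handful of trials.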

      Values in Figure 2H are way higher than what can be observed in Figures 2C, and D. Could you explain the mismatch in values? Same for 3H and 4F. 

      In Figure 2H (now Figure S2F), we display responses from 4 755 individual neurons. Since most recorded neurons did not exhibit significant responses to mismatch presentations, their responses cluster around zero, significantly contributing to the final average shown in panel D. To clarify how individual neurons contribute to the overall population activity, we have added a histogram showing the distribution of neurons responding to audiomotor mismatch and sound playback halts. We hope this addition clarifies how individual neuron responses affect the final population activity. 
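
      The dilution effect described above can be made concrete with a small calculation. This is a sketch with hypothetical numbers (5% of neurons responding at 10% dF/F, all others at zero); neither the fraction nor the amplitude is taken from the manuscript, and `population_mean` is an illustrative helper.

```python
def population_mean(responder_fraction, responder_dff, nonresponder_dff=0.0):
    """Mean response of a population split into responders and non-responders."""
    return (
        responder_fraction * responder_dff
        + (1.0 - responder_fraction) * nonresponder_dff
    )


# Hypothetical example: 5% of neurons respond with 10% dF/F, the rest stay at 0.
diluted = population_mean(0.05, 10.0)
```

      In this toy case the population average is twenty times smaller than the response of an individual responsive neuron, which is the kind of gap between single-neuron scatter plots and population traces that the reviewer noted.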

      Furthermore, neurons exhibiting suppression upon closed-loop halts (Figure 2C) show changes in deltaF/F of the same order of magnitude as the AM MM neurons (with excitatory responses). I cannot picture where these neurons are found in the scatter plot of Figure 2H. 

      This is caused by a ceiling effect. While we could adjust the scale of the heat map to capture neurons with very high responses (e.g. [-50 50], Author response image 2), doing so would obscure the response dynamics of most neurons. Note that the number of neurons on the y-axis far exceeds the resolution of this figure and thus there are also aliasing issues that mask the strong responses. 

      Author response image 2.

      Responses of all L2/3 ACx neurons to audiomotor mismatches. Same as Figure 2C with different color scale [-50 50] which does not capture most of the neural activity.  

      - Are [AM+VM] MM neurons AM neurons? 

      Many of the [AM + VM] and [AM] neurons overlap, but they are not exactly the same population. This is partially visible in Figure 4F. There is a subset of neurons (13.7%; red dots, Figure 4F) that selectively responded to the concurrent [AM+VM] mismatch, while a different subset of neurons (11.2%; yellow dots, Figure 4F) selectively responded to the mismatches in isolation. The [VM] response contributes only little to the sum of the two responses [AM] + [VM]. 

      Please do not use orange in Figure 4F, it is perceptually too similar to red. 

      We have now changed it to yellow. 

      Reviewer #2 (Public Review): 

      In this study, Solyga and Keller use multimodal closed-loop paradigms in conjunction with multiphoton imaging of cortical responses to assess whether and how sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality. Their work addresses an important open question pertaining to the relevance of non-hierarchical (lateral cortico-cortical) interactions in predictive processing within the neocortex. 

      Specifically, they monitor GCaMP6f responses of layer 2/3 neurons in the auditory cortex of head-fixed mice engaged in VR paradigms where running is coupled to auditory, visual, or audio-visual sensory feedback. The authors find strong auditory and motor responses in the auditory cortex, as well as weak responses to visual stimuli. Further, in agreement with previous work, they find that the auditory cortex responds to audiomotor mismatches in a manner similar to that observed in visual cortex for visuomotor mismatches. Most importantly, while visuomotor mismatches by themselves do not trigger significant responses in the auditory cortex, simultaneous coupling of audio-visual inputs to movement non-linearly enhances mismatch responses in the auditory cortex. 

      Their results thus suggest that prediction errors within a given sensory modality are non-trivially influenced by prediction errors from another modality. These findings are novel, interesting, and important, especially in the context of understanding the role of lateral cortico-cortical interactions and in outlining predictive processing as a general theory of cortical function. 

      In its current form, the manuscript lacks sufficient description of methodological details pertaining to the closed-loop training and the overall experimental design. In several scenarios, while the results per se are convincing and interesting, their exact interpretation is challenging given the uncertainty about the actual experimental protocols (more on this below). Second, the authors are laser-focused on sensorimotor errors (mismatch responses) and focus almost exclusively on what happens when stimuli deviate from the animal's expectations. 

      While the authors consistently report strong running-onset responses (during open-loop) in the auditory cortex in both auditory and visual versions of the task, they do not discuss their interpretation in the different task settings (see below), nor do they analyze how these responses change during closed-loop i.e. when predictions align with sensory evidence. 

      However, I believe all my concerns can be easily addressed by additional analyses and incorporation of methodological details in the text. 

      Major concerns: 

      (1) Insufficient analysis of audiomotor mismatches in the auditory cortex: 

      Lack of analysis of the dependence of audiomotor mismatches on the running speed: it would be helpful if the authors could clarify whether the observed audiomotor mismatch responses are just binary or scale with the degree of mismatch (i.e. running speed). Along the same lines, how should one interpret the lack of dependence of the playback halt responses on the running speed? Shouldn't we expect that during playback, the responses of mismatch neurons scale with the running speed? 

      Regarding the scaling of AM mismatch responses with running speed, please see our response to reviewer 1 above to the same question. 

      Regarding the playback halt response and dependence on running speed, we would not expect there to be a dependence. The playback halt response (by design) measures the strength of the sensory response to a cessation of a stimulus (think OFF response). These typically are less strong in cortex than the corresponding ON responses but need to be controlled for (else a mismatch response might just be an OFF response – the prediction error is quantified as the difference between AM mismatch response and playback halt response). Given that sound onset responses only have a small dependence on running state, we would similarly expect sound offset (playback halt) responses to exhibit only minimal dependence on running state. 
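
      The control logic above, in which the prediction error is the mismatch response minus the playback halt response, can be sketched per neuron as follows. The function name `prediction_errors` and the example values are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean


def prediction_errors(mismatch_dff, playback_halt_dff):
    """Per-neuron prediction error: mismatch response minus playback halt response.

    Both arguments are sequences of average dF/F responses, one entry per neuron,
    in the same neuron order (hypothetical format).
    """
    if len(mismatch_dff) != len(playback_halt_dff):
        raise ValueError("both inputs need one value per neuron, in the same order")
    return [m - p for m, p in zip(mismatch_dff, playback_halt_dff)]


# Made-up responses for three neurons.
mismatch = [2.0, 0.5, 1.0]
halt = [0.5, 0.5, 0.0]
errors = prediction_errors(mismatch, halt)
mean_error = mean(errors)
```

      The second neuron responds equally in both conditions, so its response is attributed entirely to the sound offset and its prediction error is zero.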

      Slow temporal dynamics of audiomotor mismatches: despite the transient nature of the mismatches (1s), auditory mismatch responses last for several seconds. They appear significantly slower than previous reports for analogous visuomotor mismatches in V1 (by the same group, using the same methods) and even in comparison to the multimodal mismatches within this study (Figure 4C). What might explain this sustained activity? Is it due to a sustained change in the animal's running in response to the auditory mismatch? 

      This is correct, neither AM nor AM+VM mismatch responses return to baseline in the 3 seconds following onset. VM mismatch responses in visual cortex also do not return to baseline in that time window (see e.g. Figure 1E in (Attinger et al., 2017), or Figure 1F in (Zmarz and Keller, 2016)). What the origin or computational significance of this sustained calcium response is, we do not know. In intracellular signals, we do not see this sustained response (Jordan and Keller, 2020). Also peculiar is indeed the fact that in the case of AM mismatch the sustained response is similar in strength to the initial response. But also here, why this would be the case, we do not know. It is conceivable that the initial and the sustained calcium response have different origins. If the sustained response amplitude is all or nothing, the fact that the AM mismatch response is the smallest of the three could explain why sustained and initial responses are closer than for [AM+VM] or VM (in visual cortex) mismatch responses. All sustained responses appear to be roughly 1% dF/F. There are no apparent changes in running speed or pupil dilation that would correlate with the sustained activity (new panel A in Figure S2). 

      (2) Insufficient analysis and discussion of running onset responses during audiomotor sessions: The authors report strong running-onset responses during open-loop in identified mismatch neurons. They also highlight that these responses are in agreement with their model of subtractive prediction error, which relies on subtracting the bottom-up sensory evidence from top-down motor-related predictions. I agree, and, thus, assume that running-onset responses during the open loop in identified 'mismatch' neurons reflect the motor-related predictions of sensory input that the animal has learned to expect. If this is true, one would expect that such running-onset responses should dampen during closed-loop, when sensory evidence matches expectations and therefore cancels out this prediction. It would be nice if the authors test this explicitly by analyzing the running-related activity of the same neurons during closed-loop sessions. 

      Thank you for the suggestion. We now show running onset responses in both closed and open loop conditions for audiomotor and visuomotor coupling (new Figures 2H and 3H). In closed loop, we observe only a transient running onset response. In the open loop condition, running onset responses are sustained. For the visuomotor coupling, running onset responses are sustained in both closed and open loop conditions. This would be consistent with a slightly delayed cancellation of sound and motor related inputs in the audiomotor closed loop condition but not otherwise. 
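
      The idea of a slightly delayed cancellation can be illustrated with a toy version of the subtractive model the reviewer refers to. This is a sketch under strong assumptions and not a model fitted to the data: the rectification, the one-sample delay, the unit amplitudes, and the treatment of the uncoupled case as "no matching sensory input" are all simplifications chosen for illustration.

```python
def mismatch_neuron_response(motor_prediction, sensory_input):
    """Rectified difference between top-down prediction and bottom-up input."""
    return [max(0.0, p - s) for p, s in zip(motor_prediction, sensory_input)]


def delayed(signal, delay_steps):
    """Shift a signal later in time by delay_steps samples, padding with zeros."""
    if delay_steps <= 0:
        return list(signal)
    return ([0.0] * delay_steps + list(signal))[: len(signal)]


# Running starts at sample 2 and continues (arbitrary units).
running = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Coupled case: sensory input tracks running but arrives one sample late.
coupled = mismatch_neuron_response(running, delayed(running, 1))

# Uncoupled case, simplified to no matching sensory input at all.
uncoupled = mismatch_neuron_response(running, [0.0] * len(running))
```

      In this toy setting the coupled case produces a response only at running onset, before the delayed sensory input cancels the prediction, whereas the uncoupled case stays elevated for as long as the animal runs.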

      (3) Ambiguity in the interpretation of responses in visuomotor sessions. 

      Unlike for auditory stimuli, the authors show that there are no obvious responses to visuomotor mismatches or playback halts in the auditory cortex. However, the interpretation of these results is somewhat complicated by the uncertainty related to the training history of these mice. Were these mice exclusively trained on the visuomotor version of the task or also on the auditory version? I could not find this info in the Methods. From the legend for Figure 4D, it appears that the same mice were trained on all versions of the task. Is this the case? If yes, what was the training sequence? Were the mice first trained on the auditory and then the visual version? 

      The training history of the animals is important to outline the nature of the predictions and mismatch responses that one should expect to observe in the auditory cortex during visuomotor sessions.

      Depending on whether the mice in Figure 3 were trained on visual only or both visual and auditory tasks, the open-loop running onset responses may have different interpretations. 

      a) If the mice were trained only on the visual task, how should one interpret the strong running onset responses in the auditory cortex? Are these sensorimotor predictions (presumably of visual stimuli) that are conveyed to the auditory cortex? If so, what may be their role? 

      b) If the mice were also trained on the auditory version, then a potential explanation of the running-onset responses is that they are audiomotor predictions lingering from the previously learned sensorimotor coupling. In this case, one should expect that in the visual version of the task, these audiomotor predictions (within the auditory cortex) would not get canceled out even during the closed-loop periods. In other words, mismatch neurons should constantly be in an error state (more active) in the closed-loop visuomotor task. Is this the case? 

      If so, how should one then interpret the lack of a 'visuomotor mismatch' aligned to the visual halts, over and above this background of continuous errors? 

      As such, the manuscript would benefit from clearly stating in the main text the experimental conditions such as training history, and from discussing the relevant possible interpretations of the responses. 

      Mice were not trained on either audiomotor or visuomotor coupling and were reared normally. Prior to the recording day, the mice were habituated to running on the air-supported treadmill without any coupling for up to 5 days. On the first recording day, the mice experienced all three types of sessions (audiomotor, visuomotor, or combined coupling) in a random order for the first time. We have clarified this in the methods. 

      Regarding the question of how one should interpret the strong running onset responses in the auditory cortex, this is complicated by the fact that – unless mice are raised visually or auditorily deprived – they always have life-long experience with visuomotor or audiomotor coupling. The visuomotor coupling they experience in VR is geometrically matched to what they would experience by moving in the real world. For the audiomotor coupling the exact relationship is less clear, but there is a diverse set of sound sources that scale in loudness with increasing running speed. Hence running onset responses reflect either such learned associations (as the reviewer also speculates), or spurious input. Rearing mice without coupling between movement and visual feedback does not abolish movement related responses in visual cortex (Attinger et al., 2017); on the contrary, it enhances them considerably. We suspect this reflects visual cortex being recruited for other functions in the absence of visual input. But given the data we have we cannot distinguish the different possible sources of running related responses. It is very likely that any “training” related effect we could achieve in a few hours pales in comparison to the life-long experience the mouse has in the world. 

      Regarding the lack of a 'visuomotor mismatch' aligned to the visual halts, we are not sure we understand. Our interpretation is that there are no (or only a very small - we speculate that any nonzero VM mismatch response is just inherited from visual cortex) VM mismatch responses in auditory cortex above chance. Our data are consistent with the interpretation that there is no opposition of bottom up visual and top down motor related input in auditory cortex, hence no VM mismatch responses (independent of how strong the top-down motor related input is). This is of course not surprising – this is more of a sanity check and becomes relevant in the context of interpreting AM+VM responses. 

      (4) Ambiguity in the interpretation of responses in multimodal versus unimodal sessions. 

      The authors show that multimodal (auditory + visual) mismatches trigger stronger responses than unimodal mismatches presented in isolation (auditory only or visual only). Further, they find that even though visual mismatches by themselves do not evoke a significant response, co-presentation of visual and auditory stimuli non-linearly augments the mismatch responses suggesting the presence of non-hierarchical interactions between various predictive processing streams. 

      In my opinion, this is an important result, but its interpretation is nuanced given insufficient details about the experimental design. It appears that responses to unimodal mismatches are obtained from sessions in which only one stimulus is presented (unimodal closed-loop sessions). Is this actually the case? An alternative and perhaps cleaner experimental design would be to create unimodal mismatches within a multimodal closed-loop session while keeping the other stimulus still coupled to the movement. 

      This is correct, unimodal mismatches were acquired in unimodal coupling. Testing unimodal mismatch responses in multimodally coupled VR is an interesting idea we had initially even pursued. However, halting visual flow in a condition of coupling of both visual flow and sound amplitude to running speed has an additional complication. Introducing an audiomotor mismatch in this coupling inherently also creates an audiovisual (AV) mismatch, and the same applies to visuomotor mismatches, which cause a concurrent visuoaudio (VA) mismatch (Author response image 3). This assumes that there are cross modal predictions from visual cortex to auditory cortex as there are from auditory cortex to visual cortex (Garner and Keller, 2022). There are interesting differences between the different types of mismatches, but with all the necessary passive controls this quickly exceeded the amount of data we could reasonably acquire for this paper. This remains an interesting question for future research. 

      Author response image 3.

      Rationale of unimodal mismatches introduced within multimodal paradigm. 

      Given the current experiment design (if my assumption is correct), it is unclear if the multimodal potentiation of mismatch responses is a consequence of nonlinear interactions between prediction/error signals exchanged across visual and auditory modalities. Alternatively, could this result from providing visual stimuli (coupled or uncoupled to movement) on top of the auditory stimuli? If it is the latter, would the observed results still be evidence of non-hierarchical interactions between various predictive processing streams? 

      Mice are not in complete darkness during the AM mismatch experiments (the VR is off, but there is low ambient light in the experimental rooms primarily from computer screens), so we can rule out the possibility that the difference comes from having “no” visual input during AM mismatch responses. Addressing the question of whether it is this particular stimulus that causes the increase would require an experiment in which we couple sound amplitude but keep visual flow open loop. We did not do this, but also think this is highly unlikely. However, as described above, we did do an experiment in which we coupled both sound amplitude and visual flow to running, and then either halted visual flow, or sound amplitude, or both. Comparing the [AM+VM] and [AM+AV] mismatch responses, we find that [AM+VM] responses are larger than [AM+AV] responses as one would expect from an interaction between [AM] and [VM] responses (Author response image 4). Finally, the conclusion that there are non-hierarchical interactions of prediction error computations holds either way – if any visual stimulus (either visuomotor mismatch, or visual flow responses) influences audiomotor mismatch responses, this is evidence of non-hierarchical interactions.   

      Author response image 4.

      Average population response of all L2/3 neurons to concurrent [AM + VM] or [AM+AV] mismatch. Gray shading indicates the duration of the stimulus.

      Along the same lines, it would be interesting to analyze how the coupling of visual as well as auditory stimuli to movement influences responses in the auditory cortex in closed-loop in comparison to auditory-only sessions. Also, do running onset responses change in open-loop in multimodal vs. unimodal playback sessions? 

      We agree, and this is why we started out doing the experiments described above. We stopped with this, however, because it quickly became a combinatorial nightmare. We will leave addressing the question of how different types of coupling influence responses in auditory cortex to brave future neuroscientists. 

      Regarding the question of running onset responses, in both the multimodal and auditory only paradigms, running onset responses are transient; bottom-up sensory evidence is quickly subtracted from top-down motor-related prediction (Author response image 5). While there appears to be a small difference in the dynamics of running onset responses between these two paradigms, it was not significant. Note, we also have much less data than we would like here for this type of analysis. 

      Author response image 5.

      Running onset responses recorded in unimodal and multimodal closed loop sessions (1903 neurons, 16 fields of view, 8 mice)

      We also compared running onsets in open loop sessions and did not find any significant differences between unimodal and multimodal sessions (Author response image 6). We found only six sessions in which animals performed at least two running onsets in each session type, therefore, we do not have enough data to include it in the manuscript. 

      Author response image 6.

      Running onset responses recorded within unimodal and multimodal open loop sessions (659 cells, 6 fields of view, 5 mice).

      Minor concerns and comments:

      (1) Rapid learning of audiomotor mismatches: It is interesting that auditory mismatches are present even on day 1 and do not appear to get stronger with learning (same on day 2). The authors comment that this could be because the coupling is learned rapidly (line 110). How does this compare to the rate at which visuomotor coupling is learned? Is this rapid learning also observable in the animal's behavior i.e. is there a change in running speed in response to the mismatch? 

      In the visual system this is a bit more complicated. If you look at visuomotor mismatch responses in a normally reared mouse, responses are present from the first mismatch (as far as we can tell given the inherently small dataset with just one response per mouse). However, this is of course confounded by the fact that a normally reared mouse has visuomotor coupling throughout life from eye-opening. Raising mice in complete darkness, we have shown that approximately 20 min of coupling are sufficient to establish visuomotor mismatch responses (Attinger et al., 2017). 

      Regarding the behavioral changes that correlate with learning, we are not sure what the reviewer would expect. We cannot detect a change in mismatch responses and hence would also not expect to see a change in behavior.

      (2) The authors should clarify whether the sound and running onset responses of the auditory mismatch neurons in Figure 2E were acquired during open-loop. This is most likely the case, but explicitly stating it would be helpful. 

      Both responses were measured in isolation (i.e. VR off, just sound and just running onset), not in an open-loop session. We have clarified in the figure legend that these are the same data as in Figure 1H and N. 

      (3) In lines 87-88, the authors state 'Visual responses also appeared overall similar but with a small increase in strength during running ...'. This statement would benefit from clarification. From Figure S1 it appears that when the animal is sitting there are no visual responses in the auditory cortex. But when the animal is moving, small positive responses are present. Are these actually 'visual' responses - perhaps a visual prediction sent from the visual cortex to the auditory cortex that is gated by movement? If so, are they modulated by features of visual stimuli eg. contrast, intensity? Or, do these responses simply reflect motor-related activity (running)? Would they be present to the same extent in the same neurons even in the dark? 

      This was wrong indeed - we have rephrased the statement as suggested. Regarding the source of visual responses, we use the term “visual response” operationally here agnostic to what pathway might be driving it (i.e. it could be a prediction triggered by visual input). 

      We did not test if recorded visual responses are modulated by contrast or intensity. However, testing whether they are would not help us distinguish whether the responses are ‘visual’ or ‘visual predictions’. Finally, regarding the question about whether they are motor-related responses, this might be a misunderstanding. These are responses to visual stimuli while the mouse is already running (i.e. there is no running onset), hence we cannot test whether these responses are present in the dark (this would be the equivalent of looking at random triggers in the dark while the mouse is running).  

      (4) The authors comment in the text (lines 106-107) about cessation of sound amplitude during audiomotor mismatches as being analogous to halting of visual flow in visuomotor mismatches. However, sound amplitude versus visual flow are quite different in nature. In the visuomotor paradigm, the amount of visual stimulation (photons per unit time) does not necessarily change systematically with running speed. Whereas, in the audiomotor paradigm, the SNR of the stimulus itself changes with running speed which may impact the accuracy of predictions. On a broader note, under natural settings, while the visual flow is coupled to movement, sound amplitude may vary more idiosyncratically with movement. 

      This is a question of coding space. The coding space of visual cortex of the mouse is probably visual flow (or change in image) not number of photons. This already starts in the retina. The demonstration of this is quite impressive. A completely static image on the retina will fade to zero response (even though the number of photons remains constant). This is also why most visual physiologists use dynamic stimuli – e.g. drifting gratings, not static gratings – to map visual responses in visual cortex. If responses were linear in number of photons, this would make less of a difference. The correspondence we make is between visual flow (which we assume is the main coding space of mouse V1 – this is not established fact, but probably implicitly the general consensus of the field) and sound amplitude. Responses in auditory cortex are probably more linear in sound amplitude than visual cortex responses are linear in number of photons, but whether that is the correct coding space is still unclear, and as far as we can tell there is no clear consensus in the field. We did consider coupling running speed to frequency, which may work as well, but given the possible equivalence (as argued above) and the fact that we could see similar responses with sound amplitude coupling we did not explore frequency coupling. 

      If visual speed is the coding space of V1, SNR should behave equivalently in both cases. 

      Perhaps such differences might explain why unlike in the case of visual cortex experiments, running speed does not affect the strength of playback responses in the auditory cortex. 

      Possible, but the more straightforward framing of this point is that sensory responses are enhanced by running in visual cortex while they are not in auditory cortex. A playback halt response (by design) is just a sensory response. Why running does not generally increase sensory responses in auditory cortex (L2/3 neurons), but does so in visual cortex, would be the more general version of the same question.

      We fear we have no intelligent answer to this question.  

      Reviewer #3 (Public Review): 

      This study explores sensory prediction errors in the sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same-level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli, and movement onset were characterized. Then, the authors made the running speed of the mouse predictive of sound intensity and/or visual flow. Mismatches were created through the interruption of sound and/or visual flow for 1 second while the animal moved, disrupting the expected sensory signal given the speed of movement. As a control, the same sensory stimuli triggered by the animal's movement were presented to the animal decoupled from its movement. The authors suggest that auditory responses to the unpredicted silence reflect mismatch responses. That these mismatch responses were enhanced when the visual flow was congruently interrupted, indicates the cross-modal influence of prediction error signals. 

      This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. The analysis explores neither the full power of the experimental design nor the population activity recorded with 2-photon, leaving open the question of to what extent what the authors call mismatch responses are not sensory responses to sound interruption. The auditory system is sensitive to transitions and indeed responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation. The pattern they observe is different from the visuomotor mismatch responses the authors found in V1 (Keller et al., 2012), where the interruption of visual flow did not activate neuronal activity in the decoupled condition. 

      Just to add brief context to this. The reviewer is correct here, the (Keller et al., 2012) paper reports finding no responses to playback halt. However, this was likely a consequence of indicator sensitivity (these experiments were done with what now seems like a pre-historic version of GCaMP). Experiments performed with more modern indicators do find playback halt responses in visual cortex (see e.g. (Zmarz and Keller, 2016)). 

      The auditory system is sensitive to transitions, also those to silence. See the work of the Linden or the Barkat labs on offset responses, and also that of the Mesgarani lab (Khalighinejad et al., 2019) on responses to transitions 'to clean' (Figure 1c) in the human auditory cortex. Since the responses described in the current work are modulated by movement and the relationship between movement and sound is more consistent during the coupled sessions, this could explain the difference in response size between coupled and uncoupled sessions. There is also the question of learning. Prediction signals develop over a period of several days and are frequency-specific (Schneider et al., 2018). From a different angle, in Keller et al. 2012, mismatch responses decrease over time as one might expect from repetition.

      Also for brief context, this might be a misconception. We don’t find a decrease of mismatch responses in the (Keller et al., 2012) paper – we assume what the reviewer is referring to is the fact that mismatch responses decrease in open-loop conditions (they normally do not in closed-loop conditions). This is the behavior one would expect if the mouse learns that movement no longer predicts visual feedback. 

      It would help to see the responses to varying sound intensity as a function of previous intensity, and to plot the interruption response as a function of both transition and movement in both conditions. 

      Given the large populations of neurons recorded and the diversity of the responses, from clearly negative to clearly positive, it would be interesting to understand better whether the diversity reflects the diversity of sounds used or a diversity of cell types, or both. 

      Comments and questions: 

      Does movement generate a sound and does this change with the speed of movement? It would be useful to have this in the methods. 

      There are three ways to interpret the question – below the answers to all three:

      (1) Running speed is experimentally coupled to sound amplitude of a tone played through a loudspeaker. Tone amplitude is scaled with running speed of the mouse in a closed loop fashion. We assume this is not what the reviewer meant, as this is described in the methods (and the results section). 

      (2) Movements of the mouse naturally generate sounds (footsteps, legs moving against fur, etc.). Most of these sounds trivially scale with the frequency of leg movements – we assume this is also not what the reviewer meant.

      (3) Finally, there are experimental sounds related to the rotation speed of the air supported treadmill that increase with running speed of the mouse. We have added this to the methods as suggested. 

      Figures 1a and 2a. The mouse is very hard to see. Focus on mouse, objective, and sensory stimuli? The figures are generally very clear though. 

      We have enlarged the mouse as suggested. 

      1A-K was the animal running while these responses were measured? 

      We did not restrict this analysis to running or sitting and pooled responses over both conditions.  We have made this more explicit in the results section.  

      Data in Figure 1: Since the modulation of sensory responses by movement is relevant for the mismatch responses, I would move this analysis from S1 to Figure 1 and analyze the responses more finely in terms of running speed relative to sound and gratings. I would include here a more thorough analysis of the responses to 8kHz at varying intensities, for example in the decoupled sessions. Does the response adapt? Does it follow the intensity? 

      We agree that these are interesting questions, but they do not directly pertain to our conclusions here. The key point Figure S1 addresses is whether auditory responses are generally enhanced by running (as they are e.g. in visual cortex) – the answer, on average, is no. We have tried emphasizing this more, but it changes the flow of the paper away from our main message, hence we have left the panels in the supplements. 

      Regarding the 8kHz modulation, there is a general increase of the suppression of activity with increasing sound amplitude (Author response image 7 and Author response image 8). But due to the continuously varying amplitude of the stimulus, we do not have sufficient data (or do not know how to do so with the data we have) to address questions of adaptation. We assume there is some form of adaptation. However, either way, we don’t see how this would change our conclusions.

      Author response image 7.

      Neural activity as a function of sound level in an AM open loop session. 

      Author response image 8.

      The average sound evoked population response of all ACx layer 2/3 neurons to 60 dB or 75 dB 8 kHz pure tones. Stimulus duration was 1 s (gray shading).

      2C-D why not talk of motor modulation? Paralleling what happens in response to auditory and visual stimuli? 

      This is correct, a mismatch response (we use mismatch here to operationally describe the stimulus – not the interpretation) can be described either as a prediction error (this is the interpretation) or a stimulus specific motor modulation. Note, the key here is “stimulus specific”. It is stimulus specific as there is an approximately 3x change between mismatch and playback halt (the same sensory stimulus with and without locomotion), but basically no change for sound onsets (Figure S1). Having said that, one explanation (prediction error) has predictive power (and hence is testable – see e.g. (Vasilevskaya et al., 2023) for an extensive discussion on exactly this argument for mismatch responses in visual cortex), while the other does not (a “stimulus specific” motor modulation has no predictive value or computational theory behind it and is simply a description). Thus, we choose to interpret it as a prediction error. Note, this finding does not stand in isolation and many of the testable predictions of the predictive processing interpretation have turned out to be correct (see e.g. (Keller and Mrsic-Flogel, 2018) for a review). 

      Note, we try to only use the interpretation of “prediction error” when motivating why we do the experiments, and in the discussion, but not directly in the description of the results (e.g. in Figure 2).  

      How does the mismatch affect the behavior of the mouse? Does it stop running? This could also influence the size of the response. 

      We quantified animal behavior during audiomotor mismatches and did not find any significant acceleration or slowing down upon mismatch events. Thus, neural responses recorded during AM mismatches are unlikely to be explained by changes in animal behavior. These data have been added in Figure S2A and Figure S4A.

      Figure 3. What about neurons that were positively modulated by both grating and movement? How do these neurons respond to the mismatch? 

      Neurons positively modulated by both grating and movement were slightly more responsive to MM than the rest of the population, though this difference was not significant (Author response image 9). This is also visible in Figure 3G – the high VM mismatch responsive neurons are randomly distributed in regard to correlation with running speed and visual flow speed. 

      Author response image 9.

      Responses to visuomotor mismatches of neurons positively modulated by grating and movement and of the remainder of the population.

      Line 176. The authors say 'Thus, in the case of a [AM + VM] mismatch both the halted visual flow and the halted sound amplitude are predicted by running speed' but the mismatch (halted flow and amplitude) is not predicted by the speed, correct? Please rephrase. 

      Thank you for pointing this out – this was indeed phrased incorrectly. We have corrected this. 

      How was the sound and/or visual flow interruption triggered? Did the animal have to run at a minimum speed in order for it to happen?

      Sound and visual flow interruptions were triggered randomly, independent of the animal's running speed. However, for the analysis, only MM presentations during which animals were running at a speed of at least 0.3 cm/s were included. The 0.3 cm/s was simply the (arbitrary) threshold we used to determine if the mouse was running. In a completely stationary mouse a mismatch event will not have any effect (sound amplitude/visual flow speed are already at 0). This is described in the methods section.
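
      The inclusion criterion described above can be illustrated with a minimal sketch. It assumes that each mismatch event has already been summarized by a single running speed in cm/s; the function name, variable names, and example values are hypothetical and are not taken from the authors' analysis code. Only the 0.3 cm/s threshold comes from the response.

```python
from typing import Sequence

# Threshold quoted in the response above; everything else is illustrative.
RUNNING_THRESHOLD_CM_S = 0.3


def select_running_events(
    speeds_cm_s: Sequence[float],
    threshold: float = RUNNING_THRESHOLD_CM_S,
) -> list[int]:
    """Return indices of mismatch events with running speed >= threshold.

    `speeds_cm_s` holds one (hypothetical) per-event running speed; how that
    speed is computed for each event is an assumption, not the authors' method.
    """
    return [i for i, speed in enumerate(speeds_cm_s) if speed >= threshold]


# Made-up speeds for five mismatch events (cm/s).
example_speeds = [0.0, 0.1, 0.3, 2.5, 12.0]
kept = select_running_events(example_speeds)
```

      Events from a stationary mouse are dropped because, as noted above, sound amplitude and visual flow speed are already at 0 there, so a halt cannot change the stimulus.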

    2. eLife Assessment

      This study provides important findings on the modulation of cortical neuronal responses to sensory stimuli by motor-driven predictive signals. The study is methodologically sound and well-designed. Solid evidence is presented for the conclusion that audiomotor mismatch responses are observed in the auditory cortex and that these are strongly modulated by crossmodal signals, though further investigation of the effects of running speed on audiomotor coupling and of sound offset effects on the observed responses would strengthen the interpretation of the results.

    3. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on visual cortex. By correlating mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion.

      While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons.

      Strengths:

      (1) Well-designed study addressing a timely question in the field.
      (2) Successful transition from previous work focused on visual cortex to auditory cortex, demonstrating generic principles in mismatch responses.
      (3) Correlation between mouse locomotion speed and acoustic feedback levels provides evidence for prediction signal in the auditory cortex.
      (4) Coupling of visual and auditory feedback shows putative multimodal integration in auditory cortex.

      Weaknesses:

      (1) Lack of quantification of animal behavior upon mismatches, potentially leading to alternative interpretations of recorded signals.
      (2) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions.
      (3) Discrepancies in reported values in a few figure panels raise questions about data consistency and interpretation.
      (4) Ambiguity regarding the identity of the [AM+VM] MM neurons.

      Comments on revisions:

      I am satisfied with all clarifications and additional analyses performed by the authors.
      The only concern I have is about changes in running after [AM+VM] mismatches.
      The authors reported that they "found no evidence of a change in running speed or pupil diameter following [AM + VM] mismatch (Figures S5A)" (line 197).
      Nevertheless, it seems that there is a clear increase in running speed for the [AM+VM] condition (S5A). Could this be more specifically quantified? I am concerned that part of the [AM+VM] could stem from this change in running behavior. Could one factor out the running contribution?

    4. Reviewer #2 (Public review):

      In this study, Solyga and Keller use multimodal closed-loop paradigms in conjunction with multiphoton imaging of cortical responses to assess whether and how sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality. Their work addresses an important open question pertaining to the relevance of non-hierarchical (lateral cortico-cortical) interactions in predictive processing within the neocortex.

      Specifically, they monitor GCaMP6f responses of layer 2/3 neurons in the auditory cortex of head-fixed mice engaged in VR paradigms where running is coupled to auditory, visual, or audio-visual sensory feedback. The authors find strong auditory and motor responses in the auditory cortex, as well as weak responses to visual stimuli. Further, in agreement with previous work, they find that the auditory cortex responds to audiomotor mismatches in a manner similar to that observed in visual cortex for visuomotor mismatches. Most importantly, while visuomotor mismatches by themselves do not trigger significant responses in the auditory cortex, simultaneous coupling of audio-visual inputs to movement non-linearly enhances mismatch responses in the auditory cortex.

      Their results thus suggest that prediction errors within a given sensory modality are non-trivially influenced by prediction errors from another modality. These findings are novel, interesting, and important, especially in the context of understanding the role of lateral cortico-cortical interactions and in outlining predictive processing as a general theory of cortical function.

      Comments on revisions:

      The authors thoroughly addressed the concerns raised. In my opinion, this has substantially strengthened the manuscript, enabling much clearer interpretation of the results reported. I commend the authors for the response to review. Overall, I find the experiments elegantly designed, and the results robust, providing compelling evidence for non-hierarchical interactions across neocortical areas and more specifically for the exchange of sensorimotor prediction error signals across modalities.

    5. Reviewer #3 (Public review):

      This study explores sensory prediction errors in sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli and movement onset were characterized. Then, the authors made the running speed of the mouse predictive of sound intensity and/or visual flow (closed loop). Mismatches were created through the interruption of sound and/or visual flow for 1 second, disrupting the expected sensory signal. As a control, sensory stimuli recorded during the closed loop phase were presented again decoupled from the movement (open loop). The authors suggest that auditory responses to the unpredicted interruption of the sound, which affected neither running speed nor pupil size, reflect mismatch responses. That these mismatch responses were enhanced when the visual flow was congruently interrupted indicates cross-modal influence of prediction error signals.

      This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. The analysis explores neither the full power of the experimental design nor the population activity recorded with 2-photon, leaving open the question of to what extent what the authors call mismatch responses are not sensory responses to sound interruption (offset responses). The auditory system is sensitive to transitions and indeed responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation.

      Comments on revisions:

      The incorporation of the analysis of the animal's running speed and the pupil size upon sound interruption improves the interpretation of the data. The authors can now conclude that responses to the mismatch are not due to behavioral effects.
      The issue of the relationship between mismatch responses and offset responses remains uncommented. The auditory system is sensitive to transitions, also to silence. See the work of the Linden or the Barkat labs (including the work of the first author of this manuscript) on offset responses, and also that of the Mesgarani lab (Khalighinejad et al., 2019) on responses to transitions 'to clean' (Figure 1c) in human auditory cortex. Offset responses, as the first author knows well, are modulated by intensity and stimulus length (after adaptation?). That responses to the interruption of the sound are similar in quality, if not quantity, in the closed and open loop conditions suggests that offset responses might modulate the mismatch response. A mismatch response that reflects a break in predictability would presumably be less modulated by the exact details of the sensory input than an offset response. Therefore, what is the relationship between the mismatch response and the mean sound amplitude prior to the sound interruption (for example during the preceding 1 second)? And between the mismatch response and the mean firing rate over the same period?
      Finally, how do visual stimuli modulate sound responses in the absence of a mismatch? Is the multimodal response potentiation specific to a mismatch?

    1. Author response:

      The following is the authors’ response to the previous reviews.

      (1) We agreed that there was insufficient evidence for the authors' conclusion that Myc-overexpressing clones lacking Fmi become losers. We request that the authors change the text to discuss that suppression of Myc clone growth through Fmi depletion is reminiscent of a cell acquiring loser status, although at this point in the manuscript there is no clear demonstration whether this is mostly driven by growth suppression and/or an increase in apoptosis.

      We agree that at the point in the manuscript where we have only described the clone sizes, one cannot make firm conclusions about competition, so we have changed the language to reflect this. We argue that after showing our apoptosis data, those conclusions become firm. Please see the more lengthy responses to reviewers below.

      (2) We agreed that the apoptosis assay, data and interpretation need to be improved. The graphs in Fig. 4O and P should be better discussed in the text and in the legend. Additionally, the graphs are lacking the red lines that are written in the text.

      We regret that we did not adequately explain the data displayed in these two graphs. Supercompetition tends to cause apoptosis in both winners and losers, with the ratio between WT and super-competitor cells being critical in deciding the outcome of competition. We wanted to represent this visually but failed to properly explain our analysis. We have rewritten the figure legend and our discussion in the main text, hopefully making it clearer. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. They assess the role of fmi in several kinds of winners, and their data support the conclusion that fmi is required for winner status. However, they make the claim that loss of fmi from Myc winners converts them to losers, and the data supporting this conclusion is not compelling.

      Strengths:

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      I have read the revised manuscript and have found issues that need to be resolved. The biggest concern is the overstatement of the results that loss of fmi from Myc-overexpressing clones turns them into losers. This is not shown in a compelling manner in the revised manuscript and the authors need to tone down their language or perform more experiments to support their claims. Additionally, the data about apoptosis is not sufficiently explained.

      We take issue with this reviewer’s framing of their criticism. First, the reviewer is selectively reporting the results published in PMID: 20679206. They correctly state that those authors show that small discrete clones of RasV12 lgl are eliminated (Fig. 3B), but they omit the fact that the authors also show that larger RasV12 lgl clones induce apoptosis in the surrounding wild type cells, and therefore behave as winners (Fig. 3C). Hence, the size of the clone appears to determine its winner/loser status. Of course, lgl is not scrib, and it is not a certainty that they would behave similarly, but they also show that large RasV12 scrib clones induce considerable apoptosis of the neighboring wild type cells.

      The reviewer then discusses “continuous” clones induced by ey-flp, as we use in our manuscript. Here, the term “continuous” is probably misleading; because ey is expressed ubiquitously in the disc from early in development, it is most likely the case that the majority of cells have flipped relatively early, resulting in ~half the cells becoming clone and the other ~half twin spot. The clone cells then likely fuse to make larger clones. We show that ey-flp induced RasV12 scrib clones also behave as winners. It is logical to conclude that this is because they are large. The reviewer talks about “a privileged environment that insulates them from competition,” but if they were insulated from competition, how could they become winners? Because they occupy more territory than the wild type cells, and because they induce apoptosis in the wild type neighbors, they are winners. 

      Having shown that ey-flp induced RasV12 scrib clones behave as winners, we then remove Fmi from these clones, and show that they behave as losers by the same criteria: they occupy less area than the wild type cells (our Fig. 1 and Fig. 1 Supp 2), and they induce apoptosis in the wild type cells (our Fig 4A-H). 

      With respect to the comment that additional experiments are needed to support the claim that loss of Fmi from Myc winners converts them to losers, we’re not sure what additional data the reviewer would want. As for the tumor clones, we show that >>Myc clones get bigger than the twin control clones (Fig. 2), and we measure similar low levels of apoptosis in each (Fig. 4I-K, O). In contrast, >>Myc fmi clones are out-grown by wild type clones, and apoptosis is higher in the >>Myc fmi clones than in the wild type clones (Fig. 4L-N, P-S). We therefore believe it is correct to say that >>Myc clones become losers when Fmi is removed.

      In additional comments, the reviewer takes issue with using winner and loser language at the point in the manuscript where we have only shown the clone sizes but not yet the apoptosis data, and about this we agree. We have changed the language accordingly. 

      Re explanation of the apoptosis data, see the response to reviewer #3.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a mechanistic understanding of how Fmi regulates cell competition. While induction of apoptosis and JNK activation are commonly observed outcomes in various cell competition conditions, it is crucial to determine the specific mechanisms through which they are induced in fmi-depleted clones. Furthermore, it is recommended that the authors utilize the power of fly genetics to conduct a series of genetic epistasis analyses.

      We agree that it is desirable to have a mechanistic understanding of Fmi’s role in competition, but that is beyond the scope of this manuscript. Here, our goal is to report the phenomenon. We understand and share with the reviewer the interest in better understanding the relationship between Fmi and JNK signaling in competition. The role of JNK in competition, tumorigenesis and cell death is infamously complex. In some preliminary experiments, we explored some epistasis experiments, but these were inconclusive so we elected to not report them here. In the future, we will continue with additional analyses to gain a better understanding of the mechanism by which Fmi affects competition.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, does not rely on interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo depletion does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long term avenue for therapeutic purpose as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantifications and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provide some hints on a putative mechanism (specifically by comparing its localisation in winner and loser cells).

      While we did not perform a thorough analysis, our current revision of the manuscript shows Fmi staining results that do not support a change in subcellular localization of Fmi. In our images, Fmi seemed to localize similarly along the winner-loser clone boundaries, and inside and outside the clones. We cannot rule out that a subtle change in localization is taking place that could perhaps be detected with higher resolution imaging.

      Also, on a more interpretative note, the absence of impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      See our comment to Reviewer 2 regarding JNK.

      Strengths:

      A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition

      One of the rare genetic conditions that affects very specifically winner cells without any impact in losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective on the long term)

      Weaknesses:

      The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      We agree that in the future, it will be desirable to gain a mechanistic understanding of Fmi’s role in competition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have read the revised manuscript and have found issues that need to be resolved. The biggest concern is the overstatement of the results that loss of fmi from Myc-overexpressing clones turns them into losers. This is not shown in a compelling manner in the revised manuscript and the authors need to tone down their language or perform more experiments to support their claims.

      (1) I do not agree with the language used by the authors in the last paragraph of p. 4 stating loss of fmi from Myc supercompetitors (Fig. 2) makes them losers. At this point in the paper, they only use clone size as a readout. By definition, losers in imaginal discs die by apoptosis, which is not measured in this figure. As such, the authors do not prove that fmi-mutant Myc over-expressing clones are now losers at this point in the manuscript. The authors should discuss this in the results section regarding Fig. 2.

      We have modified the language in text and figure legend to acknowledge that the clone size data alone do not demonstrate competition.

      (2) Related to point #1, I do not agree with the language in the legend of Fig. 2H that the graph is measuring "supercompetition". They are only measuring clone ratios, not apoptosis. Growing to a smaller size does not make a clone have loser status without also assessing cell death.

      (a) I suggest that the authors remove the sentence "A ratio over 0 indicates supercompetition of nGFP+ clones, and below 0 indicates nGFP+ cells are losers." in the legend to Fig. 2H. Instead, they should describe the assay in terms of clone ratios.

      The reviewer raises a valid point, as at this point in the manuscript we did not quantify cell death and proliferation. However, based on decades of knowledge of supercompetition, Myc clones are classified as super-competitors in every instance they’ve been studied. (Myc clones show apoptosis when competing with WT cells, while at the same time they eliminate WT neighbors by apoptosis to become winners. Their faster proliferation rate may be what ultimately makes them winners.) We changed the language to address this distinction.

      (3) In Fig. 4, they do attempt to monitor apoptosis, which is the fate of bona fide losers in imaginal tissue. However, I have several concerns about these data (panels 4I-K, O and P have been added to the revised manuscript.)

      (a) In Fig. 4I-K, why is there no death of WT cells which would be expected based on de la Cova Cell 2004? The authors need to comment on this.

      (b) Cell death should also be observed in the Myc over-expressing clones but none is seen in this disc (see de la Cova 2004 and PMID: 18257071 Fig. 4). The authors need to comment on this.

      We do not understand why the reviewer raises these two points. We see some cell death in >Myc eye discs both in winners and losers, as displayed in the graph. In our hands, the levels were on average very low. The example shown is representative of the analysis and shows apoptosis both in WT and >Myc cells, highlighted by the arrows in 4J. We added a mention of the arrows in the figure legend to make it clearer. In the main text, we already compared our observations to the same publication the reviewer mentions (De la Cova 2004).

      (c) The data in panel 4O is not explained sufficiently in the legend or results section. What do the lines between the data points in the left side of the panel mean? Why is there a bunch of clustered data points in the right part of the Fig. 4O, when two different genotypes are listed below? I would have expected two clusters of points. The authors need to comment on this.

      We intended to convey as much information as possible in an informative manner in these graphs, and we regret not explaining the analysis shown more clearly. We modified the legends for the apoptosis analysis to better explain the displayed data.

      (d) What is the sample size (n) for the genotypes listed in this figure? The authors need to comment on this and explicitly list the sample size in the legend.

      We added the n for both conditions to the figure. 

      (e) In panels 4L-N, why is the death occurring in the apparent center of the fmiE59>>Myc clone? If these clones are truly losers as the authors claim, then apoptosis should be seen at the boundaries between the fmiE59>>Myc clone and the WT clones. The results in this figure are not compelling, yet this is the critical piece of data to support their claim that fmiE59>>Myc clones are losers. The authors need to comment on this.

      The majority of cell death in this example is observed 1-3 cells away from the clone boundary. In some cases, we observe cell death farther from the boundary, but those cells were not counted in our analyses. As described in our methods, we only considered for the analysis cells at the clone boundary or in the vicinity, as those are the ones that most probably have apoptosis triggered by the neighboring clone.

      (f) There is no red line in Fig. 4O and 4P, in contrast to what is written in the legend in the revised manuscript. This should be corrected.

      We thank the reviewer for catching the error about the line. We have now simplified the graph by removing the line at Y=0, leaving only one dashed line that represents the mean difference between WT and >>Myc cells.

      (4) On p. 10, the reference Harvey and Tapon 2007 to support hpo-/- supercompetitor status is incorrect. The references are Ziosi 2010 and Neto-Silva 2010. This should be changed.

      We thank the reviewer for the correction. While the review we provided discusses the role of the Hpo pathway in proliferation and cancer, it does not discuss competition. The reference we intended to include here was Ziosi 2010. We now cite both in the revised manuscript.

      (5) The legend for Fig. 3A-H is missing from the revised manuscript. This needs to be added.

      This was likely a copy-edit glitch. The missing parts of the legend have been restored.

      (6) Material and methods is missing details on the hs-induced clones. The authors need to specifically state when the clones were generated and when they were analyzed in hours after egg laying.

      The timing of the heat-shock and analysis was described in the methods: “Heat-shock was performed on late first instar and early second instar larvae, 48 hrs after egg laying (AEL). Vials were kept at 25°C after heat-shock until larvae were dissected”. And additionally, in the dissection methods: “Third instar wandering larvae (120 hrs AEL) were dissected…” We have included in this revision the length of the heat-shock (15 min).

      I have read the rebuttal and some of my concerns are not sufficiently addressed.

      (8) I raised the point of continuously-generated clones becoming large enough to evade competition, and I disagree with the authors' reply. I think that competition of RasV12, scrib (or lgl) clones largely depends on the size of the clone, which is de facto larger when generated by continuous expression of flp (such as with the eyeless or tubulin promoters used in this study). I think that at this point we are at an impasse with respect to this issue, but I wanted to register my disagreement for the record. Related to this, one possible reason for the fragmentation of the fmi-mutant Myc overexpressing clones in the wing disc is that they were not continuously generated and hence did not merge with other clones.

      Please see the discussion above in the public comments. We remain unclear about what, exactly, the reviewer disagrees with. As stated above, we think they are correct that the size of the clone is critical in determining winner vs loser status.

      Reviewer #2 (Recommendations for the authors):

      Although the authors have addressed some of my concerns, I still feel that a detailed mechanistic understanding is essential. I hope the authors will conduct additional experiments to solve this issue.

      We also consider the mechanism of interest and will pursue this in the future. To test our hypotheses we require a set of genetic mutants that are still in the making that will help us dissect the function and potential partners of Fmi, and we hope to have these results in a future publication.

      Reviewer #3 (Recommendations for the authors):

      - There is no clear demonstration that the relative decrease of clone size in UASMyc/Fmi mutant is mostly driven by either a context-dependent suppression of growth and/or an increase of apoptosis (the latter being the more classic feature of loser phenotype).

      We believe that it is driven by both, and refrain from making assumptions about the magnitude of contribution from each. This question is something that we will be interested to explore in the future.

      The distribution of cell death in Fmi/UAS-Myc mutant is somewhat surprising and may not fit with most of the competition scenarios where death is mostly restricted to the clone periphery (although this may be quite variable and would require much more quantification to be clear).

      While we observe some cell death far from clone boundaries, most of the dying cells are a few cells away from a clone boundary. In other publications quantifying cell death, examples of cell death farther from the boundary are not rare (See for example Moreno and Basler 2004 Fig 6, De la Cova et al. Fig 2, Meyer et al 2014 Fig 2). We did not count cells dying far from clone boundaries in our analysis.

      I just noticed a few mistakes in the legends:

      Figure 3M legend is missing (it would be useful to know at which stage the quantification is performed)

      Another reviewer brought to our attention the problems with Fig 3 legend. We restored the missing parts.

      It would be good to give an estimate of the number of larvae observed when showing the representative cases in Figure 1.

      This is a good point. We now include these numbers in the figure legend.

    2. eLife Assessment

      This study investigates the role of the Cadherin Flamingo (Fmi) in cell competition in developing tissues in Drosophila melanogaster. The findings are valuable in that they show that Fmi is required in winning cells in several competitive contexts. The evidence supporting the conclusions is solid, as the authors identify Fmi as a potential new regulator of cell competition; however, they do not delve into a mechanistic understanding of how this occurs.

    3. Reviewer #1 (Public review):

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. They assess the role of fmi in several kinds of winners, and their data support the conclusion that fmi is required for winner status. However, they make the claim that loss of fmi from Myc winners converts them to losers, and the data supporting this conclusion are not compelling.

      Strengths:

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

    4. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bosch et al. reveal that Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi and is not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

    5. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in Drosophila wing and eye discs. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones, or the early clonal depletion of Scribble in the eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, reducing their competitiveness to the point of forcing their full elimination (so that they become "losers"). This function in cell competition is specific to Flamingo, as it cannot be recapitulated with other components of the PCP pathway, and it relies neither on interaction of Flamingo in trans nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo, which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo depletion does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes, as targeting Flamingo should then very specifically affect the putative winner/oncogenic clones without any impact on WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantifications and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provide some hints on a putative mechanism (specifically by comparing its localisation in winner and loser cells).

      Also, on a more interpretative note, the absence of impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      Strengths:

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition

      - One of the rare genetic conditions that affects very specifically winner cells without any impact in losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective on the long term)

    1. eLife Assessment

      This fundamental study substantially advances our understanding of sound encoding at synapses between single inner hair cells of the mouse cochlea and spiral ganglion neurons. Dual patch-clamp recordings, a technical tour de force, and careful data analysis provide compelling evidence that the functional heterogeneity of these synapses contributes to the diversity of spontaneous and sound-evoked firing by the neurons. The work will be of broad interest to scientists in the field of auditory neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      Tobón and Moser reveal a remarkable amount of presynaptic diversity in the fundamental Ca-dependent exocytosis of synaptic vesicles at the afferent fiber bouton synapse onto the pillar or modiolar sides of single inner hair cells of mice. These are landmark findings with profound implications for understanding acoustic signal encoding and presynaptic mechanisms of synaptic diversity at inner hair cell ribbon synapses. The paper will have an immediate and long-lasting impact in the field of auditory neuroscience.

      Main findings: 1) Synaptic delays and jitter of masker responses are significantly shorter (synaptic delay: 1.19 ms) at high SR fibers (pillar) than at low SR fibers (modiolar; 2.57 ms). 2) Masker-evoked EPSCs are significantly larger in high SR than in low SR. 3) Quantal content and RRP size are 14 vesicles in both high and low SR fibers. 4) Depression is faster in high SR synapses, suggesting they have a higher release probability and tighter Ca nanodomain coupling to docked vesicles. 5) Recovery of masker-EPSCs from depletion is similar for high and low SR synapses, although there is a slightly faster rate for low SR synapses that have bigger synaptic ribbons, which is very interesting. 6) High SR synapses had larger and more compact (monophasic) sEPSCs, well suited to trigger spikes rapidly and faithfully. 7) High SR synapses exhibit lower voltage (~sound pressure in vivo) dependent thresholds of exocytosis.

      Great care was taken to use physiological external pH buffers and physiological external Ca concentrations. Paired recordings were also performed at higher temperatures with IHCs at physiological resting membrane potentials and in more mature animals than previously done for paired recordings. This is extremely challenging because it becomes increasingly difficult to visualize bouton terminals when myelination becomes more prominent in the cochlear afferents. In addition, perforated patch recordings were used in the IHC to keep its intracellular milieu intact and thus extend the viability of the IHCs. The experiments are a tour de force and reveal several novel aspects of IHC ribbon synapses. The data set is rich and extensive. The analysis is detailed and compelling.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Jaime-Tobon & Moser is a truly major effort to bridge the gap between classical observations on how auditory neurons respond to sounds and the synaptic basis of these phenomena. The so-called spiral ganglion neurons (SGNs) are the primary auditory neurons connecting the brain with hair cells in the cochlea. They all respond to sounds by increasing their firing rates, but also present multiple heterogeneities. For instance, some present a low threshold to sound intensity, whereas others have a high threshold. This property inversely correlates with the spontaneous rate, i.e., the rate at which each neuron fires in the absence of any acoustic input. These characteristics, along with others, have been studied in many reports over the years. However, the mechanisms that allow the hair cell-SGN synapses to drive these behaviors are not fully understood.

      The level of experimental complexity described in this manuscript is unparalleled, producing data that is hardly found elsewhere. The authors provide strong proof for heterogeneity in transmitter release thresholds at individual synapses, and they do so in an extremely complex experimental setting. In addition, the authors found other specific differences, such as in synaptic latency and max EPSCs. A reasonable effort is put into bridging these observations with those extensively reported in in vivo SGN recordings. Similarities are many, and differences are not particularly worrying, as experimental conditions cannot be perfectly matched despite the authors' efforts to minimize them.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Jaime Tobon and Moser uses patch-clamp electrophysiology in cochlear preparations to probe the pre- and post-synaptic specializations that give rise to diverse activity of spiral ganglion afferent neurons (SGN). The experiments are quite an achievement! They use paired recordings from pre-synaptic cochlear inner hair cells (IHC) that allow precise control of voltage and therefore calcium influx, with post-synaptic recordings from type I SGN boutons directly apposed to the IHC for both presynaptic control of membrane voltage and post-synaptic measurement of synaptic function with great temporal resolution.

      Any of these techniques by themselves are challenging, but the authors do them in pairs, at physiological temperatures, and in hearing animals, all of which combined make these experiments a real tour de force. The data is carefully analyzed and presented, and the results are convincing. In particular, the authors demonstrate that post-synaptic features that contribute to the spontaneous rate (SR) of predominantly monophasic post-synaptic currents (PSCs), shorter EPSC latency, and higher PSC rates are directly paired with pre-synaptic features such as a lower IHC voltage activation and tighter calcium channel coupling for release to give a higher probability of release and subsequent increase in synaptic depression. Importantly, IHCs paired with Low and High SR afferent fibers had the same total calcium currents, indicating that the same IHC can connect to both low and high SR fibers. These fibers also followed expected organizational patterns, with high SR fibers primarily contacting the pillar IHC face and low SR fibers primarily contacting the modiolar face. The authors also use in vivo-like stimulation paradigms to show different RRP and release dynamics that are similar to results from SGN in vivo recordings. Overall, this work systematically examines many features giving rise to specializations and diversity of SGN neurons.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors): 

      Discussion, page 28. The argument that the authors put forward justifying the (small) size of the spontaneous EPSCs seems reasonable. Nonetheless, it would be good to have an amplitude distribution constructed with voltage-evoked EPSCs to compare with that of spontaneous EPSCs. Not the large initial EPSC, obtained upon IHC depolarization but rather EPSCs occurring later during the longer pulses (figure 4). The authors made the claim that upon IHC depolarization, EPSCs sizes increased, but this is not backed with data. 

      Following the reviewer's recommendation, we have analyzed the voltage-evoked EPSCs occurring during the last 20 ms of the Masker stimulus. We compared the cumulative distribution of the amplitude of these eEPSCs to the cumulative distribution of the amplitude of the sEPSCs (Figure 1-figure supplement 1, panel G) from the same synapses. The two distributions are significantly different (p < 0.0001, Kolmogorov-Smirnov test), with evoked EPSCs having larger amplitudes (average sEPSC amplitude of -97.28 ± 2.22 pA [median 82.10 pA] vs average eEPSC amplitude of 135.8 ± 3.24 pA [median 120.0 pA]).
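
      For readers unfamiliar with the comparison described in this response, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic, i.e. the largest vertical gap between two empirical cumulative distributions. It is not the authors' analysis code: the function names and the amplitude values are hypothetical and chosen only for illustration, and the sketch computes the statistic alone, without a p-value.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(xs, x) / n

    return cdf


def ks_statistic(a, b):
    """Two-sample KS statistic: maximum absolute difference between ECDFs."""
    fa, fb = ecdf(a), ecdf(b)
    # The maximum gap can only occur at an observed data point.
    return max(abs(fa(x) - fb(x)) for x in sorted(set(a) | set(b)))


# Hypothetical EPSC amplitude magnitudes in pA (illustrative values only).
spontaneous = [60.0, 75.0, 82.0, 90.0, 110.0]
evoked = [95.0, 118.0, 120.0, 140.0, 170.0]
d = ks_statistic(spontaneous, evoked)
```

      In practice, a library routine such as `scipy.stats.ks_2samp` returns both this statistic and the associated p-value.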

    1. eLife Assessment

      This useful study employs AlphaFold2 to predict interactions among 20 nuage proteins, identifying five novel interaction candidates, three of which are validated experimentally through co-immunoprecipitation. Expanding the analysis to 430 oogenesis-related proteins and screening ~12,000 Drosophila proteins for interactions with Piwi, the study identifies 164 potential binding partners, demonstrating how computational predictions can streamline experimental validation. This study provides a solid basis for further investigations into eukaryotic protein interaction networks.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates protein-protein interactions (PPIs) within the nuage, a germline-specific organelle essential for piRNA biogenesis in Drosophila melanogaster, using AlphaFold2 to predict interactions among 20 nuage-localizing proteins. The authors identify five novel interaction candidates and experimentally validate three of them, including Spindle-E and Squash, through co-immunoprecipitation assays. They confirm the functional significance of these interactions by disrupting salt bridges at the Spn-E_Squ interface. The study further expands its scope to analyze approximately 430 oogenesis-related proteins, validating three additional interaction pairs. A comprehensive screen of around 12,000 Drosophila proteins for interactions with the key piRNA pathway player, Piwi, identifies 164 potential binding partners. Overall, the research demonstrates that in silico approaches using AlphaFold2 can link bioinformatics predictions with experimental validation, streamlining the identification of novel protein interactions and reducing the reliance on extensive experimental efforts. The manuscript is commendably clear and easy to follow; however, areas for improvement should be addressed to enhance its clarity and rigor.

      Major Concerns:

      (1) While AlphaFold2 was developed and trained primarily for predicting protein structures and their interactions, applying it to predict protein-protein interactions is an extrapolation of its intended use. This introduces several important considerations and risks. First, it assumes that AlphaFold's accuracy in structure prediction extends to interactions, despite not being explicitly trained for this task. Additionally, the assumption that high-scoring models with structural complementarity imply biologically relevant interactions is not always valid. Experimental validation is essential to address these uncertainties, as over-reliance on computational predictions without such validation can lead to false positives and inaccurate conclusions. The authors should expand on the assumptions, limitations, and risks associated with using AlphaFold2 for predicting protein-protein interactions.

      (2) The authors experimentally validated three interactions, out of five predicted interactions, using co-immunoprecipitation (co-IP). They attributed the lack of validation for the other two predictions to the limitations of the co-IP method. However, further clarification on the potential limitations of the co-immunoprecipitation behind the negative results would strengthen the conclusions. While co-IP is a widely used technique, it may not detect weak or transient interactions, which could explain the failure to validate some predictions. Suggesting alternative validation methods such as FRET or mass spectrometry could further substantiate the results. On the other hand, AlphaFold2 predictions are not infallible and may generate false positives, particularly when dealing with structurally plausible but biologically irrelevant interactions. By acknowledging both the potential limitations of co-IP and the possibility of false positives from AlphaFold2, the authors can provide a more balanced interpretation of their findings.

      (3) In line 143, the authors state that "This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formations (Table 2). The highest pcScore pair was the Zuc homodimer, possibly because AlphaFold2 had learned from Zuc homodimer's crystal structure registered in the database." While the authors mentioned the presence of the Zuc homodimer's crystal structure, they do not provide a systematic bioinformatics analysis to evaluate pairwise sequence identity or check for the presence of existing structures for all the proteins or protein pairs (or their homologs) in databases such as the Protein Data Bank (PDB) or Swiss-Model. Conducting such an analysis is critical, as it significantly impacts the novelty and reliability of AlphaFold2 predictions. For instance, high sequence identity between the query proteins could lead to high-scoring models for biologically irrelevant interactions. Including this information would strengthen the conclusions regarding the accuracy and utility of the predictions.

      (4) While the manuscript successfully identifies novel protein interactions, the broader biological significance of these interactions remains underexplored. The manuscript could benefit from elaborating on how these findings may contribute to understanding the piRNA pathway and its implications on germline development, transposon repression, and oogenesis.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use AlphaFold2 to identify potential binding partners of nuage localizing proteins.

      Strengths:

      The main strength of the paper is that the authors experimentally verify a subset of the predicted interactions.

      Many studies have been performed to predict protein-protein interactions in various subsets of proteins. The interesting story here is that the authors (i) focus on an organelle that contains quite a few intrinsically disordered proteins and (ii) experimentally verify some (but not all) predictions.

      Weaknesses:

      Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4,000-5,000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      Another weakness is the use of a non-standard name for "ranking confidence": the authors call it the pcScore, while the name used in AlphaFold (and many other publications) is ranking confidence.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates protein-protein interactions (PPIs) within the nuage, a germline-specific organelle essential for piRNA biogenesis in Drosophila melanogaster, using AlphaFold2 to predict interactions among 20 nuage-localizing proteins. The authors identify five novel interaction candidates and experimentally validate three of them, including Spindle-E and Squash, through co-immunoprecipitation assays. They confirm the functional significance of these interactions by disrupting salt bridges at the Spn-E_Squ interface. The study further expands its scope to analyze approximately 430 oogenesis-related proteins, validating three additional interaction pairs. A comprehensive screen of around 12,000 Drosophila proteins for interactions with the key piRNA pathway player, Piwi, identifies 164 potential binding partners. Overall, the research demonstrates that in silico approaches using AlphaFold2 can link bioinformatics predictions with experimental validation, streamlining the identification of novel protein interactions and reducing the reliance on extensive experimental efforts. The manuscript is commendably clear and easy to follow; however, areas for improvement should be addressed to enhance its clarity and rigor.

      Major Concerns:

      (1) While AlphaFold2 was developed and trained primarily for predicting protein structures and their interactions, applying it to predict protein-protein interactions is an extrapolation of its intended use. This introduces several important considerations and risks. First, it assumes that AlphaFold's accuracy in structure prediction extends to interactions, despite not being explicitly trained for this task. Additionally, the assumption that high-scoring models with structural complementarity imply biologically relevant interactions is not always valid. Experimental validation is essential to address these uncertainties, as over-reliance on computational predictions without such validation can lead to false positives and inaccurate conclusions. The authors should expand on the assumptions, limitations, and risks associated with using AlphaFold2 for predicting protein-protein interactions.

      We appreciate the reviewer's point. The prediction of protein-protein interactions using AlphaFold2 relies on the number of conserved homologous sequences and previous conformational data. We shall add limitations and risks to the AlphaFold2 prediction method in the revised manuscript.

      (2) The authors experimentally validated three interactions, out of five predicted interactions, using co-immunoprecipitation (co-IP). They attributed the lack of validation for the other two predictions to the limitations of the co-IP method. However, further clarification on the potential limitations of the co-immunoprecipitation behind the negative results would strengthen the conclusions. While co-IP is a widely used technique, it may not detect weak or transient interactions, which could explain the failure to validate some predictions. Suggesting alternative validation methods such as FRET or mass spectrometry could further substantiate the results. On the other hand, AlphaFold2 predictions are not infallible and may generate false positives, particularly when dealing with structurally plausible but biologically irrelevant interactions. By acknowledging both the potential limitations of co-IP and the possibility of false positives from AlphaFold2, the authors can provide a more balanced interpretation of their findings.

      We appreciate the reviewer's point of view. We used the co-IP method to detect interactions in this study. However, as the reviewer pointed out, weak and transient interactions may not be detected. In the revised manuscript, we plan to add a note on the detection limits of the co-IP method and on the possibility that the AlphaFold2 method produces false positives.

      (3) In line 143, the authors state that "This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formations (Table 2). The highest pcScore pair was the Zuc homodimer, possibly because AlphaFold2 had learned from Zuc homodimer's crystal structure registered in the database." While the authors mentioned the presence of the Zuc homodimer's crystal structure, they do not provide a systematic bioinformatics analysis to evaluate pairwise sequence identity or check for the presence of existing structures for all the proteins or protein pairs (or their homologs) in databases such as the Protein Data Bank (PDB) or Swiss-Model. Conducting such an analysis is critical, as it significantly impacts the novelty and reliability of AlphaFold2 predictions. For instance, high sequence identity between the query proteins could lead to high-scoring models for biologically irrelevant interactions. Including this information would strengthen the conclusions regarding the accuracy and utility of the predictions.

      We appreciate the reviewer's critical point. The AlphaFold2 method generates a high confidence score when the 3D structure of the protein of interest, or of proteins with very similar sequences, is solved. We will investigate whether the proteins used in this study are included in the 3D structure database and add the information to the revised manuscript.

      (4) While the manuscript successfully identifies novel protein interactions, the broader biological significance of these interactions remains underexplored. The manuscript could benefit from elaborating on how these findings may contribute to understanding the piRNA pathway and its implications on germline development, transposon repression, and oogenesis.

      In the revised manuscript, we plan to discuss the potential biological significance of the novel protein-protein interactions presented here.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use AlphaFold2 to identify potential binding partners of nuage localizing proteins.

      Strengths:

      The main strength of the paper is that the authors experimentally verify a subset of the predicted interactions.

      Many studies have been performed to predict protein-protein interactions in various subsets of proteins. The interesting story here is that the authors (i) focus on an organelle that contains quite some intrinsically disordered proteins and (ii) experimentally verify some (but not all) predictions.

      Weaknesses:

      Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4,000-5,000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      We thank the reviewer for the kind suggestion. Although only dimer structures were predicted in this manuscript, a protein that is predicted to interact with two other proteins may form a complex of three proteins. We plan to add such trimer predictions to the revised manuscript.

      Another weakness is the use of a non-standard name for "ranking confidence" - the author calls it the pcScore - while the name used in AlphaFold (and many other publications) is ranking confidence.

      We take the reviewer’s point and will revise the text accordingly.

    1. eLife Assessment

      This study presents important findings on cold tolerance shared between hibernating and non-hibernating mammals, identifying a key molecule, GPX4, through multi-species genome-wide CRISPR screens. The evidence supporting these conclusions is compelling, combining multi-species CRISPR screening with rigorous pharmacological assays. This work will be of significant interest to biologists studying hibernation physiology and medical researchers interested in cold tolerance.

    2. Reviewer #1 (Public review):

      Summary:

      Through a series of CRISPR-Cas9 screens, the GPX4 antioxidant pathway was identified as a critical suppressor of cold-induced cell death in hibernator-derived cells. Hamster BHK-21 cells exposed to repeated cold and rewarming cycles revealed five genes (Gpx4, Eefsec, Pstk, Secisbp2, and Sepsecs) as critical components of the GPX4 pathway, which protects against cold-induced ferroptosis. A second screen with continuous cold exposure confirmed the essential role of GPX4 in prolonged cold tolerance. GPX4 knockout lines exhibited complete cell death within four days of cold exposure, and pharmacological inhibition of GPX4 further increased cell death, underscoring the necessity of GPX4's catalytic activity in cold conditions.

      An additional CRISPR screen in human cold-sensitive K562 cells identified 176 genes for cold survival. The GPX4 pathway was found to confer significant resistance to cold in hibernators and human cells, with GPX4 loss significantly increasing cold-induced cell death.

      Overexpression of either hamster or human GPX4 in human K562 cells dramatically improved cold tolerance, while catalytically dead mutants showed no such effect. These findings suggest that GPX4 abundance is a key limiting factor for cold tolerance in human cells, and primary cell types show strong sensitivity to GPX4 loss, highlighting that differences in cold tolerance across species may be due to varying GPX4-mediated protection.

      Strengths:

      (1) Innovative Approach: The study employs a series of unbiased genome-wide CRISPR-Cas9 screens in both hibernator- and non-hibernator-derived cells to investigate the mechanisms controlling cellular cold tolerance. Notably, this is the first genome-scale CRISPR-Cas9 screen conducted in cells derived from a hibernator, the Syrian hamster.

      (2) Identification of the GPX4 Pathway: Identifying glutathione peroxidase 4 (GPX4) as a critical suppressor of cold-induced cell death significantly contributes to the field. Recently, GPX4 was also reported as a potent regulator of cold tolerance through overexpression screening (Sone et al.) in hamsters, which further supports this finding.

      (3) Improved Cold Viability Assessment: The study identifies an important technical artifact in using trypan blue to assess cell viability following cold exposure. It reveals that cells stained immediately after cold exposure retain the dye, inaccurately indicating cell death. By introducing a brief rewarming period before viability assessment, the authors significantly improve the accuracy of detecting cold-induced cell death. This refinement in methodology ensures more reliable results and sets a new standard for future research on cold stress in cells.

      Weaknesses:

      (1) Mechanisms Regulating GPX4 Levels: While the study highlights GPX4 levels as a major determinant of cellular cold tolerance, it does not discuss how these levels are regulated or why they differ between hibernators and non-hibernators. This omission leaves an important aspect of GPX4's role in cold tolerance unexplored.

      (2) Generalizability Across Species: Although the study demonstrates the role of GPX4 in several mammalian species, it does not investigate whether this mechanism extends to other vertebrates (e.g., fish and amphibians) that also face cold challenges. This limitation could restrict the broader evolutionary claims made by the study.

      (3) Variability in Cold Sensitivity Across Human Cell Lines: The study observes significant variability in cold tolerance among different human cell lines but does not explain these differences clearly. This leaves a key aspect of human cell cold sensitivity insufficiently addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Lam et al. present a very intriguing whole-genome CRISPR screen in Syrian hamster cells as well as K562 cells to identify key genes involved in hypothermia-rewarming tolerance. Survival screens were performed by exposing cells to 4°C in a cooled CO2 incubator followed by a rewarming period of 30 minutes prior to survival analysis. In this paradigm, Syrian hamster-derived cell lines exhibit more robust survival than human cell lines (BHK-21 and HaK vs HT1080, HeLa, RPE1, and K562). A genome-wide Syrian hamster CRISPR library was created targeting all annotated genes with 10 guides/gene. LV transduction of the library was performed in BHK-21 cells and the survival screen procedures involved 3 cycles of 4 days of cold exposure at 4°C followed by 2 days of rewarming.

      When compared to controls maintained at 37°C, 9 genes were required for BHK-21 survival of cold cycling conditions, and 5 of these 9 are known components of the GPX4 antioxidant pathway. GPX4 KO BHK-21 cells had reduced cell growth at 37°C and profoundly worse cold tolerance, which could be rescued by GPX4 expression. GPX4 inhibitors also reduced survival in cold. CRISPR KO screens and GPX4 KO in K562 cells revealed comparable results (though intriguingly glutathione biosynthesis genes were more critical to K562 cells than BHK-21 cells). Human or Syrian hamster GPX4 overexpression improved cold tolerance.

      Strengths:

      This is a very nicely written paper that clearly communicates in figures and text complicated experimental manipulations and in vitro genetic screening and cell survival data. The focus on GPX4 is interesting and relatively novel. The converging pharmacologic, loss-of-function, and gain-of-function experiments are also a strength.

      Weaknesses:

      A recently published article (Reference 43, Sone et al.) also independently explored the role of GPX4 in Syrian hamster cold tolerance through gain-of-function screening. Further exploration of the GPX4 species-specific mechanisms would be of great interest, but this is considered a minor weakness given the already very comprehensive and compelling data presented.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to address a fundamental biological question: how do mammalian cells achieve/lose tolerance to cold exposure? The authors first tried to establish an experimental system for cell cold exposure and evaluation of cell death and then performed genome-scale CRISPR-Cas9 screening on immortalized cell lines from Syrian Hamster (BHK-21) and human (K562) for key genes that are associated with cell survival during prolonged cold exposure. From these screenings, they focused on glutathione peroxidase 4 (GPX4). Using genetic modifications or pharmacological interventions, and multiple cell models including primary cells from various mammalian species, they showed that GPX4 proteins are likely to retain their activities at 4°C, functioning to prevent cold-induced cell ferroptosis.

      Strengths:

      (1) This paper is neatly written and hence easy to follow.

      (2) Experiments are well designed.

      (3) The data showing the overall good cell survival after a prolonged cold exposure or repeated cold-warm cycles are helpful to show the advantages of the experimental instruments and methods the authors used, and hence the validity of their results.

      (4) The CRISPR-Cas9 screening is a great attempt.

      (5) Multiple cell types from hibernating mammals (cold tolerant) and cold-intolerant species are used to test their findings.

      (6) Although some may argue that other labs have published works with different approaches that have pointed out the importance of GPX4 and ferroptosis in hamster cell survival from anoxia-reoxygenation or cold exposure models, hence hurting the novelty of this work, this reviewer thinks that it is highly valuable to have independent research groups and different methods/systems to validate an important concept.

      Weaknesses:

      (1) Only cell death was robustly surveyed; though cell proliferation was evaluated too in some experiments, other cellular functions, such as mitochondrial ATP production vs. glycolysis, and the extent of lipid peroxidation, could have been measured to reflect cellular physiology.

      Validations on complex tissues or in vivo systems would have further strengthened the work and its impact.

      CRISPR-Cas9 screening may have technical limitations, as knock-out of some essential genes/pathways may lead to cell lethality during screening, and hence the relevance of these genes/pathways to cell cold tolerance may not be noted. From the data presented in this study, this reviewer thinks that the GPX4 pathway is likely a conserved mechanism for long-term cold survival, but not for cold sensitivity or acute cell death from cold exposure. In line with this speculation, their CRISPR-Cas9 screening revealed genes in the GPX4 pathway from a relatively cold-sensitive human cell line, but the endogenous GPX4 pathway is seemingly operational in this cold-sensitive cell line. Also, these cells are viable after GPX4 knock-out. Dead cells from the acute cold exposure phase may have detached, or their genomic DNA may have been severely damaged by the time of sample collection, hence not giving any meaningful sequencing reads. Crippling other factors/pathways such as FOXO1 (PMID: 38570500) or 5-aminolevulinic acid (ALA) metabolism (PMID: 35401816) has been shown to severely aggravate cold-induced cell death, including TUNEL-revealed DNA damage, within a much shorter time scale, whilst loss-of-function knockouts of FOXO1 or ALA Synthase 1 (ALAS1) are usually cell lethal. Thus, they and other possible essential genes may not be screenable with the current experimental protocol. These important points need to be taken into consideration by the authors.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Through a series of CRISPR-Cas9 screens, the GPX4 antioxidant pathway was identified as a critical suppressor of cold-induced cell death in hibernator-derived cells. Hamster BHK-21 cells exposed to repeated cold and rewarming cycles revealed five genes (Gpx4, Eefsec, Pstk, Secisbp2, and Sepsecs) as critical components of the GPX4 pathway, which protects against cold-induced ferroptosis. A second screen with continuous cold exposure confirmed the essential role of GPX4 in prolonged cold tolerance. GPX4 knockout lines exhibited complete cell death within four days of cold exposure, and pharmacological inhibition of GPX4 further increased cell death, underscoring the necessity of GPX4's catalytic activity in cold conditions.

      An additional CRISPR screen in human cold-sensitive K562 cells identified 176 genes for cold survival. The GPX4 pathway was found to confer significant resistance to cold in hibernators and human cells, with GPX4 loss significantly increasing cold-induced cell death.

      Overexpression of either hamster or human GPX4 in human K562 cells dramatically improved cold tolerance, while catalytically dead mutants showed no such effect. These findings suggest that GPX4 abundance is a key limiting factor for cold tolerance in human cells, and primary cell types show strong sensitivity to GPX4 loss, highlighting that differences in cold tolerance across species may be due to varying GPX4-mediated protection.

      Strengths:

      (1) Innovative Approach: The study employs a series of unbiased genome-wide CRISPR-Cas9 screens in both hibernator- and non-hibernator-derived cells to investigate the mechanisms controlling cellular cold tolerance. Notably, this is the first genome-scale CRISPR-Cas9 screen conducted in cells derived from a hibernator, the Syrian hamster.

      (2) Identification of the GPX4 Pathway: Identifying glutathione peroxidase 4 (GPX4) as a critical suppressor of cold-induced cell death significantly contributes to the field. Recently, GPX4 was also reported as a potent regulator of cold tolerance through overexpression screening (Sone et al.) in hamsters, which further supports this finding.

      (3) Improved Cold Viability Assessment: The study identifies an important technical artifact in using trypan blue to assess cell viability following cold exposure. It reveals that cells stained immediately after cold exposure retain the dye, inaccurately indicating cell death. By introducing a brief rewarming period before viability assessment, the authors significantly improve the accuracy of detecting cold-induced cell death. This refinement in methodology ensures more reliable results and sets a new standard for future research on cold stress in cells.

      Weaknesses:

      (1) Mechanisms Regulating GPX4 Levels: While the study highlights GPX4 levels as a major determinant of cellular cold tolerance, it does not discuss how these levels are regulated or why they differ between hibernators and non-hibernators. This omission leaves an important aspect of GPX4's role in cold tolerance unexplored.

      (2) Generalizability Across Species: Although the study demonstrates the role of GPX4 in several mammalian species, it does not investigate whether this mechanism extends to other vertebrates (e.g., fish and amphibians) that also face cold challenges. This limitation could restrict the broader evolutionary claims made by the study.

      (3) Variability in Cold Sensitivity Across Human Cell Lines: The study observes significant variability in cold tolerance among different human cell lines but does not explain these differences clearly. This leaves a key aspect of human cell cold sensitivity insufficiently addressed.

      We thank the reviewer for the positive evaluation and thoughtful comments on the manuscript. We acknowledge that our study does not delve into the mechanisms regulating GPX4 levels, including differences between hibernators and non-hibernators, differences between cell types, or the possibility that GPX4 levels are dynamically regulated by environmental conditions. We consider these as interesting open questions that could be addressed in future studies.

      While our study focused entirely on mammalian species, we agree that examining cold tolerance mechanisms across a broader range of vertebrates, including fish and amphibians, could enhance our evolutionary perspective. Interestingly, previous work has indicated that C. elegans adapts to cold temperatures through ferritin-mediated Fe2+ detoxification. This suggests that cold induces Fe2+-mediated toxicity in C. elegans as well as in mammalian cells, but that the mechanisms through which distantly related species counteract cold-mediated cell death may vary.

      Finally, we agree that the variability in cold sensitivity across human cell lines could be further explored, and we will strongly consider conducting follow-up experiments to examine the extent to which this variability is driven by levels of GPX4.

      We are grateful for these insightful comments, as they highlight important avenues for future research. Addressing these questions will enable a more comprehensive understanding of GPX4's role in cold tolerance and its evolutionary significance across diverse organisms.

      Reviewer #2 (Public review):

      Summary:

      Lam et al. present a very intriguing whole-genome CRISPR screen in Syrian hamster cells as well as K562 cells to identify key genes involved in hypothermia-rewarming tolerance. Survival screens were performed by exposing cells to 4°C in a cooled CO2 incubator followed by a rewarming period of 30 minutes prior to survival analysis. In this paradigm, Syrian hamster-derived cell lines exhibit more robust survival than human cell lines (BHK-21 and HaK vs HT1080, HeLa, RPE1, and K562). A genome-wide Syrian hamster CRISPR library was created targeting all annotated genes with 10 guides/gene. LV transduction of the library was performed in BHK-21 cells and the survival screen procedures involved 3 cycles of 4 days of cold exposure at 4°C followed by 2 days of rewarming.

      When compared to controls maintained at 37°C, 9 genes were required for BHK-21 survival of cold cycling conditions, and 5 of these 9 are known components of the GPX4 antioxidant pathway. GPX4 KO BHK-21 cells had reduced cell growth at 37°C and profoundly worse cold tolerance, which could be rescued by GPX4 expression. GPX4 inhibitors also reduced survival in cold. CRISPR KO screens and GPX4 KO in K562 cells revealed comparable results (though intriguingly glutathione biosynthesis genes were more critical to K562 cells than BHK-21 cells). Human or Syrian hamster GPX4 overexpression improved cold tolerance.

      Strengths:

      This is a very nicely written paper that clearly communicates in figures and text complicated experimental manipulations and in vitro genetic screening and cell survival data. The focus on GPX4 is interesting and relatively novel. The converging pharmacologic, loss-of-function, and gain-of-function experiments are also a strength.

      Weaknesses:

      A recently published article (Reference 43, Sone et al.) also independently explored the role of GPX4 in Syrian hamster cold tolerance through gain-of-function screening. Further exploration of the GPX4 species-specific mechanisms would be of great interest, but this is considered a minor weakness given the already very comprehensive and compelling data presented.

      We greatly appreciate the reviewer’s compliments and thoughtful comments on our manuscript. We agree with the reviewer that our approach (dual unbiased genome-scale screens in human and hamster cells) and the recent investigation by Sone et al (gain-of-function screening involving the insertion of hamster cDNA into human cells) mutually strengthen the importance of GPX4 in cold tolerance across cell types and species.

      Reviewer #3 (Public review):

      Summary:

      This work aims to address a fundamental biological question: how do mammalian cells achieve/lose tolerance to cold exposure? The authors first tried to establish an experimental system for cell cold exposure and evaluation of cell death and then performed genome-scale CRISPR-Cas9 screening on immortalized cell lines from Syrian Hamster (BHK-21) and human (K562) for key genes that are associated with cell survival during prolonged cold exposure. From these screenings, they focused on glutathione peroxidase 4 (GPX4). Using genetic modifications or pharmacological interventions, and multiple cell models including primary cells from various mammalian species, they showed that GPX4 proteins are likely to retain their activities at 4°C, functioning to prevent cold-induced cell ferroptosis.

      Strengths:

      (1) This paper is neatly written and hence easy to follow.

      (2) Experiments are well designed.

      (3) The data showing the overall good cell survival after a prolonged cold exposure or repeated cold-warm cycles are helpful to show the advantages of the experimental instruments and methods the authors used, and hence the validity of their results.

      (4) The CRISPR-Cas9 screening is a great attempt.

      (5) Multiple cell types from hibernating mammals (cold tolerant) and cold-intolerant species are used to test their findings.

      (6) Although some may argue that other labs have published works with different approaches that have pointed out the importance of GPX4 and ferroptosis in hamster cell survival from anoxia-reoxygenation or cold exposure models, hence hurting the novelty of this work, this reviewer thinks that it is highly valuable to have independent research groups and different methods/systems to validate an important concept.

      Weaknesses:

      (1) Only cell death was robustly surveyed; though cell proliferation was evaluated too in some experiments, other cellular functions, such as mitochondrial ATP production vs. glycolysis, and the extent of lipid peroxidation, could have been measured to reflect cellular physiology.

      Validations on complex tissues or in vivo systems would have further strengthened the work and its impact.

      CRISPR-Cas9 screening may have technical limitations, as knock-out of some essential genes/pathways may lead to cell lethality during screening, and hence the relevance of these genes/pathways to cell cold tolerance may not be noted. From the data presented in this study, this reviewer thinks that the GPX4 pathway is likely a conserved mechanism for long-term cold survival, but not for cold sensitivity or acute cell death from cold exposure. In line with this speculation, their CRISPR-Cas9 screening revealed genes in the GPX4 pathway from a relatively cold-sensitive human cell line, but the endogenous GPX4 pathway is seemingly operational in this cold-sensitive cell line. Also, these cells are viable after GPX4 knock-out. Dead cells from the acute cold exposure phase may have detached, or their genomic DNA may have been severely damaged by the time of sample collection, hence not giving any meaningful sequencing reads. Crippling other factors/pathways such as FOXO1 (PMID: 38570500) or 5-aminolevulinic acid (ALA) metabolism (PMID: 35401816) has been shown to severely aggravate cold-induced cell death, including TUNEL-revealed DNA damage, within a much shorter time scale, whilst loss-of-function knockouts of FOXO1 or ALA Synthase 1 (ALAS1) are usually cell lethal. Thus, they and other possible essential genes may not be screenable with the current experimental protocol. These important points need to be taken into consideration by the authors.

      We thank the reviewer for highlighting the novelty of using genome-scale CRISPR-Cas9 screens and the validation of GPX4 function across cell types and mammalian species. 

      We acknowledge that our study primarily focused on measuring cell death using Trypan Blue dye exclusion. To validate the Trypan Blue assay, cell survival was orthogonally measured using an LDH release assay (Fig. 1g). The proliferation potential of putatively live cells was assessed by counting the increase in live cells following 24 h at 37°C (Fig. 1b). Prompted by your question, we will add data to the final version of the manuscript showing that, following 1 day at 4°C, K562 cells rapidly restarted their cell cycle and doubled in number every 21 hours (Author response image 1). This rate is indistinguishable from the replication rate of cells that were not previously exposed to 4°C, suggesting that cells are both alive and functionally capable of replicating following cold exposure.

      Author response image 1.

      Population doubling time of K562 cells cultured at 37°C (pink) and cells that are rewarmed to 37°C following 1 day of 4°C exposure
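
      As a rough illustration of how a doubling time such as the 21 hours quoted above can be derived from two live-cell counts, the following Python sketch assumes simple exponential growth between the counts. The function names and the cell counts are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the authors' actual calculation may differ.

```python
import math


def doubling_time_hours(n_start, n_end, elapsed_hours):
    """Estimate population doubling time, assuming exponential growth.

    Uses T_d = t * ln(2) / ln(N_end / N_start).
    """
    if n_start <= 0 or elapsed_hours <= 0:
        raise ValueError("n_start and elapsed_hours must be positive")
    if n_end <= n_start:
        raise ValueError("population did not grow; doubling time is undefined")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


def fold_increase(elapsed_hours, doubling_hours):
    """Expected fold increase in cell number after elapsed_hours."""
    return 2 ** (elapsed_hours / doubling_hours)


# Hypothetical counts (not from the manuscript): a culture that grows
# from 1e5 to 4e5 live cells in 42 h has undergone two doublings.
estimate = doubling_time_hours(1.0e5, 4.0e5, 42.0)
print(f"estimated doubling time: {estimate:.1f} h")
print(f"expected fold increase after 24 h: {fold_increase(24.0, estimate):.2f}")
```

      Comparing the estimate for rewarmed cells with the estimate for cells kept at 37°C throughout is one simple way to express the comparison described in the response.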

      We agree that assessing additional cellular functions, such as mitochondrial ATP production, glycolysis, lipid metabolism and peroxidation could provide a more comprehensive understanding of cellular physiology under cold stress and would be valuable future studies. Similarly, we appreciate the suggestion to validate our findings in complex tissues or in vivo models. We recognize that such validation could strengthen the implications of our study and enhance its translational potential; however, due to their complexity, we believe that these additional studies are beyond the scope of our current study.

      We agree with the reviewer that CRISPR-Cas9 screens have limitations. For example, our screen was designed to identify genes that are preferentially required for cellular fitness at 4°C versus 37°C. There are many genes that are required for cellular survival at 4°C as well as 37°C that are not discussed (Table S2, S5). Also, given that the screen is designed to disrupt a single gene per cell, genes that have redundant functions in cold tolerance will likely be missed. Given the reviewer’s questions, we will expand the discussion of the paper to highlight limitations of the screen.

      We apologize for any lack of clarity about the methods we employed during the screen and will expand the methods section to provide further details. For example, for the BHK-21 screen we eliminated dead cells by sequencing cells that reattached after rewarming to 37°C for either 30 minutes (15-day cold exposure screen) or 24 hours (4°C cycling screen). Indeed, at the point of cell collection for the BHK-21 and K562 screens, the fraction of live cells was greater than 92% and 95%, respectively. We respectfully disagree with the reviewer that our screens would miss genes that affect acute cold tolerance. Any cells that died during cold exposure, whether early or late, would not have been sequenced, and thus the sgRNAs targeting a specific gene in those cells would appear depleted regardless of when the cells died.
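
      The depletion argument above can be sketched with a toy calculation. The Python example below computes a per-guide log2 fold change between a cold-exposed and a 37°C control population after normalizing to reads per million. It is a minimal sketch, not the analysis pipeline used in the manuscript: the guide names, read counts, and pseudocount are all hypothetical and serve only to show that guides lost from the sequenced survivors look equally depleted whether their cells died early or late.

```python
import math


def log2_fold_change(cold_counts, warm_counts, pseudocount=1.0):
    """Per-guide log2 fold change of cold vs. 37°C read counts.

    Counts are scaled to reads per million so that libraries of
    different sequencing depth are comparable; the pseudocount
    avoids taking the log of zero for guides with no reads.
    """
    cold_total = sum(cold_counts.values())
    warm_total = sum(warm_counts.values())
    result = {}
    for guide, warm in warm_counts.items():
        cold_rpm = 1e6 * cold_counts.get(guide, 0) / cold_total
        warm_rpm = 1e6 * warm / warm_total
        result[guide] = math.log2((cold_rpm + pseudocount) / (warm_rpm + pseudocount))
    return result


# Hypothetical read counts (illustration only, not data from the study).
# "early_death_g1" and "late_death_g1" stand for guides whose cells died
# on an early or a late day of cold exposure; only surviving cells are
# sequenced, so both end up with few reads in the cold sample.
warm = {"control_g1": 500, "control_g2": 500, "early_death_g1": 500, "late_death_g1": 500}
cold = {"control_g1": 900, "control_g2": 900, "early_death_g1": 100, "late_death_g1": 100}

for guide, lfc in log2_fold_change(cold, warm).items():
    print(f"{guide}: {lfc:+.2f}")
```

      In this toy setup the timing of cell death never enters the calculation; only the abundance of each guide among the surviving, sequenced cells does.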

      We thank the reviewer for pointing out two additional, highly relevant studies. Interestingly, the genes implicated in cold tolerance in these studies, FOXO1 and ALAS1, did not appear essential for survival at 37°C or 4°C in BHK-21 or K562 cells. There are several possibilities that could explain this finding: 1) our screen may not have successfully knocked out these genes, 2) other proteins may have compensated for their loss, or 3) these pathways may regulate cold tolerance in some but not all cell types. We apologize that the current version of the manuscript does not reflect on these recent studies. We will expand our discussion to include their findings.

      Once again, we are grateful for the reviewer’s insights, which have highlighted key areas for further exploration as well as pointed to specific ways to improve our manuscript.

    1. eLife Assessment

      The presented evidence is compelling given a range of complementary and mutually supportive studies. Experiments are generally robustly conducted and well-presented, supporting the claims regarding miRNA mechanisms converging on EMC10 overexpression with 22q11 Del. This is an important study that works to establish a novel antisense oligonucleotide-based approach to treating 22q11.2 deletion syndrome; the findings are likely to advance therapeutic efforts. The authors provide evidence both in vitro in patient-derived iPSCs and in vivo in a 22q11 Del mouse supporting the knockdown (KD) of EMC10 as an effective strategy for the amelioration of neuronal and behavioral deficits.

    2. Reviewer #1 (Public review):

      Summary:

      This is an important and very well-presented set of experiments following up on prior work from the lab investigating knock-down (KD) of EMC10 in the restoration of neuronal and cognitive deficits in 22q11.2 Del models, now including both human iPSCs and an in vivo mouse model treated with ASOs.

      The valuable progress in this current manuscript is the development of ASOs, and the proof of efficacy in vivo in mice of the ASO in knock-down of EMC10 and amelioration of in vivo behavioral phenotypes.

      The experiments include iPSC studies demonstrating elevations of EMC10 in a solid collection of paired iPSC lines. These studies also provide evidence of manipulation of EMC10 by overexpression and inhibition of miRNAs that exist in the 22q11 interval. The iPSC studies also nicely demonstrate the rescue of impairments with KD of EMC10 in neuronal arborization as well as KCl-induced neuronal activity. The major in vivo contributions reflect an impressive demonstration of the efficacy of two ASOs in vivo on both KD of EMC10 in vivo and through improvement in behavioral abnormalities in the 22q11 mouse in a range of different behaviors, including social behavior and learning behaviors.

      Overall, there are many strengths reflected in this study, including in particular the synergy between in vitro studies in human cell models and in vivo studies in the well-characterized mouse model. The experiments are generally rigorously performed, well-powered, and nicely presented. The claims with regard to the mechanisms of EMC10 elevations and the importance of restoration of EMC10 expression to neuronal morphology and behavior are well supported by the data. The work may be further supported in future studies by investigating whether ASOs rescue circuit dysfunction in vivo or ex vivo, using electrophysiology in the mouse model. Future studies could also investigate the mechanism by which EMC10, an ER protein involved in protein processing, contributes to the observed neuronal abnormalities; however, these questions are clearly for future investigations.

      The potential impact of the work is found in the potential value of the ASO approach to the treatment of 22q11, or the pre-clinical evidence that knock-down of this protein may lead to some amelioration of cognitive symptoms. Overall, a very convincing and complementary set of experiments to support EMC10 KD as a therapeutic strategy.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Thakur et al. seeks to establish a novel ASO-based approach to treat 22q11.2 deletion syndrome. Central to this thesis is that an ER membrane complex member called EMC10 is significantly increased in the disorder, which is largely attributed to the loss of miRNA-mediated repression. The authors generated three new iPSC cell lines for the disorder and showed that deletion of EMC10 rescues morphology and Ca-flux deficits. They go on to show that post-symptomatic deletion of Emc10 in mice using a conditional-off tamoxifen allele reverses social memory phenotypes. Finally, in collaboration with Ionis, they developed two new ASOs to knock down EMC10 and show that social and spatial memory phenotypes are rescued, even two months after injection.

      Strengths:

      In general, this represents a substantial undertaking and an impressive body of work. The experiments follow a logical progression and in most cases are well-controlled. The isolation of EMC10 effects relative to the broader miRNA disruption is viewed as impactful. The use of both genetic and ASO approaches to validate the therapeutic strategy is also viewed as highly positive. The authors' contention that EMC10 can be targeted at post-symptomatic time points to reverse 22q11.2 deletion syndrome is supported by the data. Further, they have provided a therapeutic mechanism to do so. These findings are likely to be impactful and lead to further development efforts.

      Weaknesses:

      The primary weaknesses of the manuscript lie in incomplete or inappropriate data analysis, as well as a failure to validate key experiments. For example, both genetic and ASO-mediated EMC10 reductions are assessed at the level of mRNA, but only one experiment, in one brain region, is validated at the protein level. This brain region is the PFC, which is problematic when many of the phenotypes used have a strong hippocampal component. Likewise, the iPSC experiments make the case that excitatory neurons are central to the phenotype, but no effort is made to show that the ASOs are entering that type of neuron, or even to quantify what percentage of cells in the target brain regions (HPC, PFC, etc.) are positive for the ASO. There is only a single image provided of staining with a phosphorothioate antibody and a claim of robust uptake, which cannot be assumed. The iPSC transcriptomics work would also benefit from a more comprehensive comparison between the EMC10 knockout lines and their parent 22q11 deletion lines. Further, there are other examples where the statistics used are either wrong (Figure 3 t-test vs ANOVA) or missing (Figure S2). These technical and analytical shortcomings make it challenging to fully interpret the data and detract from an otherwise exciting manuscript.

    1. eLife Assessment

      This is an important study that aims to investigate the behavioral relevance of multisensory responses recorded in the auditory cortex. The experiments are elegant and well-designed, and are supported by appropriate analyses of the data. However, the evidence presented for learning-dependent encoding of visual information is incomplete and it is possible that the surprisingly short-latency increases in activity are actually motor-related signals. Demonstrating that they really are visual responses is necessary in order to draw definitive conclusions from this study.

    2. Reviewer #1 (Public review):

      Summary:

      Chang and colleagues used tetrode recordings in behaving rats to study how learning an audiovisual discrimination task shapes multisensory interactions in the auditory cortex. They found that a significant fraction of neurons in the auditory cortex responded to visual (crossmodal) and audiovisual stimuli. Both auditory-responsive and visually-responsive neurons preferentially responded to the cue signaling the contralateral choice in the two-alternative forced choice task. Importantly, multisensory interactions were similarly specific for the congruent audiovisual pairing for the contralateral side.

      Strengths:

      The experiments were conducted in a rigorous manner. Particularly thorough are the comparisons across cohorts of rats trained in a control task, in a unisensory auditory discrimination task, and the multisensory task, while also varying the recording hemisphere and behavioral state (engaged vs. anesthesia). The resulting contrasts strengthen the authors' findings and rule out important alternative explanations. Through the comparisons, they show that the enhancements of multisensory responses in the auditory cortex are specific to the paired audiovisual stimulus and specific to contralateral choices in correct trials and thus dependent on learned associations in a task-engaged state.

      Weaknesses:

      The main result is that multisensory interactions are specific for contralateral paired audiovisual stimuli, which is consistent across experiments and interpretable as a learned task-dependent effect. However, the alternative interpretation of behavioral signals is crucial to rule out, which would also be specific to contralateral, correct trials in trained animals. Although the authors focus on the first 150 ms after cue onset, some of the temporal profiles of activity suggest that choice-related activity could confound some of the results.

      The auditory stimuli appear to be encoded by short transient activity (in line with much of what we know about the auditory system), likely with onset latencies (not reported) of 15-30 ms. Stimulus identity can be decoded (Figure 2j) apparently with an onset latency around 50-75 ms (only the difference between A and AV groups is reported) and can be decoded near perfectly for an extended time window, without the dip that is observed in the mean activity (Figure 2e). The dynamics of the response of the example neurons presented in Figures 2c and d and the average in 2e therefore do not entirely match the population decoding profile in 2j. Population decoding uses the population activity distribution, rather than the mean, so this is not inherently problematic. It suggests, however, that the stimulus identity can be decoded from later (choice-related?) activity. The dynamics of the population decoding accuracy are in line with the dynamics one could expect based on choice-related activity. Also, the results in Figures S2e,f suggest that differences between the two learned stimuli can be in the late phase of the response, not in the early phase.

      First, it would help to have the same time axis across panels 2c,d,e,j,k. Second, a careful temporal dissociation of when the central result of multisensory enhancements occurs would better discriminate early sensory processing-related effects from later decision-related modulations.

      In the abstract, the authors mention "a unique integration model", "selective multisensory enhancement for specific auditory-visual pairings", and "using this distinct integrative mechanisms". I would strongly recommend that the authors try to phrase their results more concretely, which I believe would benefit many readers, i.e. selective how (which neurons) and specific for which pairings?

    3. Reviewer #2 (Public review):

      Summary

      In this study, rats were trained to discriminate auditory frequency and visual form/orientation for both unisensory and coherently presented AV stimuli. Recordings were made in the auditory cortex during behaviour and compared to those obtained in various control animals/conditions. The central finding is that AC neurons preferentially represent the contralateral-conditioned stimulus - for the main animal cohort this was a 10k tone and a vertically oriented bar. Over 1/3rd of neurons in AC were either AV/V/A+V and while a variety of multisensory neurons were recorded, the dominant response was excitation by the correctly oriented visual stimulus (interestingly this preference was absent in the visual-only neurons). Animals performing a simple version of the task in which responses were contingent on the presence of a stimulus rather than its identity showed a smaller proportion of AV stimuli and did not exhibit a preference for contralateral conditioned stimuli. The contralateral conditioned dominance was substantially less under anesthesia in the trained animals and was present in a cohort of animals trained with the reverse left/right contingency. Population decoding showed that visual cues did not increase the performance of the decoder but accelerated the rate at which it saturated. Rats trained on auditory and then visual stimuli (rather than simultaneously with A/V/AV) showed many fewer integrative neurons.

      Strengths

      There is a lot that I like about this paper - the study is well-powered with multiple groups (free choice, reversed contingency, unisensory trained, anesthesia) which provides a lot of strength to their conclusions and there are many interesting details within the paper itself. Surprisingly few studies have attempted to address whether multisensory responses in the unisensory cortex contribute to behaviour - and the main one that attempted to address this question (Lemus et al., 2010, uncited by this study) showed that while present in AC, somatosensory responses did not appear to contribute to perception. The present manuscript suggests otherwise and critically does so in the context of a task in which animals exhibit a multisensory advantage (this was lacking in Lemus et al.). The behaviour is robust, with AV stimuli eliciting superior performance to either auditory or visual unisensory stimuli (visual were slightly worse than auditory but both were well above chance).

      Weaknesses

      I have a number of points that in my opinion require clarification and I have suggestions for ways in which the paper could be strengthened. In addition to these points, I admit to being slightly baffled by the response latencies; while I am not an expert in the rat, usually in the early sensory cortex auditory responses are significantly faster than visual ones (mirroring the relative first spike latencies of A1 and V1 and the different transduction mechanisms in the cochlea and retina). Yet here, the latencies look identical - if I draw a line down the pdf on the population level responses the peak of the visual and auditory is indistinguishable. This makes me wonder whether these are not sensory responses - yet, they look sensory (very tightly stimulus-locked). Are these latencies a consequence of this being AuD and not A1, or ... ? Have the authors performed movement-triggered analysis to illustrate that these responses are not related to movement out of the central port, or is it possible that both sounds and visual stimuli elicit characteristic whisking movements? Lastly, has the latency of the signals been measured (i.e. you generate and play them out synchronously, but is it possible that there is a delay on the audio channel introduced by the amp, which in turn makes it appear as if the neural signals are synchronous)? If the latter were the case I wouldn't see it as a problem as many studies use a temporal offset in order to give the best chance of aligning signals in the brain, but this is such an obvious difference from what we would expect in other species that it requires some sort of explanation.

      Reaction times were faster in the AV condition - it would be of interest to know whether this acceleration is sufficient to violate a race model, given the arbitrary pairing of these stimuli. This would give some insight into whether the animals are really integrating the sensory information. It would also be good to clarify whether the reaction time is the time taken to leave the center port or respond at the peripheral one.
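
      For readers unfamiliar with the race model test mentioned above, the following is a minimal sketch of one common formulation, Miller's inequality, which states that under a race between independent channels P(RT_AV <= t) cannot exceed P(RT_A <= t) + P(RT_V <= t) at any time t. The function names, probe times, and reaction-time values are hypothetical and chosen only for illustration; they are not taken from the manuscript, and this is not presented as the analysis the authors ran.

```python
from bisect import bisect_right


def ecdf(samples, t):
    """Empirical P(RT <= t) for a list of reaction times."""
    ordered = sorted(samples)
    return bisect_right(ordered, t) / len(ordered)


def race_model_violations(rt_a, rt_v, rt_av, probe_times):
    """Return (t, excess) pairs where the AV CDF exceeds Miller's bound."""
    violations = []
    for t in probe_times:
        # Miller's bound: sum of the unisensory CDFs, capped at 1.
        bound = min(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))
        excess = ecdf(rt_av, t) - bound
        if excess > 0:
            violations.append((t, excess))
    return violations


# Hypothetical reaction times in ms (illustrative only, not real data).
rt_a = [320, 340, 360, 380, 400]
rt_v = [350, 370, 390, 410, 430]
rt_av = [250, 260, 300, 330, 360]

violations = race_model_violations(rt_a, rt_v, rt_av, range(240, 441, 20))
for t, excess in violations:
    print(f"t = {t} ms: AV CDF exceeds the race-model bound by {excess:.2f}")
```

      In practice, the inequality is usually evaluated per animal at fixed quantiles of the reaction-time distribution and then tested statistically across animals; a violation at early time points is typically read as evidence for integration beyond statistical facilitation.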

      The manuscript is very vague about the origin of responses - are these in AuD, A1, AuV... ? Some attempts to separate out responses if possible by laminar depth and certainly by field are necessary. It is known from other species that multisensory responses are more numerous, and show greater behavioural modulation in non-primary areas (e.g. Atilgan et al., 2018).

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang et al. aims to investigate how the behavioral relevance of auditory and visual stimuli influences the way in which the primary auditory cortex encodes auditory, visual, and audiovisual information. The main result is that behavioral training induces an increase in the encoding of auditory and visual information and in multisensory enhancement that is mainly related to the choice located contralaterally with respect to the recorded hemisphere.

      Strengths:

      The manuscript reports the results of an elegant and well-planned experiment meant to investigate if the auditory cortex encodes visual information and how learning shapes visual responsiveness in the auditory cortex. Analyses are typically well done and properly address the questions raised.

      Weaknesses:

      Major

      (1) The authors apparently primarily focus their analyses of sensory-evoked responses on approximately the first 100 ms following stimulus onset. Although I could not find an indication of which precise temporal range the authors used for analysis in the manuscript, this is the range where sensory-evoked responses are shown to occur in the manuscript figures. While this is a reasonable range for auditory evoked responses, the same cannot be said for visual responses, which commonly peak around 100-120 ms in V1. In fact, the latency and overall shape of the reported visual responses are quite different from typical visual responses, which are commonly shown to display a delay of up to 100 ms with respect to auditory responses. All traces that the authors show, instead, display visual responses strikingly overlapping with auditory ones, which is not in line with what one would expect based on our physiological understanding of cortical visually-evoked responses. Similarly, the fact that the onset of decoding accuracy (Figure 2j) occurs earlier in multisensory than in auditory-only trials is hard to reconcile with the fact that visual responses have a later onset latency compared to auditory ones. The authors thus need to provide unequivocal evidence that the results they observe are truly visual in origin. This is especially important in view of the ever-growing literature showing that sensory cortices encode signals representing spontaneous motor actions, but also other forms of non-sensory information that can be taken prima facie to be of sensory origin. This is a problem that only now we realize has affected a lot of early literature, especially - but not only - in the field of multisensory processing. It is thus imperative that the authors provide evidence supporting the true visual nature of the activity reported during auditory and multisensory conditions, in trained, free-choice, and anesthetised conditions. This could for example be achieved causally (e.g. via optogenetics) to provide the strongest evidence about the visual nature of the reported results, but it's up to the authors to identify a viable solution. This also applies to the enhancement of matched stimuli, which could potentially be explained in terms of spontaneous motor activity and/or pre-motor influences. In the absence of this evidence, I would discourage the authors from drawing any conclusion about the visual nature of the observed activity in the auditory cortex.

      (2) The finding that AC neurons in trained mice preferentially respond - and enhance - auditory and visual responses pertaining to the contralateral choice is interesting, but the study does not show evidence for the functional relevance of this phenomenon. As has become more and more evident over the past few years (see e.g. the literature on mouse PPC), correlated neural activity is not an indication of functional role. Therefore, in the absence of causal evidence, the functional role of the reported AC correlates should not be overstated by the authors. My opinion is that, starting from the title, the authors need to much more carefully discuss the implications of their findings.

      MINOR:

      (1) The manuscript does not engage with the revised interpretation of most studies on audiovisual interactions in primary sensory cortices, following recent studies revealing that most of what was considered to be crossmodal actually reflects motor aspects. In particular, recent evidence suggests that sensory-induced spontaneous motor responses may have a surprisingly fast latency (within 40 ms; Clayton et al. 2024). Such responses might also underlie the contralaterally-tuned responses observed by the authors if one assumes that mice learn a stereotypical response that is primed by the upcoming goal-directed, learned response. Given that a full exploration of this issue would require high-speed tracking of orofacial and body motions, the authors should at least revise the discussion and the possible interpretation of their results, not just on the basis of the literature they already cite, but after carefully re-examining that literature in view of the most recent findings, which challenge earlier interpretations of experimental results.

      (2) The methods section is a bit lacking in details. For instance, information about the temporal window of analysis for sensory-evoked responses is lacking. Another example: for the spike sorting procedure, limited details are given about inclusion/exclusion criteria. This makes it hard to navigate the manuscript and fully understand the experimental paradigm. I would recommend critically revising and expanding the methods section.

    1. eLife Assessment

      This study addresses a significant question in sensory ethology and active sensing in particular. It links the production of a specific signal - electrosensory chirps - to various contexts and conditions to propose that chirps may also serve an active sensing role in addition to their more well-known role in communication. The evidence supporting the role for active sensing is strong. In particular, the evidence showing increased chirping in more cluttered environments and the relationship between chirping and movement are convincing. The study provides a lot of valuable data, and is likely to stimulate follow-up behavioral and physiological studies.

    2. Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall the study has been improved by the revisions.

    3. Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing wave-like electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that should have a great impact on the field.

      The authors provide convincing evidence that chirps may function in homeoactive sensing. In particular, the evidence showing increased chirping in more cluttered environments and a relationship between chirping and movement are especially strong and suggestive. Their evidence arguing against a role for chirps in communication is not as strong. However, based on an extensive review of the literature, the authors conclude, I think fairly, that the evidence arguing in favor of a communication function is limited and inconclusive. Thus, the real strength of this study is not that it conclusively refutes the communication hypothesis, but that it calls this hypothesis into question while also providing compelling evidence in favor of an alternative function.

      In summary, although the evidence against a role for chirps in communication is not as strong as the evidence for a role in active sensing, this study presents very interesting data that is sure to stimulate discussion and follow-up studies. The authors acknowledge that chirps could function as both a communication and homeoactive sensing signal, and the language arguing against a communication function is appropriately measured. A given electrical behavior could serve both communication and homeoactive sensing. I suspect this is quite common in electric fish (not just in gymnotiforms such as the species studied here, but also in the distantly related mormyrids), and perhaps in other actively sensing species such as echolocating animals.

    4. Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to its role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry. The BEM modelling also convincingly predicts how the electric image of a receiver conspecific on a sending fish is enhanced by a chirp.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference in frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction. The authors carefully submit that the new putative echolocation function of chirps is not mutually exclusive with a possible communication function.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which in fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study addresses a question in sensory ethology and active sensing in particular. It links the production of a specific signal - electrosensory chirps - to various contexts and conditions to argue that the main function is to enhance conspecific localization rather than communication as previously believed. The study provides a lot of valuable data, but the methods section is incomplete making it difficult to evaluate the claims.

      We have now added to the methods a new paragraph describing in greater detail the analysis done to prepare the data used in figure 7. The figure itself has been substantially changed: we now show EOD fields and electric images using voltage instead of current, and we have better illustrated the comparisons between chirps and beats using statistical analysis.

      Finally, we are equally grateful to all Reviewers for the constructive criticism and for the time spent in evaluating our manuscript. It certainly helped to improve both the quality of the data presented and the readability of the text.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall the study has been improved by the revision. However, one piece of new data is perplexing to me. The new figure 7 presents the results of a model analysis of the strength of the EI caused by a second fish to localize when the focal fish is chirping. From my understanding of this type of model, EOD frequency is not a parameter in the model since it evaluates the strength of the field at a given point in time. Therefore the only thing that matters is the phase relationship and strength of the EOD. Assuming that the second fish's EOD is kept constant and the phase relationship is also the same, the only difference during a chirp that could affect the result of the calculation is the potential decrease in EOD amplitude during the chirp. It is indeed logical that if the focal fish decreased its EOD amplitude the target fish's EOD becomes relatively stronger. Where things are harder to understand is why the different types of chirps (e.g. type 1 vs type 2) lead to the same increase in signal even though they are typically associated with different levels of amplitude modulations. Also, it is hard to imagine that a type 2 chirp that is barely associated with any decrease in EOD amplitude (0-10% maybe), would cause a doubling of the EI strength. There might be something I don't understand but the authors should provide a lot more details on how this result is obtained and convince us that it makes sense.

      We hope we have now resolved the Reviewer’s concerns by applying major edits to Figure 7. We now use voltage - not current - to quantify the impact of chirps on electric images. The effect of chirps is here estimated using the integral of the beat AM, as a broad measure of the potential effects chirping may have on electroreceptors. We underline in the text that this analysis does not represent proof for any type of processing occurring in the fish brain, but we only express in hypothetical terms that - based on the beat perturbations measured - additional spatial information may potentially be available in electric images, as a consequence of chirping. Whether the fish uses this information, or not, needs to be assessed through electrophysiology in future studies.

      Finally, the reviewer is concerned about this sentence in the rebuttal - "The methods section has been edited to clarify the approach (not yet)". This section is unfinished, which suggests that it is difficult to explain the modeling results from a logical point of view. Thus the reviewer's major concern from the previous review remains unresolved. To summarize, the model calculates field strengths at an instant in time and integrates over time with a 500 ms window. This window is 10 times longer than the small chirps, while the longer chirps cover a much larger proportion of the window. Yet, the small chirps have a bigger impact on discriminability than the longer chirps. The authors should attempt to explain this seemingly contradictory result. This remains a major issue because this analysis was the most direct evidence that chirping could impact localization accuracy.

      We added a new Methods section describing the new figure, which we hope explains more clearly how the effect of chirps is calculated. Since most p-units are affected by the cyclic AMs of the beat, any change in the electric image caused by a chirp will result in changes in transcutaneous voltage, i.e. the voltage measurable at the receptor level. Overall, this added analysis is not a central point of the manuscript; it is part of an attempt to hint at the physiological mechanisms implied, which cannot be explored in the current study. We do not mean to propose that these estimates represent alternatives to electrophysiological recordings, but rather theoretical evidence that could in fact support this type of investigation.

      Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing wave-like electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that should have a great impact on the field.

      The authors provide convincing evidence that chirps may function in homeoactive sensing. In particular, the evidence showing increased chirping in more cluttered environments and a relationship between chirping and movement are especially strong and suggestive. Their evidence arguing against a role for chirps in communication is not as strong. However, based on an extensive review of the literature, the authors conclude, I think fairly, that the evidence arguing in favor of a communication function is limited and inconclusive. Thus, the real strength of this study is not that it conclusively refutes the communication hypothesis, but that it calls this hypothesis into question while also providing compelling evidence in favor of an alternative function.

      In summary, although the evidence against a role for chirps in communication is not as strong as the evidence for a role in active sensing, this study presents very interesting data that is sure to stimulate discussion and follow-up studies. The authors acknowledge that chirps could function as both a communication and homeoactive sensing signal, and the language arguing against a communication function is appropriately measured. A given electrical behavior could serve both communication and homeoactive sensing. I suspect this is quite common in electric fish (not just in gymnotiforms such as the species studied here, but also in the distantly related mormyrids), and perhaps in other actively sensing species such as echolocating animals.

      We are grateful to the Reviewer for the kind assessment.

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to its role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry. The BEM modelling also convincingly predicts how the electric image of a receiver conspecific on a sending fish is enhanced by a chirp.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference in frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction. The authors carefully submit that the new putative echolocation function of chirps is not mutually exclusive with a possible communication function.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which in fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

      We are grateful to the Reviewer for the kind assessment.

    1. eLife Assessment

      This work attempts to demonstrate an ATP-independent non-canonical role of proteasomal component PA28y in the promotion of oral squamous cell carcinoma growth, migration, and invasion. The evidence remains incomplete and the work would benefit from further experimental work. The authors have not adequately addressed the reviewers' comments.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors have tried to dissect the functions of Proteasome activator 28γ (PA28γ), which is known to activate proteasomal function in an ATP-independent manner. Although there are multiple works that have highlighted the role of this protein in tumours, this study specifically tried to develop a correlation with Complement C1q binding protein (C1QBP), which is associated with immune response and energy homeostasis.

      Strengths:

      The observations of the authors hint that beyond PA28y's association with the proteasome, it might also stabilize certain proteins such as C1QBP, which influences energy metabolism.

      Weaknesses:

      The strength of the work also becomes its main drawback. That is, it remains unclear how PA28y stabilizes C1QBP or how C1QBP elicits its pro-tumourigenic role under PA28y OE.

      In most of the experiments, the authors have depended on the parallel changes in the expression of both proteins to justify their stabilizing interaction. However, this approach is indirect at best and does not confirm the direct stabilizing effect of this interaction. IP experiments do not indicate direct interaction and have some quality issues. The upregulation of C1QBP might be indirect at best. It is quite possible that PA28y might be degrading some secondary protein/complex that is responsible for C1QBP expression. Since the core idea of the work is PA28y's direct interaction with C1QBP stabilizing it, the same should be demonstrated in a more convincing manner.

      In all of the assays, C1QBP has been detected as a doublet. However, the expression pattern of the two bands varies depending on the experiment. In some cases the upper band is intensely stained and in some the lower band. Do C1QBP isoforms exist, and are they differentially regulated depending on experimental conditions/tissue types?

      Problems with the background of the work: Line 76. This statement is far-fetched. There are presently a number of published studies that have dealt with the metabolic programming of OSCC, including identification of specific metabolites. Moreover, beyond estimation of OCR, the authors have not conducted any experiments related to metabolism. In the Introduction, the significance of this study and how it will extend our understanding of OSCC need to be elaborated.

      Review of Revised Version:

      Although the authors have partly corrected the manuscript by removing the mislabeling in their Co-IP experiments, my primary concern on the actual functional connotations and direct interaction between PA28y and C1QBP still remains unaddressed. As already mentioned in my previous review, since the core idea of the work is PA28y's direct interaction with C1QBP, stabilizing it, the same should be demonstrated in a more convincing manner.

      My other observation on the detection of C1QBP as a doublet has been addressed by using an anti-C1QBP monoclonal antibody in place of the polyclonal one used before. C1QBP doublets have not been observed in the present case.

      The authors have also worked on the presentation of the background by suitably modifying the statements and incorporating appropriate citations.

      However, the authors are requested to follow the recommendations provided to them by the reviewers to address the major concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tried to determine how PA28g functions in oral squamous cell carcinoma (OSCC) cells. They hypothesized it may act through metabolic reprogramming in the mitochondria.

      Strengths:

      They found that the genes of PA28g and C1QBP are in an overlapping interaction network after an analysis of a genome database. They also found that the two proteins interact in coimmunoprecipitation and pull-down assays using the lysate from OSCC cells with or without expression of the exogenous genes. They used truncated C1QBP proteins to map the interaction site to the N-terminal 167 residues of C1QBP protein. They observed the levels of the two proteins are positively correlated in the cells. They provided evidence for the colocalization of the two proteins in the mitochondria and the effect on mitochondrial form and function in vitro and in vivo OSCC models, and the correlation of the protein expression with the prognosis of cancer patients.

      Weaknesses:

      Many data sets are shown in figures that cannot be understood without more descriptions either in the text or the legend, e.g., Fig. 1A. Similarly, many abbreviations are not defined.

      The revision addressed these issues.

      Some of the pull-down and coimmunoprecipitation data do not support the conclusion about the PA28g-C1QBP interaction. For example, in Appendix Fig. 1B the Flag-C1QBP was detected in the Myc beads pull-down when the protein was expressed in the 293T cells without the Myc-PA28g, suggesting that the pull-down was not due to the interaction of the C1QBP and PA28g proteins. In Appendix Fig. 1C, assuming that SFB stands for a biotin tag, the SFB-PA28g should be detected in the cells expressing this protein after pull-down by streptavidin; however, it was not. The Western blot data in Fig. 1E and many other figures must be quantified before any conclusions about the levels of proteins can be drawn.

      The revision addressed these problems.

      The immunoprecipitation method is flawed as it is described. The antigen (PA28g or C1QBP) should bind to the respective antibody, which in turn should bind to Protein G beads. The resulting immunocomplex should end up in the pellet fraction after centrifugation and be analyzed further by Western blot for coprecipitates. However, the method in the Appendix states that the supernatant was used for the Western blot.

      The revision corrected this method.

      To conclude that PA28g stabilizes C1QBP through their physical interaction in the cells, one must show whether a protease inhibitor can substitute for PA28g and prevent C1QBP degradation, and also show whether a mutation that disrupts the PA28g-C1QBP interaction can reduce the stability of C1QBP. In Fig. 1F, all cells expressed Myc-PA28g. Therefore, the conclusion that PA28g prevented C1QBP degradation cannot be reached. Instead, since more Myc-PA28g was detected in the cells expressing Flag-C1QBP compared to the cells not expressing this protein, a conclusion would be that the C1QBP stabilized the PA28g. Fig. 1G is a quantification of Western blot data that should be shown.

      The binding site for PA28g in C1QBP was mapped to the N-terminal 167 residues using truncated proteins. One caveat would be that some truncated proteins did not fold correctly in the absence of the sequence that was removed. Thus, the C-terminal region of the C1QBP with residues 168-283 may still bind to the PA28g in the context of full-length protein. In Fig. 1I, more Flag-C1QBP 1-167 was pulled down by Myc-PA28g than the full-length protein or the Flag-C1QBP 1-213. Why?

      The interaction site in PA28g for C1QBP was not mapped, which prevents further analysis of the interaction. Also, if the interaction domain can be determined, structural modeling of the complex would be feasible using AlphaFold2 or other programs. Then, it is possible to test point mutations that may disrupt the interaction and if so, the functional effect.

      The revision added AlphaFold models for the protein interaction. However, the models were not analyzed, and potential mutations that would disrupt the interaction were not predicted, made and tested. The revision did not address the request for the protease inhibitor.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors have tried to dissect the functions of Proteasome activator 28γ (PA28γ) which is known to activate proteasomal function in an ATP-independent manner. Although there are multiple works that have highlighted the role of this protein in tumours, this study specifically tried to develop a correlation with Complement C1q binding protein (C1QBP) that is associated with immune response and energy homeostasis.

      Strengths:

      The observations of the authors hint that beyond PA28y's association with the proteasome, it might also stabilize certain proteins such as C1QBP which influences energy metabolism.

      Weaknesses:

      The strength of the work also becomes its main drawback. That is, it remains unclear how PA28y stabilizes C1QBP or how C1QBP elicits its pro-tumourigenic role under PA28y OE. In most of the experiments, the authors have been dependent on the parallel changes in the expression of both the proteins to justify their stabilizing interaction. However, this approach is indirect at best and does not confirm the direct stabilizing effect of this interaction. IP experiments do not indicate direct interaction and have some quality issues. The upregulation of C1QBP might be indirect at best. It is quite possible that PA28y might be degrading some secondary protein/complex that is responsible for C1QBP expression. Since the core idea of the work is PA28y's direct interaction with C1QBP stabilizing it, the same should be demonstrated in a more convincing manner.

      Thank you very much for the important comments. Using AlphaFold 3, we found that the interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I). In the future, we will use nuclear magnetic resonance spectroscopy to analyze the protein-protein interaction between PA28γ and C1QBP and demonstrate it with in vitro GST pull-down experiments.

      In all of the assays, C1QBP has been detected as doublet. However, the expression pattern of the two bands varies depending on the experiment. In some cases, the upper band is intensely stained and in some the lower bands. Do C1QBP isoforms exist and are they differentially regulated depending on experiment conditions/tissue types?

      Thank you very much for the important comments. We have rechecked the experimental results showing two bands, which may have been caused by the use of a polyclonal antibody against C1QBP (Abcam: ab101267). Therefore, we repeated the experiment with a monoclonal antibody against C1QBP (Cell Signaling Technology: #6502) and replaced the corresponding images in the revised figures (Revised Figure 1E and Revised Appendix Figure 3D).

      Problems with the background of the work: Line 76. This statement is far-fetched. There are presently a number of works of literature that have dealt with the metabolic programming of OSCC including identification of specific metabolites. Moreover, beyond the estimation of OCR, the authors have not conducted any experiments related to metabolism. In the Introduction, the significance of this study and how it will extend our understanding of OSCC needs to be elaborated.

      Thank you very much for the important comments. Based on your suggestion, we have revised the content and updated the references (“Introduction”, Paragraph 2, Line 13-17 and Paragraph 4, Line 5-8). In addition, we plan to conduct experiments to investigate the regulation of metabolism by PA28γ and C1QBP and update our data in the future.

      The modified content is as follows:

      “Current research on metabolic reprogramming in OSCC primarily focused on mechanism of glycolytic metabolism and metabolic shift from glycolysis to oxidative phosphorylation (OXPHOS) of oral squamous cell carcinoma, which lays the groundwork for novel therapeutic interventions to counteract OSCC (Chen et al., 2024; Zhang et al., 2020).”

      “It is the first study to describe the undiscovered role of PA28γ in promoting the malignant progression of OSCC by elevating mitochondrial function, providing new clinical insights for the treatment of OSCC.”

      Reviewer #2 (Public review):

      Summary:

      The authors tried to determine how PA28g functions in oral squamous cell carcinoma (OSCC) cells. They hypothesized it may act through metabolic reprogramming in the mitochondria.

      Strengths:

      They found that the genes of PA28g and C1QBP are in an overlapping interaction network after an analysis of a genome database. They also found that the two proteins interact in coimmunoprecipitation and pull-down assays using the lysate from OSCC cells with or without expression of the exogenous genes. They used truncated C1QBP proteins to map the interaction site to the N-terminal 167 residues of C1QBP protein. They observed the levels of the two proteins are positively correlated in the cells. They provided evidence for the colocalization of the two proteins in the mitochondria, the effect on mitochondrial form and function in vitro and in vivo OSCC models, and the correlation of the protein expression with the prognosis of cancer patients.

      Weaknesses:

      Many data sets are shown in figures that cannot be understood without more descriptions, either in the text or the legend, e.g., Figure 1A. Similarly, many abbreviations are not defined.

      Thank you very much for the important comments. We have revised the descriptions in the legend to make it easier to understand.

      Some of the pull-down and coimmunoprecipitation data do not support the conclusion about the PA28g-C1QBP interaction. For example, in Appendix Figure 1B the Flag-C1QBP was detected in the Myc beads pull-down when the protein was expressed in the 293T cells without the Myc-PA28g, suggesting that the pull-down was not due to the interaction of the C1QBP and PA28g proteins. In Appendix Figure 1C, assuming that SFB stands for a biotin tag, the SFB-PA28g should be detected in the cells expressing this protein after pull-down by streptavidin; however, it was not. The Western blot data in Figure 1E and many other figures must be quantified before any conclusions about the levels of proteins can be drawn.

      Thank you very much for the meticulous review. We have rechecked the experimental results and found that we had made a mistake in the labeling of the image, which we have corrected in the revised figure (Revised Appendix Figure 1B, C). In addition, we have quantified the gray values of the Western blot bands using ImageJ software to confirm the Western blot results.

      The immunoprecipitation method is flawed as it is described. The antigen (PA28g or C1QBP) should bind to the respective antibody, which in turn should bind to Protein G beads. The resulting immunocomplex should end up in the pellet fraction after centrifugation and be analyzed further by Western blot for coprecipitates. However, the method in the Appendix states that the supernatant was used for the Western blot.

      Thank you very much for the careful review. We have corrected it in the revised appendix file (“Supplemental Materials and Methods”, Part “Immunoprecipitation assay”, Line 4-6).

      The modified content is as follows:

      The sample was shaken on a horizontal shaker for 4 h, after which the deposit was collected for western blotting.

      To conclude that PA28g stabilizes C1QBP through their physical interaction in the cells, one must show whether a protease inhibitor can substitute for PA28g and prevent C1QBP degradation, and show whether a mutation that disrupts the PA28g-C1QBP interaction can reduce the stability of C1QBP. In Figure 1F, all cells expressed Myc-PA28g. Therefore, the conclusion that PA28g prevented C1QBP degradation cannot be reached. Instead, since more Myc-PA28g was detected in the cells expressing Flag-C1QBP compared to the cells not expressing this protein, a conclusion would be that the C1QBP stabilized the PA28g. Figure 1G is a quantification of Western blot data that should be shown.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.

      The binding site for PA28g in C1QBP was mapped to the N-terminal 167 residues using truncated proteins. One caveat would be that some truncated proteins did not fold correctly in the absence of the sequence that was removed. Thus, the C-terminal region of the C1QBP with residues 168-283 may still bind to the PA28g in the context of full-length protein. In Figure 1I, more Flag-C1QBP 1-167 was pulled down by Myc-PA28g than the full-length protein or the Flag-C1QBP 1-213. Why?

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Using AlphaFold 3, we found that interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I).

      The interaction site in PA28g for C1QBP was not mapped, which prevents further analysis of the interaction. Also, if the interaction domain can be determined, structural modeling of the complex would be feasible using AlphaFold2 or other programs. Then, it is possible to test point mutations that may disrupt the interaction and if so, the functional effect.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised appendix figure (Revised Appendix Figure 1D-H).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) There are a lot of typos in the figure and manuscript that need to be addressed.

      Thank you very much for the important comments. We have corrected the typos in the revised figure and manuscript.

      (2) Figure 1A: The amount of protein that has been immunoprecipitated is more than the actual amount present in the lysate. The authors should calculate the efficiency of the precipitation to support their results.

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Moreover, it can enrich specific proteins and their binding partners, increase their concentration in the sample, and thus improve the sensitivity of detection.

      (3) Figure 1D: The relative expression levels of C1QBP look similar in almost all cell lines except for HN12. It seems that the relation of PA28y with C1QBP is more of a cell type-specific effect. It would be better if the blots were quantified, and the differences were statistically determined.

      Thank you very much for the important comments. We have quantified the gray values of the Western blot bands using ImageJ software to confirm the Western blot results.

      (4) Figure 1E: How do the authors quantify the expression of the protein in absolute terms? From the methods, it is understood that the flag-tagged construct is stably expressed. Under such conditions, how the authors observed the variable expression of the protein should be elaborated.

      Thank you very much for the important comments. We transfected 293T cells with 0 µg, 0.5 µg, 1 µg, and 2 µg of Flag-PA28γ plasmid. After collecting the protein for Western blotting, we found that the protein expression of Flag-PA28γ gradually increased. Moreover, the protein expression of C1QBP increased in step with that of Flag-PA28γ, indicating a dose-dependent relationship between the two proteins.

      (5) Figures 1F, G: The data does not correlate with the arguments presented in the text. The authors propose that interaction with PA28y increases the stability of C1QBP. However, the experiment lacks appropriate controls. Ideally, the expression of C1QBP should be tested in the presence and absence of PA28y. Moreover, the observed difference in expression between lanes 1-4 and 5-8 for myc-PA28y needs to be explained. Are the samples from different sources with variable PA28y expression? Figure 1G quantification for C1QBP does not correlate with the figure presented in F since the expression of the protein in the first four lanes is undetectable.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.
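
      As an illustration only, the comparison of decay rates described above can be sketched as a log-linear fit of band gray values against time. This is a minimal sketch that assumes first-order (exponential) decay; the function names, time points, and intensity values are hypothetical and are not taken from the manuscript or from ImageJ.

```python
from math import log


def decay_rate(times_h, intensities):
    """Estimate a first-order decay constant k (per hour).

    Fits ln(intensity) = ln(I0) - k * t by ordinary least squares.
    Assumes all intensities are positive and normalised gray values.
    """
    if len(times_h) != len(intensities) or len(times_h) < 2:
        raise ValueError("need at least two paired time/intensity values")
    logs = [log(v) for v in intensities]  # raises ValueError if v <= 0
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return -sxy / sxx  # slope is -k, so flip the sign


def half_life(k):
    """Half-life (hours) implied by a first-order decay constant k."""
    return log(2) / k


# Hypothetical normalised band intensities at 0, 1, 2 and 3 h.
times = [0, 1, 2, 3]
control = [1.0, 0.5, 0.25, 0.125]
with_pa28g = [1.0, 0.8, 0.64, 0.512]

k_control = decay_rate(times, control)
k_pa28g = decay_rate(times, with_pa28g)
print(k_control > k_pa28g)  # a smaller k means slower decay
```

      Under these assumptions, a smaller fitted constant (longer half-life) in the PA28γ-expressing cells would correspond to the slower decay reported for Flag-C1QBP; a statistical comparison across replicates would still be needed.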

      (6) Appendix Figure 1B: Lane 1 does not express Myc-tagged protein but pull-down has been performed using Myc beads. Then how come flag-C1qbp is getting pulled down in lane 1 if there is no PA28y? This indicates a non-specific interaction of C1qbp with the substrata under the experimental conditions used. Similarly, in Figure 1C SFB-PA28y is expressed in both lanes but is reflected only in lane 2 and not in lane 1 even when pull-down is being performed using SFB beads, again reflecting the non-specificity of the interactions shown through immunoprecipitation.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure (Revised Appendix Figure 1B, C).

      (7) Figure 2A: The co-localization of PA28y with C1QBP in mitochondria is not very convincing. The authors are urged to provide high-resolution images for the same along with quantification of co-localization coefficients.

      Thank you very much for the important comments. We plan to obtain high-resolution images of co-localization of PA28γ with C1QBP in mitochondria and add the quantification analysis. We will update our data in the future.
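
      For reference, one widely used co-localization coefficient is Pearson's correlation between the pixel intensities of the two channels. The sketch below is illustrative only: the function name and the toy pixel values are hypothetical, and a real analysis would be run on background-subtracted regions of interest in an image-analysis package.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities.

    Values near +1 indicate that the two signals rise and fall together,
    values near 0 indicate no linear relationship, and values near -1
    indicate mutual exclusion.
    """
    if len(channel_a) != len(channel_b) or not channel_a:
        raise ValueError("channels must be non-empty and of equal length")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("coefficient is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


# Hypothetical flattened pixel intensities from the same region in two channels.
pa28g_channel = [10, 40, 80, 120, 200]
c1qbp_channel = [12, 35, 90, 110, 210]
print(round(pearson_colocalization(pa28g_channel, c1qbp_channel), 3))
```

      Pearson's coefficient is sensitive to background and noise, so it is usually reported together with thresholded measures such as Manders' coefficients.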

      (8) Figure 2C: Mitochondria dynamics is an interplay of multiple factors. From the images, it seems that PA28y OE elevates mitochondria biogenesis in general which is having an umbrella effect on mitochondria fusion/fission and OCR. Images also do not convincingly indicate changes in mitochondrial length. The role of PA28y on mitochondria dynamics requires further justification. However, the presented data does not underline whether the changes in mitochondria behaviour are a consequence of PA28y and C1QBP interaction. Correlating higher mitochondria respiration with ROS generation is a far-fetched conclusion since, at present, there are multiple reports that suggest otherwise.

      Thank you very much for the important comments. We plan to knock out the interaction regions between PA28γ and C1QBP (like amino acids 1-167 and 1-213) to confirm whether PA28γ affects mitochondrial function through C1QBP and update our data in the future.

      (9) Line 157: The presented data does not substantiate the claims made that PA28y regulates mitochondrial function through C1QBP.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 3, Line 1-2).

      The modified content is as follows:

      “Collectively, these data suggest that PA28γ, which co-localizes with C1QBP in mitochondria, may involve in regulating mitochondrial morphology and function.”

      (10) Line 159: From the past data it is not very clear how PA28y upregulates C1QBP, hence the statement is not well supported. The presented data indicates the presence of a functional association between the two proteins.

      Thank you very much for the important comments. We detected the expression of C1QBP in two PA28γ-overexpressing OSCC cells (UM1 and 4MOSC2) and found an increase in C1QBP expression (Revised Figure 4B). Based on the results of the protein levels of the mitochondrial respiratory chain complex and other mitochondrial functional proteins, we believe that PA28γ regulates mitochondrial function by upregulating C1QBP.

      (11) Figure 4A, B: Given the mitochondrial role of C1QBP, the lesser levels of mitochondrial proteins upon C1QBP silencing are expected. Does it get phenocopied upon PA28y silencing? Similarly, all the subsequent mitochondrial phenotypes in D should be seen in a PA28y-depleted background.

      Thank you very much for the important comments. We plan to detect the mitochondrial protein expressions and OCRs of PA28γ-silenced OSCC cells. We will update our data in the future.

      (12) Line 198: The presented data do indicate a functional association between these two proteins but do not provide solid evidence for the same.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-10).

      The modified content is as follows:

      “Excitingly, we found the evidence that PA28γ interacts with and stabilizes C1QBP.”

      (13) Line 218-220: In this work, the authors highlight the non-degradome role of PA28y and hence, this fact should be treated appropriately in discussion in line with the presented data.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised manuscript (“Discussion”, Paragraph 2, Line 16-19).

      The modified content is as follows:

      “In addition, PA28γ can also play as a non-degradome role on tumor angiogenesis. For example, PA28γ can regulate the activation of NF-κB to promote the secretion of IL-6 and CCL2 in OSCC cells, thus promoting the angiogenesis of endothelial cells ( S. Liu et al., 2018).”

      (14) Line 236-240: Although the authors' statement that organ heterogeneity is the cause of the contrasting result is justifiable, there is no direct evidence here of PA28y involvement in the regulation of OXPHOS or of its impact on cellular metabolism (glycolysis, metabolic signalling, etc.).

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 3, Line 7-9).

      The modified content is as follows:

      “Therefore, PA28γ's regulation of OXPHOS may impact cellular energy metabolism.”

      (15) Line 249: No conclusive data supporting this statement.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 5, Line 1-3).

      The modified content is as follows:

      “Furthermore, our study reveals that PA28γ can regulate C1QBP and influence mitochondrial morphology and function by enhancing the expression of OPA1, MFN1, MFN2 and the mitochondrial respiratory complex.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in Figure 2A need to be quantified before the conclusion about the mitochondrial colocalization of the two proteins can be drawn. In Figure 2B and Appendix Figure 2A, the mitochondrial vacuoles and ridge should be indicated for general readers, and quantification should be performed before the conclusion is drawn.

      Thank you very much for the important comments. We will update our data in the future.

      (2) The OCR data from two cell lines are shown in Figure 2E and F. Which is which? The sentence, "The results indicated ... compared to control cells" in lines 130-132, was confusing; perhaps, it would be clear if "were significantly greater" could be deleted.

      Thank you very much for the important comments. We have relabeled Figure 2E and F to make them clearer (Revised Figure 2E, F). Based on your suggestion, we have deleted the words in the revised manuscript (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 1, Line 9-11).

      The modified content is as follows:

      “The results indicated significantly higher basal respiration, maximal OCRs and ATP production in PA28γ-overexpressing cells compared to control cells (Fig. 2G-I and Appendix Fig. 2B-D).”

      (3) Figures 4E-H show the migration, invasive, and proliferation capabilities of the cells. Which for which?

      Thank you very much for the important comments. We have relabeled Figure 4F-H to make it clearer (Revised Figure 4F-H).

      (4) In the Discussion, lines 198-201, it states that "C1QBP enhances ... function of OPA1, MNF1, MFN2..." What is the evidence? In lines 222-224, it says that "the binding sites ... may mask the specific ... modification sites". Please justify. In lines 253-254, "fuse" and "fuses" are misleading. Did the authors mean "localize" and "localizes"?

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-13, Paragraph 2, Line 20-23, and Paragraph 5, Line 3-6).

      The modified content is as follows:

      “Excitingly, we found the evidence that PA28γ interacts with and stabilizes C1QBP. We speculate that aberrantly accumulated C1QBP enhances the function of mitochondrial OXPHOS and leads to the production of additional ATP and ROS by activating the expression and function of OPA1, MNF1, MFN2 and mitochondrial respiratory chain complex proteins.”

      “Our study reveals that PA28γ interacts with C1QBP and stabilizes C1QBP at the protein level. Therefore, we speculate that the binding sites of PA28γ and C1QBP may mask the specific post-translational modification sites of C1QBP and inhibit its degradation.”

      “Mitochondrial fusion, crucial for oxidative metabolism and cell proliferation, is regulated by MFN1, MFN2, and OPA1. The first two fuse with the outer mitochondrial membrane, while the last fuses with the inner mitochondrial membrane (Westermann, 2010).”

      (5) Figure 6 was not referred to in the text. In this figure, PA28g and C1QBP are located in the inner membrane and matrix. Has this been determined? What are the blue ovals that are intermediaries of PA28g/C1QBP and OPA1/MFN1/MFN2?

      Thank you very much for the important comments. According to our immunofluorescence assay (Figure 2A), PA28γ is in both the nucleus and cytoplasm. A recent study has demonstrated that PA28γ can shuttle between the nucleus and cytoplasm, participating in various cellular processes. Furthermore, GeneCard information indicates that the subcellular localization of PA28γ includes the nucleus, cytoplasm and mitochondria (Author response image 1). In this article, we mainly focus on the functions of PA28γ and C1QBP located in the cytoplasm. Therefore, Figure 6 mainly displays PA28γ and C1QBP in the cytoplasm. Based on your suggestion, we have modified the figure to make it more accurate (Revised Figure 6).

      Author response image 1.

    1. eLife Assessment

      This study provides valuable insights into our understanding of the development of the enteric nervous system. The authors use genetically engineered mice to study the behavior of stem cells in organizing the enteric nervous system and the secreted signals that regulate these cells. The study rests on a degree of incomplete evidence since the characterization of some of the mouse resources is not complete in the current version.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Poltavski and colleagues describes the discovery of previously unreported enteric neural crest-derived cells (ENCDC) which are marked by Pax2 and originating from the Placodes. By creating multiple conditional mouse mutants, the authors demonstrate these cells are a distinct population from the previously reported ENCDCs which originate from the Vagal neural crest cells and express Wnt1.

      These Pax2-positive ENCDCs are affected due to the loss of both Ret and Ednrb highlighting that these cells are also ultimately part of the canonical processes governing ENCDC and enteric nervous system (ENS) development. The authors also make explant cultures from the mouse GI tract to detect how Ednrb signaling is important for Ret signaling pathways in these cells and rediscovers the interactions between these 2 pathways. One important observation the authors make is that CGRP-positive neurons in the adult distal colon seem to be primarily derived from these Pax2-positive ENCDCs, which are significantly reduced in the Ednrb mutants, thus highlighting the role of Ednrb in maintaining this neuronal type.

      Comments on latest version:

      Author response: We disagree that the datasets from previous studies provide additional insights that are relevant to the current study. It must be appreciated that Wnt1Cre and Pax2Cre are genetic lineage tracers and that migratory ENS progenitor cells labeled with these reagents do not maintain expression of Wnt1 and Pax2 mRNA or protein. The Wnt1 and Pax2 genes are only transiently expressed within their distinct regions of the ectoderm, and their expression turns off as cells delaminate and begin migration. Thus, Pax2Cre-labeled ENS progenitor cells are not Pax2-positive thereafter. The single cell RNA-Seq studies suggested by the reviewer were collected from older embryos and postnatal mice, and do not represent the E10.5-E11.5 period that accounts for genesis of Ret-mediated and Ednrb-mediated Hirschsprung disease pathology. Even with the most recent work by Zhou et al (Dev Cell, 2024) that included E10.5 cells, this analysis only evaluated neural crest-derived Sox10Cre lineage cells, which does not include the placode-derived Pax2Cre lineage (as we show explicitly in Fig. 2-figure supplement 2). Consequently, it would not be possible to find the "Pax2-positive cells" in these datasets. Performing a new transcriptomic analysis by isolating Pax2Cre-lineage and Wnt1Cre-lineage cells at the appropriate developmental time points could be the basis of future studies, but we think these are beyond the scope of the present paper.

      Reviewer comment: Since these cells are a completely new discovery, additional validation would be beneficial. Whole early GI tract datasets are available, such as human 6-week fetal gut data (PMID: 29802404) and whole mouse embryo studies spanning development that include ENS (PMID: 38355799). If the authors believe that none of these existing datasets can detect these cells in their developmental state and that targeted cell studies with specific Cre drivers would be required, they should make this explicitly clear.

      A key advantage of discovering a new cell type, particularly in the relatively understudied field of ENS, is the opportunity for the broader community to leverage this finding to inform their own research. If these cells are absent from current datasets, even those covering the whole GI tract, this should be clearly communicated.

      I aim to support the authors here. New discoveries in science require robust validation to enhance their impact. The authors have generated an important reagent with great potential for broader use, and addressing these straightforward requests would strengthen the study and make it more valuable to the scientific community.

      Author response: The observation that human mutations in RET and EDNRB both cause Hirschsprung disease is decades old, and of course numerous studies in human, mouse, and cells have addressed the relation between the two signaling pathways. We did not mean to imply that we were the first to discover that Ret and Ednrb signaling pathways interact. The reviewer cites a number of papers all from the Chakravarti lab that address this phenomenon; while these are a valuable contribution to the field, there is still more to be learned. The model elaborated in PMID: 31313802, in which Ret and Ednrb are both enmeshed in a common gene regulatory network, does not readily explain why each has a different phenotypic manifestation and doesn't take into account the importance of the placodal lineage. The main new contributions of our paper are the existence of a new cell lineage that contributes to the ENS, and that the placodal and neural crest lineages utilize Ret and Ednrb signaling differently. The clarification of how these elements are differentially used by the two lineages explains long-segment and short-segment Hirschsprung disease (Ret and Ednrb mutants, respectively) far better than in past studies. The reviewer unfortunately dismisses these insights and seems to feel that a biochemical exploration of one specific component of the signaling interaction (Y1015 phosphorylation) would be more relevant. This should be the basis of future studies and is beyond the scope of the new findings reported in the present paper.

      Reviewer comment: The authors completely miss the point. There is no association between phenotypic severity (L-HSCR, S-HSCR, or TCA) and mutations in a given gene in HSCR. EDNRB, for example, has a syndromic association with Waardenburg-Shah syndrome (WS4-A), which includes pigmentation anomalies due to EDNRB expression in neural crest cells that give rise to pigment cells.

      The authors' discovery reinforces the current paradigm that nearly all HSCR is mediated by mutations in genes within the GRN, accounting for 72% of the population attributable risk. This is valuable; reinforcing established paradigms with new data is crucial, and the authors should appreciate the significance of this contribution.

      The discovery of the signaling interaction is particularly important, as it offers a potential explanation for disease severity and provides a basis for classifying patients in future sequencing studies. It is surprising that the authors seem reluctant to highlight this novel finding, as it could greatly benefit future research, including the development of specific mouse mutants and advancing human genetics studies.

      Author response: The reviewer overlooked that one of the review articles that we cited (Chen, Hsu, & Hung, 2020) has a dedicated paragraph for RET (section 3.14), which summarizes the work by Barheri-Yarmand et al (PMID: 25795775) which is the very paper noted by the reviewer in the comment above. The reviewer also somewhat misstated the results of the Barheri-Yarmand et al study. By immunostaining, this paper showed nuclear localization of endogenous Ret, albeit a version of Ret with a disease-associated mutation that makes it constitutively active by constitutive autophosphorylation. Nonetheless, this was endogenous Ret. The paper also used overexpression of GFP-tagged RET in HEK293 cells to show that wildtype RET can behave in a similar manner, at least under these circumstances. Our point is simply that Ret (and other receptor tyrosine kinases) can be found in the nucleus in certain biological contexts, and our observations are consistent with this precedent. The reviewer also suggests a biochemical follow-up analysis related to this observation, which we agree would be of interest. Such an investigation however is beyond the scope of the present study.

      Reviewer comment: As the authors themselves now highlight from the cited paper, any evidence of RET entering the nucleus is of a mutant RET protein. How does this align with their discovery for the wildtype protein?

      This finding of nuclear localization of RET is both intriguing and unprecedented. Despite extensive biochemical studies on RET, given its role as an oncogene, this feature has not been identified before. If validated, this discovery could significantly advance the field and improve interpretation of future studies. I reiterate my previous point: a novel finding that challenges the current paradigm requires additional evidence.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Poltavski and colleagues explores the relative contributions of Pax2- and Wnt1-lineage-derived cells in the enteric nervous system (ENS) and how they are each affected by disruptions in Ret and Ednrb signaling. The current understanding of ENS development in mice is that vagal neural crest progenitors derived from a Wnt1+ lineage migrate into and colonize the developing gut. The sacral neural crest was thought to make a small contribution to the hindgut in addition but recent work has questioned that contribution and shown that the ENS is entirely populated by vagal crest (PMID: 38452824). GDNF-Ret and Endothelin3-Ednrb signaling are both known to be essential for normal ENS development and loss of function mutations are associated with a congenital disorder called Hirschsprung's disease. The transcription factor Pax2 has been studied in CNS and cranial placode development but has not been previously implicated in ENS development. In this work, the authors begin with the unexpected observation that conditional knockout of Ednrb in Pax2-expressing cells causes a similar aganglionosis, growth retardation, and obstructed defecation as conditional knockout of Ednrb in Wnt1-expressing cells. The investigators then use the Pax2 and Wnt1 Cre transgenic lines to lineage-trace ENS derivatives and assess the effects of loss of Ret or Ednrb during embryonic development in these lineages. Finally, they use explants from the corresponding embryos to examine the effects of GDNF on progenitor outgrowth and differentiation.

      Strengths:

      - The manuscript is overall very well illustrated with high resolution images and figures. Extensive data are presented.

      - The identification of Pax2 expression as a lineage marker that distinguishes a subset of cells in the ENS that may be distinct from cells derived from Wnt1+ progenitors is an interesting new observation that challenges current understanding of ENS development

      - Pax2 has not been previously implicated in ENS development - this manuscript does not directly test that role but hints at the possibility

      - Interrogation of two distinct signaling pathways involved in ENS development and their relative effects on the two purported lineages

      Weaknesses:

      - The major challenge with interpreting this work is the use of two transgenic lines, Wnt1-Cre and Pax2-Cre, which are not well characterized in terms of fidelity to native gene expression and recombination efficiency in the ENS. If 100% of cells that express Wnt1 do not express Cre or if the Pax2 transgene is expressed in cells that do not normally express Pax2, then these observations would have very different interpretations and would not support the conclusions made. The two lineages are never compared in the same embryo, which also makes it difficult to assess relative contributions and renders the evidence more circumstantial than definitive.

      - Visualization of the Pax2-Cre and Wnt1-Cre induced recombination in cross-sections at postnatal ages would help with data interpretation. If there is recombination evident in the mesenchyme, this would particularly alter interpretation of Ednrb mutant experiments, since that pathway has been shown to alter gut mesenchyme and ECM, which could indirectly alter ENS colonization.

      - The data on distinct lineages in Fig 3 is somewhat weak and the description in the Results section tends to over-interpretation. For example, "A minimum number (approx. 3%) of CGRP+ neurons were labeled by Wnt1Cre ... which indicates that Wnt1Cre-derived cells have little or no commitment to a mechanosensory fate in the distal colon." The data panel in Fig 3f shows that most of the CGRP-IR cells in Wnt1-Cre-Tomato mice are tdTomato+ though their tdTomato fluorescence is less intense than in neighboring smaller, likely glial cells. This suggests that CGRP+/Tomato+ neurons were likely undercounted. IHC for tdTomato to ensure detection of low levels of Tomato expression and quantification of observations would strengthen the authors' claim. CGRP+ enteric neurons have been visualized and functionally described by several investigators in the field using Wnt1-Cre-GCaMP mice, which also challenges the authors' conclusions. Finally, quantification of CGRP+ enteric neurons by measuring CGRP mucosal fiber immunoreactivity is not accurate because it would reflect both ENS CGRP-expressing neurons and visceral afferents from DRG. Moreover, it is not known if all CGRP+ enteric neurons project to the mucosa or if all mucosal-projecting neurons are mechanosensory. Finally, most of the signal seems to be non-specific background staining in the mucosa and quantification of mucosal signal in this context does not seem meaningful.

      - No consideration of glia - are these derived from both lineages?

      - No discussion of how these observations may fit in with recent work that suggests a mesenchymal contribution of enteric neurons (PMID: 38108810)

      - Phospho-RET staining in Figure 7 is difficult to discern and interpret with high background. Positive and negative controls would strengthen these data.

      Comments on revised version:

      The authors have responded to the weaknesses identified above. Based on my own assessment of the revised manuscript, my assessment is unchanged because the manuscript is largely unchanged.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Poltavski and colleagues describes the discovery of previously unreported enteric neural crest-derived cells (ENCDC) which are marked by Pax2 and originating from the Placodes. By creating multiple conditional mouse mutants, the authors demonstrate these cells are a distinct population from the previously reported ENCDCs which originate from the Vagal neural crest cells and express Wnt1.

      These Pax2-positive ENCDCs are affected due to the loss of both Ret and Ednrb highlighting that these cells are also ultimately part of the canonical processes governing ENCDC and enteric nervous system (ENS) development. The authors also make explant cultures from the mouse GI tract to detect how Ednrb signaling is important for Ret signaling pathways in these cells and rediscovers the interactions between these 2 pathways. One important observation the authors make is that CGRP-positive neurons in the adult distal colon seem to be primarily derived from these Pax2-positive ENCDCs, which are significantly reduced in the Ednrb mutants, thus highlighting the role of Ednrb in maintaining this neuronal type.

      I appreciate the amount of work the authors have put into generating the mouse models to detect these cells, but there isn't any new insight on either the nature of ENCDC development or the role of Ret and Ednrb. Also, there are sophisticated single-cell genomics methods to detect rare cell type/states these days and the authors should either employ some of those themselves in these mouse models or look at extensively publicly available single-cell datasets of the developing wildtype and mutant mouse and human ENS to map out the global transcriptional profile of these cells. A more detailed analysis of these Pax2-positive cells would be really helpful to both the ENS community as well as researchers studying gut motility disorders.

      We would like to point out that the reviewer’s comments in both Public Review and in some cases reiterated in Recommendations for the Authors are rooted in several misunderstandings. The reviewer writes “Pax2-positive ENCDCs”, as if the Pax2 lineage (properly, the Pax2Cre-labeled lineage) of the ENS is a subset of neural crest, and states that “there isn’t any new insight” from our study on ENS development. Our conclusion is quite different, that the Pax2Cre lineage (placode-derived) is distinct from the neural crest-derived cell lineage. The reviewer may not have appreciated that our study establishes a fundamental reinterpretation of the very long-standing dogma that the ENS is derived solely from neural crest. We believe that finding and characterizing the unique contribution of an independent cell lineage to the ENS provides critical new perspectives into ENS development and the etiology of Hirschsprung disease. One feature of the Pax2Cre (placodal) lineage is as the source of CGRP-positive mechanosensory neurons in the colon (as the reviewer mentioned), but this is one feature of the larger conceptual discovery of the existence of a separate lineage contribution to the ENS, not the most important observation in and of itself.

      The reviewer continues by saying that we “rediscovered” the interaction between Ednrb and Ret in ENS development. In our study we show that the two lineages (placode-derived and neural crest-derived) employ Ednrb and Ret signaling in distinct ways. This isn’t simply rediscovery, this is new insight. The fact that both lineages utilize both signaling axes (albeit with mechanistic differences) is a primary reason why the unique placodal lineage contribution to the ENS remained unsuspected until now. We have revised the text to make these points clearer in our revised manuscript.

      The reviewer also suggests single cell genomic methods, which is addressed below in our response to the reviewer’s first recommendation.

      Reviewer #2 (Public Review):

      This manuscript by Poltavski and colleagues explores the relative contributions of Pax2- and Wnt1-lineage-derived cells in the enteric nervous system (ENS) and how they are each affected by disruptions in Ret and Ednrb signaling. The current understanding of ENS development in mice is that vagal neural crest progenitors derived from a Wnt1+ lineage migrate into and colonize the developing gut. The sacral neural crest was thought to make a small contribution to the hindgut in addition but recent work has questioned that contribution and shown that the ENS is entirely populated by the vagal crest (PMID: 38452824). GDNF-Ret and Endothelin3-Ednrb signaling are both known to be essential for normal ENS development and loss of function mutations are associated with a congenital disorder called Hirschsprung's disease. The transcription factor Pax2 has been studied in CNS and cranial placode development but has not been previously implicated in ENS development. In this work, the authors begin with the unexpected observation that conditional knockout of Ednrb in Pax2-expressing cells causes a similar aganglionosis, growth retardation, and obstructed defecation as conditional knockout of Ednrb in Wnt1-expressing cells. The investigators then use the Pax2 and Wnt1 Cre transgenic lines to lineage-trace ENS derivatives and assess the effects of loss of Ret or Ednrb during embryonic development in these lineages. Finally, they use explants from the corresponding embryos to examine the effects of GDNF on progenitor outgrowth and differentiation.

      Strengths:

      -  The manuscript is overall very well illustrated with high-resolution images and figures. Extensive data are presented.

      -  The identification of Pax2 expression as a lineage marker that distinguishes a subset of cells in the ENS that may be distinct from cells derived from Wnt1+ progenitors is an interesting new observation that challenges the current understanding of ENS development.

      -  Pax2 has not been previously implicated in ENS development - this manuscript does not directly test that role but hints at the possibility.

      -  Interrogation of two distinct signaling pathways involved in ENS development and their relative effects on the two purported lineages.

      The reviewer provided a succinct and accurate summary of our analysis. We correct just the one statement that the ENS is entirely populated by vagal crest. The paper cited by the reviewer (PMID: 38452824) used Wnt1DreERT2 to lineage label the NC population, so of course only looked at neural crest (comparing vagal vs. sacral NC). The advance in our study is to newly document the independent contribution of the placodal lineage.

      Weaknesses:

      -  The major challenge with interpreting this work is the use of two transgenic lines, rather than knock-ins, Wnt1-Cre and Pax2-Cre, which are not well characterized in terms of fidelity to native gene expression and recombination efficiency in the ENS. If 100% of cells that express Wnt1 do not express this transgene or if the Pax2 transgene is expressed in cells that do not normally express Pax2, then these observations would have very different interpretations and not support the conclusions made. The two lineages are never compared in the same embryo, which also makes it difficult to assess relative contributions and renders the evidence more circumstantial than definitive.

      We do not agree that the Cre lines being transgenics rather than knock-ins changes the utility of these reagents or the interpretation of the results; there are also potential problems with knock-in alleles. Wnt1Cre has been in use for 25 years as a pan-neural crest lineage cell marker with exceptional efficiency and specificity (including numerous studies of the ENS), so we disagree that it is not well characterized. Pax2Cre of course has not previously been studied in the ENS, but it has been broadly used in other contexts (e.g., craniofacial, kidney). That said, and as noted in our original manuscript, we are aware that an issue of this study is the uniqueness of the recombination domains of the two Cre lines. As we wrote, Wnt1Cre and Pax2Cre cannot be combined into the same embryo because they are both Cre lines, and we do not have a suitable non-Cre recombinase line to substitute for either. Instead, we demonstrate that the two lines recombine in distinct territories of the early embryonic ectoderm, and that the two lineages thus labeled are distinct in marker expression at the initial onset of their delamination, utilize Edn3-Ednrb and GDNF-Ret in distinct ways during their migration to the hindgut, and contribute to different terminal cell fates in the colon. We think this evidence of the distinct nature of the two lineages from start to finish is compelling rather than merely circumstantial.

      -  Visualization of the Pax2-Cre and Wnt1-Cre induced recombination in cross-sections at postnatal ages would help with data interpretation. If there is recombination induced in the mesenchyme, this would particularly alter the interpretation of Ednrb mutant experiments, since that pathway has been shown to alter gut mesenchyme and ECM, which could indirectly alter ENS colonization.

      We have several thoughts about this comment. First, we are uncertain why postnatal analysis would be informative, as ENS colonization occurs (or fails to occur in mutants) during embryogenesis. The reviewer might be thinking of a juvenile stage additional contribution to the ENS, which is addressed below (responses to Recommendations for the Authors) but, as we discuss there, is not relevant to our analysis. Second, we did examine recombination in the distal hindgut at E12.5 during ENS colonization (Fig. 1f and 1h) and did not see overlap between either Cre recombination domain and Edn3 mRNA expression (which is expressed by the non-ENS mesenchyme). Furthermore, Ednrb is not expressed in the gut mesenchyme during ENS colonization (Fig. 7-figure supplement 1), thus ectopic mesenchymal Cre expression, if any, by either line would have no impact in Cre/Ednrb mutants. Lastly, the reviewer’s idea could have been a plausible hypothesis at the onset of the project, but here we show positive evidence for a different explanation. We do not rigorously exclude the reviewer’s hypothesis, nor other theoretically possible models, but we think we have provided a strong case to support the direct involvement of Ret and Ednrb in ENS progenitors rather than in surrounding non-neural mesenchyme.

      -  No consideration of glia - are these derived from both lineages?

      To properly address this question would require new reagents and analyses that we have not yet initiated. While an interesting question from a developmental biology standpoint, we don’t think that this investigation would change any of the interpretations that we make in the manuscript.

      -  No discussion of how these observations may fit in with recent work that suggests a mesenchymal contribution of enteric neurons (PMID: 38108810).

      The recent paper cited by the reviewer is very explicit in describing this mesenchymal contribution to the ENS as occurring after postnatal day P11. Other than the terminal Hirschsprung phenotype, all of our analysis of cell lineage migration and fate and colonic aganglionosis was conducted at embryonic or early (P9) postnatal stages. We therefore do not see a relation of our work to this study. In light of this paper, however, we do agree that it would be worthwhile in a future study to explore Wnt1Cre and Pax2Cre lineage dynamics in the ENS of older mice.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should reanalyze multiple single-cell RNA-seq datasets available now, to see if these cells are detected in those studies and then look at the global transcriptional profile of these Pax2-positive cells compared to the other vagal neural crest-derived ENCDCs. Some of these datasets can be found here - PMIDs: 33288908, 37585461, and https://www.gutcellatlas.org/.

      We disagree that the datasets from previous studies provide additional insights that are relevant to the current study. It must be appreciated that Wnt1Cre and Pax2Cre are genetic lineage tracers and that migratory ENS progenitor cells labeled with these reagents do not maintain expression of Wnt1 and Pax2 mRNA or protein. The Wnt1 and Pax2 genes are only transiently expressed within their distinct regions of the ectoderm, and their expression turns off as cells delaminate and begin migration. Thus, Pax2Cre-labeled ENS progenitor cells are not Pax2-positive thereafter. The single cell RNA-Seq studies suggested by the reviewer were collected from older embryos and postnatal mice, and do not represent the E10.5-E11.5 period that accounts for genesis of Ret-mediated and Ednrb-mediated Hirschsprung disease pathology. Even with the most recent work by Zhou et al (Dev Cell, 2024) that included E10.5 cells, this analysis only evaluated neural crest-derived Sox10Cre lineage cells, which does not include the placode-derived Pax2Cre lineage (as we show explicitly in Fig. 2-figure supplement 2).  Consequently, it would not be possible to find the “Pax2-positive cells” in these datasets. Performing a new transcriptomic analysis by isolating Pax2Cre-lineage and Wnt1Cre-lineage cells at the appropriate developmental time points could be the basis of future studies, but we think these are beyond the scope of the present paper. 

      (2) Even in their current quantification method of using immunofluorescent cells in a microscopic field, the authors count very few cells. The quantification in Figures 2v-2z is only from 4 embryos and is in the hundreds. This leads to misrepresentation of cell numbers and is best reflected in Figure 2x, where Wnt1Cre/Ret GI tracts have 0 Ret +ve cells, which we now know is not true even in ubiquitous Ret null embryos, where Ret null cells are detected as late as E14.5 (PMID 37585461)

      Because of the reviewer’s comment, we recognize that the specific detail about cell numbers wasn’t properly written. We didn’t count a few hundred cells total; it was a few hundred cells per embryo. Exact numbers are provided in the revised figure legend where “cells/embryo” is now explicitly stated. Multiplied by the number of embryos, this means that we evaluated approx. 1000 total cells per genotype and time point in cases where Ret+ and/or GFP+ (lineage+) cells were found. The total absence of such cells in Wnt1Cre/Ret mutants is a rigorous conclusion. Our results neither misrepresent nor contradict the study by Vincent et al (PMID 37585461). Our analyses were performed on gut tissue isolated at E10.5 and E11.5 stages, which is long before Schwann cell precursors (SCPs, the primary focus of the Vincent et al study) colonize the gut (E14.5; Uesaka et al, 2015. PMID: 26156989). Indeed, as the reviewer notes, SCPs migrate into the gut in a Ret-independent manner. Because our analysis is at a much earlier time point, our focus is on the cranial ectoderm sources of ENS progenitors. We have adjusted the text associated with Fig. 2 to make this more clear.

      (3) There are multiple sections in the manuscript that rehash already known facts, like the whole section about Wnt1 conditional Ret null mice which show failure of migration of ENCDCs. This has been shown multiple times and doesn't add anything to the author's story.

      We think this comment stems from the reviewer’s perception that the Pax2Cre lineage is a subset of neural crest. The Wnt1Cre data (including Ret-deficient and Ednrb-deficient embryos) presented in the manuscript are not intended to rehash what is already known but to establish important similarities and differences between the newly identified placode-derived and the well-established neural crest-derived ENS progenitor cells. In light of the reviewer’s suggestion #8 below, to move the Wnt1Cre lineage analysis to a supplement, this information remains in the main text to provide proper comparison to the Pax2Cre-lineage profile. We think we were fair in the text to the legacy of work on neural crest and ENS development and were explicit in using our Wnt1Cre analysis to compare to the Pax2Cre lineage. Finally, we point out that our analysis was conducted on a different genetic background (outbred ICR) compared to previous studies, and there are strain-specific differences in Hirschsprung-associated lethality between our background and previous studies, so it was not impossible that the behavior of the neural crest cell lineage in the ICR background could be different from past observations on different backgrounds. Although we did not identify any major differences, it is important that the information on NC behavior in this background be presented. 

      (4) Also, the conclusion drawn for Figure 5C "this indicates that the Wnt1Cre-derived cells do not harbor a cell-autonomous response to GDNF" seems to suggest the authors are not very well versed in the ENS literature. GDNF as well as EDN3 are expressed from surrounding mesenchyme and are cell non-autonomous.

      The reviewer seems to have misread or misunderstood the specific statement as well as the more important broader conclusion of the experiment. First, of course the source of GDNF ligand in vivo is the mesenchyme. The explant assay was designed to eliminate this and then to substitute GDNF as provided experimentally. The focus of the experiment was to address the response to GDNF, not the source of GDNF. But more importantly, the experiment revealed a surprising outcome that the reviewer did not appreciate. In Pax2Cre/Ret mutants, the Wnt1Cre lineage still expresses Ret, yet does not grow out from the gut explant when provided with GDNF. This shows that the neural crest lineage requires Ret function in placode-derived cells in order to respond to GDNF. In other words, despite expressing Ret, the NC lineage does not harbor a cell-autonomous response to GDNF, as we wrote. Because this might be confusing to some readers, we have revised the description of this analysis to hopefully be more clear.

      (5) The fact that Ret and Ednrb signaling pathways interact is not a novel finding and has been reported multiple times in Ret and Ednrb mutant mice and cell lines (PMID: 12355085, 12574515, 27693352, 31818953), potentially through shared transcription factors (PMID: 31313802). It would have been more relevant if the authors could show how the specific tyrosine residue (Y1015) in Ret is phosphorylated in the presence of Ednrb.

      The observation that human mutations in RET and EDNRB both cause Hirschsprung disease is decades old, and of course numerous studies in human, mouse, and cells have addressed the relation between the two signaling pathways. We did not mean to imply that we were the first to discover that Ret and Ednrb signaling pathways interact. The reviewer cites a number of papers all from the Chakravarti lab that address this phenomenon; while these are a valuable contribution to the field, there is still more to be learned. The model elaborated in PMID: 31313802, in which Ret and Ednrb are both enmeshed in a common gene regulatory network, does not readily explain why each has a different phenotypic manifestation and doesn’t take into account the importance of the placodal lineage. The main new contributions of our paper are the existence of a new cell lineage that contributes to the ENS, and that the placodal and neural crest lineages utilize Ret and Ednrb signaling differently. The clarification of how these elements are differentially used by the two lineages explains long-segment and short-segment Hirschsprung disease (Ret and Ednrb mutants, respectively) far better than in past studies. The reviewer unfortunately dismisses these insights and seems to feel that a biochemical exploration of one specific component of the signaling interaction (Y1015 phosphorylation) would be more relevant. This should be the basis of future studies and is beyond the scope of the new findings reported in the present paper.

      (6) What is the mechanism of the presence of Y1015 phosphorylation in 33% of Ednrb deficient Pax2Cre cells? It appears to me what the authors report as absent phosphorylation in the 67% of cells could be just weak staining or cells missing in prep.

      The reviewer, referring to Fig. 7q, presumably meant to say Wnt1Cre rather than Pax2Cre. The reviewer overlooked that we provided an explanation for this observation in our original manuscript. This sentence reads “Because Ednrb is expressed only in a subset of Wnt1Cre-derived enteric progenitor cells (Figure 7 – figure supplement 1), the residual Y1015 phosphorylation observed in Wnt1Cre/Ednrb mutant cells is likely to occur in the Ednrb-negative Wnt1Cre-derived cell population”. The sentence is retained unchanged in the revised manuscript. The explanation is not because of weak staining or problems with tissue preparation.

      (7) The references the authors cite regarding the previous discovery of Ret expression in the nucleus are incorrect. The review articles the authors cite do not mention anything about Ret expression in the nucleus. The evidence of nuclear localization of Ret previously comes from overexpression studies in HEK293 cells (PMID: 25795775). Such overexpression studies are fraught with generating noisy data for well-documented reasons. But if this observation is correct, the authors miss a great opportunity to identify what the Ret protein is doing in the nucleus. Is it in direct contact with its known transcription factors like Sox10 and Rarb? This would shed a lot of light on the possible mechanism of Ret LoF observed in Ret mutant mice

      The reviewer overlooked that one of the review articles that we cited (Chen, Hsu, & Hung, 2020) has a dedicated paragraph for RET (section 3.14), which summarizes the work by Bagheri-Yarmand et al (PMID: 25795775), the very paper noted by the reviewer in the comment above. The reviewer also somewhat misstated the results of the Bagheri-Yarmand et al study. By immunostaining, this paper showed nuclear localization of endogenous Ret, albeit a version of Ret with a disease-associated mutation that makes it constitutively active by constitutive autophosphorylation. Nonetheless, this was endogenous Ret. The paper also used overexpression of GFP-tagged RET in HEK293 cells to show that wildtype RET can behave in a similar manner, at least under these circumstances. Our point is simply that Ret (and other receptor tyrosine kinases) can be found in the nucleus in certain biological contexts, and our observations are consistent with this precedent.

      The reviewer also suggests a biochemical follow-up analysis related to this observation, which we agree would be of interest. Such an investigation however is beyond the scope of the present study.

      (8) The manuscript could benefit from a major rewrite by reorganizing sections to make it easy for the readers to follow the narrative.

      Many sections about the role of Ret and Ednrb in Wnt1cre-derived ENCDCs can be moved to a supplement. These facts are well-documented and have been proven before.

      This was addressed in our response to comment #3 of this reviewer. The figures have been kept as main figures in the revised manuscript to allow side-by-side comparison to parallel analysis of the Pax2Cre lineage.

      - The observation that only a handful of Pax2Cre cells at E10.5 express Ret and the observation that conditional Ret null abrogates these cells at E11.5, are not presented together and makes connecting these two facts difficult.

      Ret expression at E10.5 and E11.5 are both shown in the same figure (Fig. 2). In the presentation of these results, we first describe in normal development that Ret is expressed differently in E10.5 ENS progenitors between the Pax2Cre and Wnt1Cre lineages. This is additional support for the argument that the two lineages are molecularly distinct. Then comes evaluation of postnatal fates with different markers before we return to embryonic Ret expression. We acknowledge that this can make it difficult to connect these observations. We decided to retain the original organization in order to not lose this important conclusion. However, we have revised the text to hopefully make this connection between the sections more congruent.

      Reviewer #2 (Recommendations For The Authors):

      - The labeling of some as "figure supplements" is really hard to follow in the text and confusing to interpret when a main figure or supplemental figure is being referenced, and which one.

      We understand this comment, but this is journal style and outside of our control. We have kept the journal format in the revised manuscript.

      - The data in Figures 3b-c is well established in the field and somewhat misinterpreted. NOS1 neurons in the mouse ENS and their projections have been well described (Sang and Young, 1996, and other studies). CGRP immunoreactivity would reflect both ENS CGRP-expressing neurons and visceral afferents from DRG.

      There of course is a history of analysis of NOS1, CGRP, and other markers in the ENS. The focus of the analysis in Fig. 3 is to demonstrate how the cells that express these markers are impacted by gene manipulation in the Wnt1Cre and Pax2Cre lineages. For the giant migrating contractions that are associated with defecation, ample past electrophysiological studies have established that mechanosensory CGRP+ neurons trigger NOS+ inhibitory neurons (and ACh+ excitatory neurons) of the myenteric plexus to propel colonic contents. Thus, these are the relevant markers to explain the lack of colonic peristalsis in Ednrb-deficient mice. To our awareness, our results with NOS1 do not contradict any past study, including the Sang and Young 1996 description. Regarding CGRP, indeed the reviewer is correct that this marker is expressed by both neuronal subtypes. Two arguments support the specific derivation of ENS mechanosensory neurons from the Pax2 lineage. First, the ENS and DRG neurons can be distinguished by the location of their cell bodies and their axon extensions in the gut wall; only the ENS neurons are deficient in Pax2Cre/Ednrb mutants (as documented in Fig. 3). Second, the DRG population is derived from neural crest and is not labeled by Pax2Cre. If this population of CGRP+ neurons had functional relevance to colonic peristalsis, this would not be altered in Pax2Cre/Ednrb mutants. Indeed, the CGRP+ afferent nerve endings of DRG origin in the distal colon are mechanical distension sensors but do not modulate either ENS or autonomic nervous system activity (PMID: 37541195). We believe that our interpretation is correct.

      - The evidence in Figure 3 supporting the claim that NOS1 and CGRP-expressing enteric neurons come from distinct lineages is weak. IHC for CGRP is notoriously poor at labeling soma in the ENS. IHC for tdTomato to ensure the detection of low levels of Tomato expression and quantification of observations would strengthen this claim.

      CGRP is a peptide that is stored and transported in vesicles; therefore the antibody against CGRP labels vesicular particles in the soma and synaptic vesicles along the axons of CGRP-producing neurons. It is not expected to label the entire cytoplasm (or the range of subcellular organelles) as the NOS antibody does. We did include quantification of data in Figure 3-figure supplement 1 in the manuscript to support the claim of lineage derivation. As described in the Methods section of the manuscript, we used binary threshold selection for Tomato+ cell count using Fiji/ImageJ, which detects both TomatoHigh and TomatoLow cells as Tomato+; we feel this is equal to or even superior to IHC for this analysis.

      - IHC panels in Figures 3h-o are largely uninterpretable. Most of the signal seems to be non-specific background staining in the mucosa and quantification of mucosal signal in this context does not seem meaningful.  

      We disagree with the reviewer’s comment. As described in the response above, CGRP+ mechanosensory neurons send their peripheral axon projections to innervate mucosa (sensory epithelial cells), and NOS+ inhibitory motor axons innervate the circular muscle. Thus, panels h-o of Fig. 3 focus on the axonal profile and are not intended to visualize soma, which is why sagittal views are presented instead of flatmount views. All of the controls were performed side-by-side to confirm that the signal is real and interpretable.

      Note also that the colon does not have villi so this annotation should be revised.

      We appreciate that the reviewer brought this misstatement to our attention. We corrected this error in the revised manuscript.

      - Phospho-RET staining in Figure 7 is difficult to discern and interpret with high background. Positive and negative controls would strengthen these data.

      Fig. 7 shows phospho Ret-Y1015 staining in lineage-labeled Wnt1Cre/Ednrb/R26nTnG mutants. The strength of the signal to noise in the figure is a matter of Ret expression level and the quality of the anti-pY1015 antibody. We are not aware of a meaningful positive control that has been validated in the literature that we could use for comparison. The ideal negative control would be to perform the same analysis in Wnt1Cre/Ret/R26nTnG mutants, but because this manipulation eliminates the entire NC cell lineage from the colon, there would be no NC cells in which to visualize background staining in this lineage with this antibody when Ret protein is not present. We note that anti-pY1096 did not show a difference in staining between control and mutant, which supports the interpretation of a specific impact on pY1015. We also point out here, as in the text, that we do not yet have any validation that phosphorylation of Y1015 is functionally important in NC migration to the distal colon. Clearly, more work to address this role and to demonstrate the mechanism of phosphorylation of this specific residue in response to Edn3-Ednrb signaling will be needed.

    1. eLife Assessment

      This fundamental study addresses the regulation of the MICAL-family of actin regulators by Rab GTPases, which play a key role in directing membrane trafficking within cells. The compelling evidence explains how Rab8 family members bind at two sites to allosterically regulate MICAL1, and relieve an auto-inhibited state unable to bind actin. This study lays the basis for further progress in understanding membrane trafficking and cytoskeleton dynamics in eukaryotic cells.

    2. Reviewer #1 (Public review):

      The manuscript describes comprehensive structure-function studies combining structural studies, Alphafold2-based modelling, and extensive structural validation by mutagenesis and biochemical experiments. Consequently, a sophisticated activation mechanism of Mical1 as a representative of the MICAL family is elucidated at the molecular level. Since MICAL proteins are important regulators of membrane trafficking and cytoskeleton dynamics, the study is of high relevance for many groups. Structural data are of high quality, the modelling data appear to be sound, and the subsequent biochemical analyses are carried out in great detail, yielding a complete story. I have little to criticize on this beautiful work.

    3. Reviewer #2 (Public review):

      Summary:

      Rai and coworkers have studied the regulation of the MICAL-family of actin regulators by Rab8 family GTPases. Their work uses a combination of structural biology, biochemistry, and modelling approaches to identify the regions and specific residues interacting with Rabs and understand the consequences of MICAL1 regulation. The study extends previous work on individual domains by incorporating analysis of the full-length MICAL1 protein and provides compelling evidence for allosteric regulation by Rab binding to two regulatory sites, one of low and one of high affinity.

      Strengths:

      Excellent biochemical and structural analysis.

      Weaknesses:

      Additional data to test the model for Rab regulation of MICAL1 in the actin-pelleting assay would enhance the study.

    1. eLife Assessment

      This study offers valuable insights into the molecular mechanisms by which Osx influences osteocyte function, particularly through its regulation of Cx43. However, the evidence supporting the authors' claims is incomplete, necessitating additional experimental data and further investigation to fully substantiate these findings. While this study presents a new perspective on the complex role of Osx in bone biology, it also raises significant questions about the intricacies of its regulatory network.

    2. Reviewer #1 (Public review):

      The manuscript "Osterix Facilitates Osteocytic Communication by Targeting Connexin43" investigates the role of Osterix (Osx) in osteocytes using a Col1α1-CreER;Osxfl/fl mouse model and cultured cells. The study reveals that Osx is expressed in osteocytes, and its deletion in vitro leads to a significant reduction in osteocyte dendrite formation, highlighting its critical role in maintaining cellular communication. Through ChIP-seq analysis, the authors identified Connexin43 (Cx43) as a direct downstream target of Osx. Moreover, treatment with all-trans retinoic acid (ATRA), a known agonist of Cx43, was able to rescue the dendritic network in osteocytes, restoring their communication capabilities in vitro.

      This research provides valuable insights into the molecular mechanisms by which Osx influences osteocyte function, particularly through its regulation of Cx43. However, despite these findings, the study does not fully elucidate all the mechanisms involved in Osx-mediated osteocytic communication. Several conclusions, particularly those related to the broader signaling pathways, require additional experimental evidence and further investigation to be fully substantiated. This study provides a new aspect in understanding the complex role of Osx in bone biology but leaves open questions regarding the intricacies of its regulatory network.

      Major Comments:

      (1) In the Col1a1-CreER;tdTomato mice, the number of tdTomato+ cells in the cortical bone appears lower compared to Osx+ cells. The overlap between tdTomato+ and Osx+ cells in Figure 1 is limited. Could this affect the knockout efficiency? Can the authors provide data on Osx knockout efficiency in vivo? While immunostaining of Osx is shown in both control and mutant mice in Figure 2A, the Osx expression pattern differs from Figure 1A. Osx expression is relatively low in the bone marrow in Figure 1A, but much stronger in Figure 2A.

      Additionally, Osx+ cells in Figure 1A seem confined to the bone surface, whereas Figure 2A shows a broader distribution. What developmental stage of mice was used in Figure 1? Could the authors also provide co-staining with other osteocyte markers alongside Osx?

      (2) The authors mentioned using both siRNA and Lenti-Osx to modulate Osx expression. What was the specific purpose of these experiments? If the authors aim to demonstrate that Osx plays a critical role in osteocytes, they should provide data on downstream targets or markers relevant to osteocyte function. Additionally, did these treatments affect processes like differentiation or cell viability in osteocytes? The current results only demonstrate that siRNA and Lenti-Osx can successfully modulate Osx expression in vitro, but further evidence is needed to support broader functional conclusions.

      (3) Osx knockout mice exhibited a decreased osteocyte dendritic network both in vivo and in vitro. To better understand how this affects overall bone health, could the authors provide additional parameters, such as bone thickness, bone strength, and other relevant metrics? Furthermore, to determine whether these phenotypes are primarily due to defects in the osteocyte dendritic network or a reduction in osteocyte numbers, the authors should also assess the number of osteocytes in the knockout mice (Figure 2).

      (4) Regarding the Lucifer Yellow Dye Transfer Assay in Figure 3, the authors should provide data on cell density and cell viability for both control and mutant groups. Additionally, although less dye is observed in the mutant group, the migration distance appears comparable to the control group. Could the authors explain this result? Furthermore, how was transmission speed between the groups evaluated in Figure 3D? More details on the method used to assess transmission speed would be helpful.

      (5) For a more comprehensive and unbiased analysis of Osx function in osteocytes, the authors should present a full analysis of differentially expressed genes, rather than focusing solely on integrins. Additionally, it would be beneficial to include an analysis of the knockdown group alongside the other groups, considering the animal model used in this study involves knockout mice.

      (6) In the immunofluorescence staining of integrin αvβ1 in the si-Osx and Lenti-Osx groups, the cellular localization of integrin αvβ1 appears altered. Unlike the control group, where it is mainly localized in the cytoplasm, positive signals are observed in the nucleus of the si-Osx and Lenti-Osx groups. Additionally, since integrin αvβ1 is a membrane protein, shouldn't it primarily be observed on the cell membrane rather than in the cytoplasm? Could the authors clarify this observation?

      (7) The results regarding Cx43 expression after Lenti-Osx treatment are questionable. It appears that the images for the Lenti-GFP and Lenti-Osx groups have been misrepresented. The merged images for the Lenti-GFP control group seem to belong to the Lenti-Osx group, and vice versa. If the images were presented in their correct order, the conclusions would contradict the authors' claims. This issue needs to be addressed to ensure an accurate interpretation of the data.

      (8) The authors demonstrated that ATRA treatment elevates Cx43 protein levels in the control group, where Osx function is normal. However, can ATRA also restore Cx43 protein levels in the si-Osx treated group, where Osx transcriptional function is impaired? Theoretically, Cx43 protein levels should not be restored in the si-Osx group. Could the observed rescue phenotype be due to effects downstream of Cx43? This possibility should be considered and clarified.

      (9) Does Cx43 mutation or knockout cause similar phenotypes in the animal model? Can restoration of Cx43 rescue the bone phenotype?

    3. Reviewer #2 (Public review):

      This study shows that Osx plays a pivotal role in the dendritic network and intercellular communication of Col1α1-positive osteocytes via targeting Connexin43 (Cx43). It provides solid evidence to broaden our understanding of Osx's roles during bone homeostasis. This work will be of interest to investigators studying bone diseases involving osteocytes, such as delayed fracture healing or osteoporosis.

      Comments:

      (1) In Figure 1, it appears that the Osx- and Col1α1-positive cells may not be exclusively osteocytes; periosteal cells and osteoblasts are possibly also included. This could potentially impact the interpretation of results. The authors should provide a clearer analysis to distinguish the cell types precisely.

      (2) Jialiang S. Wang et al. (Nat Commun. 2021 Nov 1;12(1):6274.) have previously reported on the direct role of Osx in osteocytes. In light of this prior research, it is essential for the authors to thoroughly discuss how this study differs from previous findings.

      (3) In the methods section, it is crucial to provide detailed information about the manufacturer and country of origin of reagents, like ATRA.

      (4) The morphology of osteocytes in cortical bone can vary between the metaphysis site and the middle shaft site of long bones. For SEM data of osteocytes in Figure 2, it is necessary to address this issue. The authors should clarify whether morphological differences were observed between these sites and, if so, how these differences might impact the interpretation of the data.

      (5) In the bone research field, two different Col1α1-CreER mice have been used. The authors should specify which type of Col1α1-CreER mice were utilized in this research.

      (6) A more detailed description of the statistical method used in Figure 2G-I is required, particularly with regard to quantifying the number of osteocyte dendritic processes.

      (7) In Figure 6C and Figure 6D, while the legend indicates N = 3, there are five data points presented in the statistical graph.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigated the expression of Osterix (Osx) not only in osteoblasts but also significantly in osteocytes. Through Osx knockout, the osteocytic dendritic network was damaged, leading to communication disruption. This study investigated the regulatory role of Osx on osteoblast dendrites through Cx43.

      Strengths:

      This paper provides a good explanation of the role of Osx in osteocyte synapse and cell communication, enriching the understanding of Osx's functional significance. The results of the experiment support the conclusions of the study. This is an interesting study with a clear logical structure.

      Weaknesses:

      Some experimental results need to be supplemented, and there are still some details and errors in the text that need to be revised.

    1. eLife Assessment

      This work presents an important method for depleting ribosomal RNA from bacterial single-cell RNA sequencing libraries, enabling the study of cellular heterogeneity within microbial biofilms. The approach convincingly identifies a small subpopulation of cells at the biofilm's base with upregulated PdeI expression, offering invaluable insights into the biology of bacterial biofilms and the formation of persister cells. Further integrated analysis of gene interactions within these datasets could deepen our understanding of biofilm dynamics and resilience.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community.
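
      The practical effect of the reported improvement in informative reads can be illustrated with simple arithmetic. The following is a minimal sketch, not part of the reviewed study: the function name `reads_needed` and the target of one million informative reads are hypothetical, and only the percentages (~4-10% before depletion, 54-92% after) are taken from the summary above.

```python
def reads_needed(target_informative_reads: int, informative_percent: int) -> int:
    """Total reads to sequence so that the expected number of informative
    (non-rRNA) reads reaches the target. Integer ceiling division avoids
    floating-point rounding surprises."""
    if not 0 < informative_percent <= 100:
        raise ValueError("informative_percent must be in (0, 100]")
    return -(-target_informative_reads * 100 // informative_percent)


if __name__ == "__main__":
    target = 1_000_000  # hypothetical sequencing goal, for illustration only
    # Percentages quoted in the review summary; labels are descriptive only.
    scenarios = [
        ("without depletion, low end", 4),
        ("without depletion, high end", 10),
        ("with depletion, low end", 54),
        ("with depletion, high end", 92),
    ]
    for label, percent in scenarios:
        total = reads_needed(target, percent)
        print(f"{label:30s} {percent:3d}% informative: {total:,} total reads")
```

      Under these assumptions, the sequencing depth required for a fixed number of informative reads falls by roughly an order of magnitude, which is the cost argument the reviewers make.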

      Strengths:

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single cell RNA-seq.

    3. Reviewer #2 (Public review):

      Summary:

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths:

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. This finding highlights the potentially complex role of PdeI in regulation of c-di-GMP levels and persister formation in microbial biofilms.

      Weaknesses:

      Given many current methods that also introduce different techniques for ribosomal RNA depletion in bacterial single-cell RNA sequencing, it is unclear what is the place and role of RiboD-PETRI. The efficiency of rRNA depletion varies greatly between species for the majority of the available methods, so it is not easy to select the best fitting technique for a specific application.

      Despite transcriptome-wide coverage, the authors focused on the role of a single heterogeneously expressed gene, PdeI. A more integrated analysis of multiple genes and/or interactions between them using these data could reveal more insights into the biofilm biology.

      The authors should also present the UMIs capture metrics for RiboD-PETRI method for all cells passing initial quality filter (>=15 UMIs/cell) both in the text and in the figures. Selection of the top few cells with higher UMI count may introduce biological biases in the analysis (the top 5% of cells could represent a distinct subpopulation with very high gene expression due to a biological process). For single-cell RNA sequencing, showing the statistics for a 'top' group of cells creates confusion and inflates the perceived resolution, especially when used to compare to other methods (e.g. the parent method PETRI-seq itself).

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      The work introduces a valuable new method for depleting the ribosomal RNA from bacterial single-cell RNA sequencing libraries and shows that this method is applicable to studying the heterogeneity in microbial biofilms. The evidence for a small subpopulation of cells at the bottom of the biofilm which upregulates PdeI expression is solid. However, more investigation into the unresolved functional relationship between PdeI and c-di-GMP levels with the help of other genes co-expressed in the same cluster would have made the conclusions more significant. 

      Many thanks for eLife’s assessment of our manuscript and the constructive feedback. We are encouraged by the recognition of our bacterial single-cell RNA-seq methodology as valuable and its efficacy in studying bacterial population heterogeneity. We appreciate the suggestion for additional investigation into the functional relationship between PdeI and c-di-GMP levels. We concur that such an exploration could substantially enhance the impact of our conclusions. To address this, we have implemented the following revisions: We have expanded our data analysis to identify and characterize genes co-expressed with PdeI within the same cellular cluster (Fig. 3F, G, Response Fig. 10); We conducted additional experiments to validate the functional relationships between PdeI and c-di-GMP, followed by detailed phenotypic analyses (Response Fig. 9B). Our analysis reveals that while other marker genes in this cluster are co-expressed, they do not significantly impact biofilm formation or directly relate to c-di-GMP or PdeI. We believe these revisions have substantially enhanced the comprehensiveness and context of our manuscript, thereby reinforcing the significance of our discoveries related to microbial biofilms. The expanded investigation provides a more thorough understanding of the PdeI-associated subpopulation and its role in biofilm formation, addressing the concerns raised in the initial assessment.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single-cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community. 

      Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We have addressed these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We have provided a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step (lines 435-453) and data analysis pipeline (lines 471-528). This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We have elaborated on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion (lines 99-120), and the methods employed for identifying and characterizing subpopulations (lines 155-159, 161-163 and 163-167).

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we have included additional supplementary information. This covers extended methodology (lines 311-318, 321-323, 327-340, 450-453, 533, and 578-589), detailed statistical analyses (lines 492-493, 499-501 and 509-528), and comprehensive data tables to support our findings.

      We believe these changes significantly improved the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      Summary: 

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, e.g. https://journals.asm.org/doi/full/10.1128/jb.00604-15).

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association. 

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI is predicted to function as a phosphodiesterase involved in c-di-GMP degradation, based on sequence analysis demonstrating the presence of an intact EAL domain, which is known for this function. However, it is important to note that PdeI also harbors a divergent GGDEF domain, typically associated with c-di-GMP synthesis. This dual-domain structure indicates that PdeI may play complex regulatory roles. Previous studies have shown that knocking out the major phosphodiesterase PdeH in E. coli results in the accumulation of c-di-GMP. Moreover, introducing a point mutation (G412S) in PdeI's divergent GGDEF domain within this PdeH knockout background led to decreased c-di-GMP levels (ref. 2). This finding implies that the wild-type GGDEF domain in PdeI contributes to maintaining or increasing cellular c-di-GMP levels.

      Importantly, our single-cell experiments demonstrated a positive correlation between PdeI expression levels and c-di-GMP levels (Figure 4D). In this revision, we also constructed a PdeI(G412S)-BFP mutation strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite an increase in BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Figure 4D). This experimental evidence, coupled with domain analyses, suggests that PdeI may also contribute to c-di-GMP synthesis, rebutting the notion that it acts solely as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that the overexpression of PdeI, induced by arabinose, resulted in increased c-di-GMP levels (Fig. 4E). These findings strongly suggest that PdeI plays a pivotal role in upregulating c-di-GMP levels.

      Our further analysis indicated that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results showing that PdeI is a membrane-associated protein, we hypothesize that PdeI acts as a sensor, integrating environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. Upon careful analysis, we have determined that the other marker genes in this cluster do not significantly impact biofilm formation, nor have we identified any direct relationship between these genes, c-di-GMP, or PdeI. Our focus on PdeI within this cluster is justified by its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While other genes in this cluster may be co-expressed, their functions appear unrelated to the PdeI-c-di-GMP pathway we are investigating. Therefore, we opted not to elaborate on these genes in our main discussion, as they do not contribute directly to our understanding of the PdeI-c-di-GMP association. However, we can include a brief mention of these genes in the manuscript, indicating their lack of relevance to the PdeI-c-di-GMP pathway. This addition will provide a more comprehensive view of the cluster's composition while maintaining our focus on the key findings related to PdeI and c-di-GMP.

      We have also included the aforementioned explanations and supporting experimental data within the manuscript to clarify this important point (lines 193-217). Thank you for highlighting this apparent contradiction, allowing us to provide a more detailed explanation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I found the main text of the manuscript well written and easy to understand, though too compressed in parts to fully understand the details of the work presented, some examples are outlined below. The materials and methods appeared to be less carefully compiled and could use some careful proof-reading for spelling (e.g. repeated use of "minuts" for minutes, "datas" for data) and grammar and sentence fragments (e.g. "For exponential period E. coli data." Line 333). In general, the meaning is still clear enough to be understood. I also was unable to find figure captions for the supplementary figures, making these difficult to understand. 

      We appreciate your careful review, which has helped us improve the clarity and quality of our manuscript. We acknowledge that some parts of the main text may have been overly compressed due to the Short Report format in eLife. We have thoroughly reviewed the manuscript and expanded on key areas to provide more comprehensive explanations. We have also carefully revised the Materials and Methods section: we corrected all spelling and grammatical errors, including "minuts" to "minutes" and "datas" to "data", and repaired the sentence fragments throughout the section. We sincerely apologize for the omission of captions for the supplementary figures. We have now added detailed captions for all supplementary figures to ensure they are easily understandable. We believe these revisions address your concerns and enhance the overall readability and comprehension of our work.

      General comments: 

      (1) To evaluate the performance of RiboD-PETRI, it would be helpful to have more details in general, particularly to do with the development of the sequencing protocol and the statistics shown. Some examples: How many reads were sequenced in each experiment? Of these, how many are mapped to the bacterial genome? How many reads were recovered per cell? Have the authors performed some kind of subsampling analysis to determine if their sequencing has saturated the detection of expressed genes? The authors show e.g. correlations between classic PETRI-seq and RiboD-PETRI for E. coli in Figure 1, but also have similar data for C. crescentus and S. aureus - do these data behave similarly? These are just a few examples, but I'm sure the authors have asked themselves many similar questions while developing this project; more details, hard numbers, and comparisons would be very much appreciated. 

      Thank you for your valuable feedback. To address your concerns, we have added a table in the supplementary material that clarifies the details of sequencing.

      The correlation between the PETRI-seq and RiboD-PETRI data in C. crescentus is relatively good. In S. aureus (SA), however, the correlation is lower. The reason is that the sequencing depths of RiboD-PETRI and PETRI-seq differ, resulting in much higher gene expression in the RiboD-PETRI sequencing results than in PETRI-seq, and the calculated correlation coefficient is only about 0.47. This indicates a positive correlation between the two datasets, although not a particularly strong one. However, we counted the expression of 2763 genes in total, and even though the calculated correlation coefficient is relatively low, it still shows some consistency between the two groups of samples.

      Author response image 1.

      Assessment of the effect of rRNA depletion on transcriptional profiles of (A) C. crescentus (CC) and (B) S. aureus (SA). The Pearson correlation coefficient (r) of UMI counts per gene (log2 UMIs) between RiboD-PETRI and PETRI-seq was calculated for 4097 genes (A) and 2763 genes (B). The "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. Each point represents a gene.
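
      As a rough illustration of the comparison described in this caption, the sketch below computes a Pearson r between per-gene log2 UMI counts from two protocols. The UMI totals are invented toy values, the pseudocount of 1 is an assumption (the response does not say how zero counts were handled), and `pearson_r` and `log2_counts` are hypothetical helpers rather than the authors' pipeline.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_counts(umis_per_gene, pseudocount=1):
    """log2-transform per-gene UMI totals; the pseudocount is an assumption."""
    return [math.log2(c + pseudocount) for c in umis_per_gene]

# Toy per-gene UMI totals (one entry per gene, same gene order in both lists).
ribod_umis = [120, 15, 0, 980, 43, 7]   # hypothetical RiboD-PETRI totals
petri_umis = [30, 2, 0, 210, 12, 3]     # hypothetical PETRI-seq totals

r = pearson_r(log2_counts(ribod_umis), log2_counts(petri_umis))
print(f"Pearson r = {r:.3f}")
```

      Because Pearson r is insensitive to a uniform shift on the log scale, a depth difference alone would not lower it; a low r is more likely when the shallower library has many genes near zero counts.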

      (2) Additionally, I think it is critical that the authors provide processed read counts per cell and gene in their supplementary information to allow others to investigate the performance of their method without going back to raw FASTQ files, as this can represent a significant hurdle for reanalysis. 

      Thank you for your suggestion. However, it's important to clarify that reads and UMIs (Unique Molecular Identifiers) are distinct concepts in single-cell RNA sequencing. Reads can be influenced by PCR amplification during library construction, making their quantity less stable. In contrast, UMIs serve as a more reliable indicator of the number of mRNA molecules detected after PCR amplification. Throughout our study, we primarily utilized UMI counts for quantification. To address your concern about data accessibility, we have included the UMI counts per cell and gene in our supplementary materials provided above (Tables S7-15; some of the files are too large and are therefore stored in GEO: GSE260458). This approach provides a more accurate representation of gene expression levels and allows for robust reanalysis without the need to process raw FASTQ files.
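
      The distinction between reads and UMIs can be sketched as follows. This is a minimal illustration, assuming exact UMI matching with no sequencing-error correction; the barcodes, UMI sequences and the gene name `geneX` are invented, and `count_reads_and_umis` is a hypothetical helper, not part of the authors' pipeline.

```python
from collections import defaultdict

# Each aligned read is reduced to (cell_barcode, gene, umi). Toy data only.
reads = [
    ("cellA", "pdeI", "AACGT"),
    ("cellA", "pdeI", "AACGT"),   # PCR duplicate of the first read
    ("cellA", "pdeI", "AACGT"),   # another PCR duplicate
    ("cellA", "pdeI", "GGTCA"),   # a second captured molecule
    ("cellA", "geneX", "TTAGC"),
    ("cellB", "pdeI", "AACGT"),   # same UMI, but a different cell
]

def count_reads_and_umis(reads):
    """Return per-(cell, gene) read counts and deduplicated UMI counts."""
    read_counts = defaultdict(int)
    umi_sets = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1
        umi_sets[(cell, gene)].add(umi)   # duplicates collapse in the set
    umi_counts = {key: len(umis) for key, umis in umi_sets.items()}
    return dict(read_counts), umi_counts

read_counts, umi_counts = count_reads_and_umis(reads)
for key in sorted(read_counts):
    print(key, "reads:", read_counts[key], "UMIs:", umi_counts[key])
```

      In this toy input, four reads for one gene in one cell collapse to two UMIs, which is why a per-cell, per-gene UMI table is the more stable quantity to share for reanalysis.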

      (3) Finally, the authors should also discuss other approaches to ribosomal depletion in bacterial scRNA-seq. One of the figures appears to contain such a comparison, but it is never mentioned in the text that I can find, and one could read this manuscript and come away believing this is the first attempt to deplete rRNA from bacterial scRNA-seq. 

      We have addressed this concern by including a comparison of different methods for depleting rRNA from bacterial scRNA-seq in Table S4 and have added a short comparison to the text as follows: “Additionally, we compared our findings with other reported methods (Fig. 1B; Table S4). The original PETRI-seq protocol, which does not include an rRNA depletion step, exhibited an mRNA detection rate of approximately 5%. The MicroSPLiT-seq method, which utilizes Poly A Polymerase for mRNA enrichment, achieved a detection rate of 7%. Similarly, M3-seq and BacDrop-seq, which employ RNase H to digest rRNA post-DNA probe hybridization in cells, reported mRNA detection rates of 65% and 61%, respectively. MATQ-DASH, which utilizes Cas9-mediated targeted rRNA depletion, yielded a detection rate of 30%. Among these, RiboD-PETRI demonstrated superior performance in mRNA detection while requiring the least sequencing depth.” We have added this content in the main text (lines 110-120), specifically in relation to Figure 1B and Table S4. This addition provides context for our method and clarifies its position among existing techniques.

      Detailed comments: 

      Line 78: the authors describe the multiplet frequency, but it is not clear to me how this was determined, for which experiments, or where in the SI I should look to see this. Often this is done by mixing cultures of two distinct bacteria, but I see no evidence of this key experiment in the manuscript. 

      The multiplet frequency we discuss in the manuscript was not determined by experimentally mixing distinct bacterial cultures. The PETRI-seq and microSPLiT articles have already performed experiments mixing two libraries to determine the single-cell rate, and both gave good results. Our technique is derived from these two articles (mainly PETRI-seq), and the main difference lies in the later RiboD step, so we did not repeat this experiment separately. The multiplet frequencies reported here are therefore theoretical predictions based on our sequencing results, calculated using a Poisson distribution. We have made this distinction clearer in our manuscript (lines 93-97). The method is available in the Materials and Methods section (lines 520-528). The data are available in Table S2. To elaborate:

      To assess the efficiency of single-cell capture in RiboD-PETRI, we calculated the multiplet frequency using a Poisson distribution based on our sequencing results.

      (1) Definition: In our study, multiplet frequency is defined as the probability of a non-empty barcode corresponding to more than one cell.

      (2) Calculation Method: We use a Poisson distribution-based approach to calculate the predicted multiplet frequency. The process involves several steps:

      We first calculate the proportion of barcodes corresponding to zero cells: P(0) = e^(-λ). Then, we calculate the proportion corresponding to one cell: P(1) = λ·e^(-λ). We derive the proportion for more than zero cells: P(≥1) = 1 - P(0). And for more than one cell: P(≥2) = 1 - P(1) - P(0). Finally, the multiplet frequency is calculated as: multiplet frequency = P(≥2) / P(≥1).

      (3) Parameter λ: This is the ratio of the number of cells to the total number of possible barcode combinations. For instance, when detecting 10,000 cells, λ is 10,000 divided by the total number of possible barcode combinations.
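
      The calculation outlined in points (1) to (3) can be sketched as below, taking the multiplet frequency as the probability that a non-empty barcode holds more than one cell under a Poisson model. The function name `multiplet_frequency` is hypothetical, and the barcode-space size of 1,000,000 is a placeholder chosen for illustration, not the number of combinations used in the protocol.

```python
import math

def multiplet_frequency(n_cells, n_barcodes):
    """Predicted fraction of non-empty barcodes that contain >1 cell.

    Assumes cells are assigned to barcode combinations independently,
    so the number of cells per barcode is Poisson with mean lam.
    """
    if n_cells <= 0 or n_barcodes <= 0:
        raise ValueError("n_cells and n_barcodes must be positive")
    lam = n_cells / n_barcodes      # lambda: cells per barcode combination
    p0 = math.exp(-lam)             # P(0): barcode with no cell
    p1 = lam * p0                   # P(1): barcode with exactly one cell
    p_ge1 = 1.0 - p0                # P(>=1): non-empty barcode
    p_ge2 = 1.0 - p1 - p0           # P(>=2): barcode with a multiplet
    return p_ge2 / p_ge1

# Placeholder barcode space; substitute the real number of combinations.
HYPOTHETICAL_BARCODES = 1_000_000
mf = multiplet_frequency(n_cells=10_000, n_barcodes=HYPOTHETICAL_BARCODES)
print(f"lambda = {10_000 / HYPOTHETICAL_BARCODES:.4f}, multiplet frequency = {mf:.4%}")
```

      For small λ the result is close to λ/2, so the predicted multiplet frequency grows roughly in proportion to the number of cells loaded for a fixed barcode space.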

      Line 94: the concept of "percentage of gene expression" is never clearly defined. Does this mean the authors detect 99.86% of genes expressed in some cells? How is "expressed" defined - is this just detecting a single UMI? 

      The term "percentage of gene expression" refers to the proportion of genes in the bacterial strain that were detected as expressed in the sequenced cell population. Specifically, in this context, it means that 99.86% of all genes in the bacterial strain were detected as expressed in at least one cell in our sequencing results. To define "expressed" more clearly: a gene is considered expressed if at least one UMI (Unique Molecular Identifier) for it is detected in any cell in the population. This definition allows for the detection of even low-level gene expression. To enhance clarity in the manuscript, we have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.

      Line 98: The authors discuss the number of recovered UMIs throughout this paragraph, but there is no clear discussion of the number of detected expressed genes per cell. Could the authors include a discussion of this as well, as this is another important measure of sensitivity? 

      We appreciate your suggestion to include a discussion on the number of detected expressed genes per cell, as this is indeed another important measure of sensitivity. We would like to clarify that we have actually included statistics on the number of genes detected across all cells in the main text of our paper. This information is presented as percentages. However, we understand that you may be looking for a more detailed representation, similar to the UMI statistics we provided. To address this, we have now added a new analysis showing the number of genes detected per cell (lines 132-133, 138-139, 144-145 and 184-186, Fig. 2B, 3B and S2B). This additional result complements our existing UMI data and provides a more comprehensive view of the sensitivity of our method. We have included this new gene-per-cell statistical graph in the supplementary materials.
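
      The two sensitivity measures discussed here, population-wide gene coverage and genes detected per cell, can be sketched on a toy UMI table. This is only an illustration under the stated definition (a gene counts as detected with at least one UMI); the cell and gene names are invented, and both helper functions are hypothetical rather than taken from the authors' analysis code.

```python
from statistics import median

# Toy UMI table: cell -> {gene: UMI count}. Values are invented.
counts = {
    "cell1": {"geneA": 3, "geneB": 1},
    "cell2": {"geneB": 2},
    "cell3": {"geneA": 1, "geneC": 5, "geneD": 0},
}
annotated_genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]

def genes_detected_per_cell(counts, min_umis=1):
    """Number of genes with at least min_umis UMIs in each cell."""
    return {
        cell: sum(1 for n in genes.values() if n >= min_umis)
        for cell, genes in counts.items()
    }

def population_gene_coverage(counts, annotated_genes, min_umis=1):
    """Fraction of annotated genes detected in at least one cell."""
    detected = {
        gene
        for genes in counts.values()
        for gene, n in genes.items()
        if n >= min_umis
    }
    return len(detected & set(annotated_genes)) / len(annotated_genes)

per_cell = genes_detected_per_cell(counts)
coverage = population_gene_coverage(counts, annotated_genes)
print("genes per cell:", per_cell)
print("median genes per cell:", median(per_cell.values()))
print(f"population gene coverage: {coverage:.0%}")
```

      The two numbers answer different questions: coverage can approach 100% across a large population even when each individual cell contains only a small number of detected genes.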

      Figure 1B: I presume ctrl and delta delta represent the classic PETRI-seq and RiboD protocols, respectively, but this is not specified. This should be clarified in the figure caption, or the names changed. 

      We appreciate you bringing this to our attention. We acknowledge that the labeling in the figure could have been clearer. We have now clarified this information in the figure caption. To provide more specificity: the "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. We have updated the figure caption to include these details, which should help readers better understand the protocols being compared in the figure.

      Line 104: the authors claim "This performance surpassed other reported bacterial scRNA-seq methods" with a long number of references to other methods. "Performance" is not clearly defined, and it is unclear what the exact claim being made is. The authors should clarify what they're claiming, and further discuss the other methods and comparisons they have made with them in a thorough and fair fashion. 

      We appreciate your request for clarification, and we acknowledge that our definition of "performance" should have been more explicit. We would like to clarify that in this context, we define performance primarily in terms of the proportion of mRNA captured. Our improved method demonstrates a significantly higher rate of rRNA removal compared to other bacterial single-cell library construction methods. This results in a higher proportion of mRNA in our sequencing data, which we consider a key performance metric for single-cell RNA sequencing in bacteria. Additionally, when compared to our previous method, PETRI-seq, our improved approach not only enhances rRNA removal but also reduces library construction costs. This dual improvement in both data quality and cost-effectiveness is what we intended to convey with our performance claim.

      We recognize that a more thorough and fair discussion of other methods and their comparisons would be beneficial. We have summarized the comparison in Table S4 and added a short discussion to the main text (lines 106-120). This addition provides context for our method and clarifies its position among existing techniques.

      Figure 1D: Do the authors have any explanation for the relatively lower performance of their C. crescentus depletion? 

      We appreciate your attention to detail and the opportunity to address this point. The lower efficiency of rRNA removal in C. crescentus compared to other species can be attributed to inherent differences between species. It's important to note that a single method for rRNA depletion may not be universally effective across all bacterial species due to variations in their genetic makeup and rRNA structures. Different bacterial species can have unique rRNA sequences, secondary structures, or associated proteins that may affect the efficiency of our depletion method. This species-specific variation highlights the challenges in developing a one-size-fits-all approach for bacterial rRNA depletion. While our method has shown high efficiency across several species, the results with C. crescentus underscore the need for continued refinement and possibly species-specific optimizations in rRNA depletion techniques. We thank you for bringing attention to this point, as it provides valuable insight into the complexities of bacterial rRNA depletion and areas for future improvement in our method.

      Line 118: The authors claim RiboD-PETRI has a "consistent ability to unveil within-population heterogeneity", however the preceding paragraph shows it detects potential heterogeneity, but provides no evidence this inferred heterogeneity reflects the reality of gene expression in individual cells. 

      We appreciate your careful reading and the opportunity to clarify this point. We acknowledge that our wording may have been too assertive given the evidence presented. We acknowledge that the subpopulations of cells identified in other species have not undergone experimental verification. Our intention in presenting these results was to demonstrate RiboD-PETRI's capability to detect “potential” heterogeneity consistently across different bacterial species, showcasing the method's sensitivity and potential utility in exploring within-population diversity. However, we agree that without further experimental validation, we cannot definitively claim that these detected differences represent true biological heterogeneity in all cases. We have revised this section to reflect the current state of our findings more accurately, emphasizing that while RiboD-PETRI consistently detects potential heterogeneity across species, further experimental validation would be required to confirm the biological significance of the observations (lines 169-171).

      Figure 1 H&I: I'm not entirely sure what I am meant to see in these figures, presumably some evidence for heterogeneity in gene expression. Are there better visualizations that could be used to communicate this? 

      We appreciate your suggestion for improving the visualization of gene expression heterogeneity. We have explored alternative visualization methods in the revised manuscript. Specifically, for the expression levels of marker genes shown in Figure 1H (which is Figure 2D now), we have created violin plots (Supplementary Fig. 4). These plots offer a more comprehensive view of the distribution of expression levels across different cell populations, making it easier to discern heterogeneity. However, due to the number of marker genes and the resulting volume of data, these violin plots are quite extensive and would occupy a significant amount of space. Given the space constraints of the main figure, we propose to include these violin plots as Fig. S4 immediately following Figure 1 H&I (which is Figure 2D&E now). This arrangement will allow readers to access more detailed information about these marker genes while maintaining the concise style of the main figure.

      Regarding the pathway enrichment figure (Figure 2E), we have also considered your suggestion for improvement. We attempted to use a dot plot to display the KEGG pathway enrichment of the genes. However, our analysis revealed that the genes were only enriched in a single pathway. As a result, the visual representation using a dot plot still did not produce a particularly aesthetically pleasing or informative figure.

      Line 124: The authors state no significant batch effect was observed, but in the methods on line 344 they specify batch effects were removed using Harmony. It's unclear what exactly S2 is showing without a figure caption, but the authors should clarify this discrepancy. 

      We apologize for any confusion caused by the lack of a clear figure caption for Figure S2 (which is Figure S3D now). To address your concern, in addition to adding captions for the supplementary figures, we would also like to provide more context about the batch effect analysis. In Supplementary Fig. S3, Panel C represents the results without using Harmony for batch effect removal, while Panel D shows the results after applying Harmony. In both panels A and B, the distributions of samples one and two do not show substantial differences. Based on this observation, we concluded that there was no significant batch effect between the two samples. However, we acknowledge that even subtle batch effects could potentially influence downstream analyses. Therefore, out of an abundance of caution, we decided to apply Harmony to remove any potential minor batch effects. This approach aligns with best practices in single-cell analysis, where even small technical variations are often accounted for to enhance the robustness of the results.

      To improve clarity, we have revised our manuscript to better explain this nuanced approach: 1. We have updated the statement to reflect that while no major batch effect was observed, we applied batch correction as a precautionary measure (lines 181-182). 2. We have added a detailed caption to Figure S3, explaining the comparison between non-corrected and batch-corrected data. 3. We have modified the methods section to clarify that Harmony was applied as a precautionary step, despite the absence of obvious batch effects (lines 492-493).

      Figure 2D: I found this panel fairly uninformative, is there a better way to communicate this finding? 

      Thank you for your feedback regarding Figure 2D. We have explored alternative ways to present this information, using a dot plot to display the enrichment pathways, as this is often an effective method for visualizing such data. Meanwhile, we also provided a more detailed textual description of the enrichment results in the main text, highlighting the most significant findings.

      Figure 2I: the figure itself and caption say GFP, but in the text and elsewhere the authors say this is a BFP fusion. 

      We appreciate your careful review of our manuscript and figures. We apologize for any confusion this may have caused. To clarify: both GFP (Green Fluorescent Protein) and BFP (Blue Fluorescent Protein) were indeed used in our experiments, but for different purposes: 1. GFP was used for imaging to observe the location of PdeI in bacteria and persister cell growth, which is shown in Figure 4C and 4K. 2. BFP was used for cell sorting, imaging of location in the biofilm, and detecting the proportion of persister cells, which are shown in Figure 4D and 4F-J. To address this inconsistency and improve clarity, we have made the following corrections: 1. We have reviewed the main text to ensure that references to GFP and BFP are accurate and consistent with their respective uses in our experiments. 2. We have added a note in the figure caption for Figure 4C to explicitly state that this particular image shows GFP fluorescence for the location of PdeI. 3. In the methods section, we have provided a clear explanation of how both fluorescent proteins were used in different aspects of our study (lines 326-340).

      Line 156: The authors compare prices between RiboD and PETRI-seq. It would be helpful to provide a full cost breakdown, e.g. in supplementary information, as it is unclear exactly how the authors came to these numbers or where the major savings are (presumably in sequencing depth?) 

      We appreciate your suggestion to provide a more detailed cost breakdown, and we agree that this would enhance the transparency and reproducibility of our cost analysis. In response to your feedback, we have prepared a comprehensive cost breakdown that includes all materials and reagents used in the library preparation process. Additionally, we've factored in the sequencing depth (50G) and the unit price for sequencing (25¥/G). These calculations allow us to determine the cost per cell after sequencing. As you correctly surmised, a significant portion of the cost reduction is indeed related to sequencing depth. However, there are also savings in the library preparation steps that contribute to the overall cost-effectiveness of our method. We propose to include this detailed cost breakdown as a supplementary table (Table S6) in our paper. This table will provide a clear, itemized list of all expenses involved, including: 1. Reagents and materials for library preparation 2. Sequencing costs (depth and price per G) 3. Calculated cost per cell.
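      The arithmetic behind a per-cell figure of this kind can be sketched as follows. The sequencing depth (50 G) and unit price (25 ¥/G) are the values quoted above; the library-preparation cost and the number of cells are placeholders chosen only to make the example run, not values taken from Table S6.

```python
def cost_per_cell(library_prep_cost, depth_g, price_per_g, n_cells):
    """Return the total cost per cell (in yuan) for one library.

    library_prep_cost: reagents and materials for library preparation (yuan)
    depth_g:           sequencing depth in gigabases (G)
    price_per_g:       sequencing unit price (yuan per G)
    n_cells:           number of cells recovered after sequencing
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    sequencing_cost = depth_g * price_per_g
    return (library_prep_cost + sequencing_cost) / n_cells


# Values stated in the response: 50 G at 25 yuan per G.
SEQUENCING_DEPTH_G = 50
PRICE_PER_G_YUAN = 25
sequencing_cost = SEQUENCING_DEPTH_G * PRICE_PER_G_YUAN

# Hypothetical inputs (NOT from the manuscript): prep cost and cell number.
example = cost_per_cell(
    library_prep_cost=1000.0,
    depth_g=SEQUENCING_DEPTH_G,
    price_per_g=PRICE_PER_G_YUAN,
    n_cells=10_000,
)
print(f"sequencing cost: {sequencing_cost} yuan; example cost per cell: {example:.3f} yuan")
```

      Separating the sequencing term from the library-preparation term makes it easy to see how much of any saving comes from reduced sequencing depth and how much from reagents.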

      Line 291: The design and production of the depletion probes are not clearly explained. How did the authors design them? How were they synthesized? Also, it appears the authors have separate probe sets for E. coli, C. crescentus, and S. aureus - this should be clarified, possibly in the main text.

      Thank you for your important questions regarding the design and production of our depletion probes. We included the detailed probe information in Supplementary Table S1; however, we did not clarify this information in the main text owing to the constraints of the Short Report format in eLife. We appreciate the opportunity to provide clarifications.

      The core principle behind our probe design is that the probe sequences are reverse complementary to the r-cDNA sequences. This design allows for specific recognition of r-cDNA. The probes are then bound to magnetic beads, allowing the r-cDNA-probe-bead complexes to be separated from the rest of the library. To address your specific questions: 1. Probe Design: We designed separate probe sets for E. coli, C. crescentus, and S. aureus. Each set was specifically constructed to be reverse complementary to the r-cDNA sequences of its respective bacterial species. This species-specific approach ensures high efficiency and specificity in rRNA depletion for each organism. The hybrid DNA complex was then removed by Streptavidin magnetic beads. 2. Probe Synthesis: The probes were synthesized based on these design principles. 3. Species-Specific Probe Sets: You are correct in noting that we used separate probe sets for each bacterial species. We have clarified this important point in the main text to ensure readers understand the specificity of our approach. To further illustrate this process, we have created a schematic diagram showing the principle of rRNA removal and clarified the design principle in the legend of Fig. 1A.
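      The reverse-complement principle described above can be illustrated with a minimal sketch. The function names and the toy sequence are hypothetical and serve only as an illustration; the actual probe sequences are those listed in Supplementary Table S1.

```python
# Translation table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_probe(r_cdna_target):
    """Return a probe that can base-pair with the given r-cDNA region.

    Hypothetical helper: it only applies the reverse-complement rule and
    does not model probe length, melting temperature, or modifications.
    """
    return reverse_complement(r_cdna_target)


# Made-up 6-nt target, used purely for illustration.
toy_target = "ATGGCA"
probe = design_probe(toy_target)
print(probe)
```

      Because the rule is applied to the r-cDNA sequence of each organism, a separate probe set results for each species.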

      Line 362: I didn't see a description of the construction of the PdeI-BFP strain, I assume this would be important for anyone interested in the specific work on PdeI. 

      Thank you for your astute observation regarding the construction of the PdeI-BFP strain. We appreciate the opportunity to provide this important information. The PdeI-BFP strain was constructed as follows: 1. We cloned the pdeI gene along with its native promoter region (250 bp) into a pBAD vector. 2. The original promoter region of the pBAD vector was removed to avoid any potential interference. 3. This construction enables the expression of the PdeI-BFP fusion protein to be regulated by the native promoter of pdeI, thus maintaining its physiological control mechanisms. 4. The BFP coding sequence was fused to the pdeI gene to create the PdeI-BFP fusion construct. We have added a detailed description of the PdeI-BFP strain construction to our methods section (lines 327-334).

      Reviewer #2 (Recommendations For The Authors): 

      (1) General remarks: 

      Reconsider using 'advanced' in the title. It is highly generic and misleading. Perhaps 'cost-efficient' would be a more precise substitute. 

      Thank you for your valuable suggestion. After careful consideration, we have decided to use "improved" in the title. Firstly, our method presents an efficient solution to a persistent challenge in bacterial single-cell RNA sequencing, specifically addressing rRNA abundance. Secondly, it facilitates precise exploration of bacterial population heterogeneity. We believe our method encompasses more than just cost-effectiveness, justifying the use of the term "improved."

      Consider expanding the introduction. The introduction does not explain the setup of the biological question or basic details such as the organism(s) for which the technique has been developed, or which species biofilms were studied. 

      Thank you for your valuable feedback regarding our introduction. We acknowledge that our writing was compressed owing to the constraints of the Short Report format in eLife. We appreciate the opportunity to expand this crucial section, which should improve the clarity and impact of the manuscript.

      We revised our introduction (lines 53-80) according to the following principles:

      (1) Initial Biological Question: We explained the initial biological question that motivated our research—understanding the heterogeneity in E. coli biofilms—to provide essential context for our technological development.

      (2) Limitations of Existing Techniques: We briefly described the limitations of current single-cell sequencing techniques for bacteria, particularly regarding their application in biofilm studies.

      (3) Introduction of Improved Technique: We introduced our improved technique, initially developed for E. coli.

      (4) Research Evolution: We highlighted how our research has evolved, demonstrating that our technique is applicable not only to E. coli but also to Gram-positive bacteria and other Gram-negative species, showcasing the broad applicability of our method.

      (5) Specific Organisms Studied: We provided examples of the specific organisms we studied, encompassing both Gram-positive and Gram-negative bacteria.

      (6) Potential Implications: Finally, we outlined the potential implications of our technique for studying bacterial heterogeneity across various species and contexts, extending beyond biofilms.

      (2) Writing remarks: 

      43-45 Reword: "Thus, we address a persistent challenge in bacterial single-cell RNA-seq regarding rRNA abundance, exemplifying the utility of this method in exploring biofilm heterogeneity.". 

      Thank you for highlighting this sentence and requesting a rewording. We appreciate the opportunity to improve the clarity and impact of our statement. We have reworded the sentence as: "Our method effectively tackles a long-standing issue in bacterial single-cell RNA-seq: the overwhelming abundance of rRNA. This advancement significantly enhances our ability to investigate the intricate heterogeneity within biofilms at unprecedented resolution." (lines 47-50)

      49 "Biofilms, comprising approximately 80% of chronic and recurrent microbial infections in the human body..." - probably meant 'contribute to'. 

      Thank you for catching this imprecision in our statement. We have reworded the sentence as: "Biofilms contribute to approximately 80% of chronic and recurrent microbial infections in the human body..."

      54-55 Please expand on "this". 

      Thank you for your request to expand on the use of "this" in the sentence. You're right that more clarity would be beneficial here. We have revised and expanded this section in lines 54-69.

      81-84 Unclear why these species samples were either at exponential or stationary phases. The growth stage can influence the proportion of rRNA and other transcripts in the population. 

      Thank you for raising this important point about the growth phases of the bacterial samples used in our study. We appreciate the opportunity to clarify our experimental design. To evaluate the performance of RiboD-PETRI, we designed a comprehensive assessment of rRNA depletion efficiency under diverse physiological conditions, specifically contrasting exponential and stationary phases. This approach allows us to understand how these different growth states impact rRNA depletion efficacy. Additionally, we included a variety of bacterial species, encompassing both Gram-negative and Gram-positive organisms, to ensure that our findings are broadly applicable across different types of bacteria. By incorporating these variables, we aim to provide insights into the robustness and reliability of the RiboD-PETRI method in various biological contexts. We have included this rationale in our Results section (lines 99-106), providing readers with a clear understanding of our experimental design choices.

      86 "compared TO PETRI-seq " (typo). 

      We have corrected this typo in our manuscript.

      94 "gene expression collectively" rephrase. Probably this means coverage of the entire gene set across all cells. Same for downstream usage of the phrase. 

      Thank you for pointing out this ambiguity in our phrasing. Your interpretation of our intended meaning is accurate. We have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.

      97 What were the median UMIs for the 30,000 cell library {greater than or equal to}15 UMIs? Same question for the other datasets. This would reflect a more comparable statistic with previous studies than the top 3% of the cells for example, since the distributions of the single-cell UMIs typically have a long tail. 

      Thank you for this insightful question and for pointing out the importance of providing more comparable statistics. We agree that median values offer a more robust measure of central tendency, especially for datasets with long-tailed distributions, which are common in single-cell studies. The suggestion to include median Unique Molecular Identifier (UMI) counts would indeed provide a more comparable statistic with previous studies. We have analyzed the median UMIs for our libraries as follows and revised our manuscript according to the analysis (lines 126-130, 133-136, 139-142 and 175-180).

      (1) Median UMI count in Exponential Phase E. coli:

      Total: 102 UMIs per cell

      Top 1,000 cells: 462 UMIs per cell

      Top 5,000 cells: 259 UMIs per cell

      Top 10,000 cells: 193 UMIs per cell

      (2) Median UMI count in Stationary Phase S. aureus:

      Total: 142 UMIs per cell

      Top 1,000 cells: 378 UMIs per cell

      Top 5,000 cells: 207 UMIs per cell

      Top 8,000 cells: 167 UMIs per cell

      (3) Median UMI count in Exponential Phase C. crescentus:

      Total: 182 UMIs per cell

      Top 1,000 cells: 2,190 UMIs per cell

      Top 5,000 cells: 662 UMIs per cell

      Top 10,000 cells: 225 UMIs per cell

      (4) Median UMI count in Static E. coli Biofilm:

      Total of Replicate 1: 34 UMIs per cell

      Total of Replicate 2: 52 UMIs per cell

      Top 1,621 cells of Replicate 1: 283 UMIs per cell

      Top 3,999 cells of Replicate 2: 239 UMIs per cell
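      For readers who wish to reproduce statistics of this kind, the calculation can be sketched as below. The function name and the toy counts are hypothetical; the ≥15 UMI threshold is the filter mentioned in the reviewer's comment, and "top N" is assumed here to mean the N cells with the highest UMI counts among those passing the filter.

```python
from statistics import median


def median_umis(umis_per_cell, min_umis=15, top_n=None):
    """Median UMI count per cell after filtering.

    umis_per_cell: iterable of per-cell UMI counts
    min_umis:      cells with fewer UMIs than this are discarded
    top_n:         if given, restrict to the top_n cells by UMI count
    """
    kept = sorted((u for u in umis_per_cell if u >= min_umis), reverse=True)
    if top_n is not None:
        kept = kept[:top_n]
    if not kept:
        raise ValueError("no cells pass the filter")
    return median(kept)


# Toy per-cell counts, invented for illustration only.
toy_counts = [3, 15, 20, 40, 90, 400, 1200]
all_cells = median_umis(toy_counts)        # median over cells with >= 15 UMIs
top3 = median_umis(toy_counts, top_n=3)    # median over the 3 best cells
print(all_cells, top3)
```

      The toy example shows why the two statistics diverge: a few cells in the long tail dominate the "top N" median, whereas the median over all cells passing the filter reflects the bulk of the library.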

      104-105 The performance metric should again be the median UMIs of the majority of the cells passing the filter (15 mRNA UMIs is reasonable). The top 3-5% are always much higher in resolution because of the heavy tail of the single-cell UMI distribution. It is unclear if the performance surpasses the other methods using the comparable metric. Recommend removing this line. 

      We appreciate your suggestion regarding the use of median UMIs as a more appropriate performance metric, and we agree that comparing the top 3-5% of cells can be misleading due to the heavy tail of the single-cell UMI distribution. We have removed the line in question (104-105) that compares our method's performance based on the top 3-5% of cells in the revised manuscript. Instead, we focused on presenting the median UMI counts for cells passing the filter (≥15 mRNA UMIs) as the primary performance metric. This will provide a more representative and comparable measure of our method's performance. We have also revised the surrounding text to reflect this change, ensuring that our claims about performance are based on these more robust statistics (lines 126-130, 133-136, 139-142 and 175-180).

      106-108 The sequencing saturation of the libraries (in %), and downsampling analysis should be added to illustrate this point. 

      Thank you for your valuable suggestion. Your recommendation to add sequencing saturation and downsampling analysis is highly valuable and will help better illustrate our point. Based on your feedback, we have revised our manuscript by adding the following content:

      To provide a thorough evaluation of our sequencing depth and library quality, we performed sequencing saturation analysis on our sequencing samples. The findings reveal that our sequencing saturation is 100% (Fig. 8A & B), indicating that our sequencing depth is sufficient to capture the diversity of most transcripts. To further illustrate the impact of our downstream analysis on the datasets, we have demonstrated the data distribution before and after applying our filtering criteria (Fig. S1B & C). These figures effectively visualized the influence of our filtering process on the data quality and distribution. After filtering, we can have a more refined dataset with reduced noise and outliers, which enhances the reliability of our downstream analyses.

      We have also ensured that a detailed description of the sequencing saturation method is included in the manuscript to provide readers with a comprehensive understanding of our methodology. We appreciate your feedback and believe these additions significantly improve our work.
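      As a point of reference, one widely used definition of sequencing saturation is the fraction of reads that are duplicates of an already observed (cell barcode, UMI, gene) combination. The sketch below applies that definition and repeats it on random subsamples of the reads. It is a generic illustration under that assumed definition, with hypothetical function names and toy reads; it is not the exact procedure used in the manuscript.

```python
import random


def sequencing_saturation(reads):
    """Saturation = 1 - (unique molecules / total reads).

    reads: list of hashable (cell_barcode, umi, gene) tuples, one per read.
    """
    if not reads:
        raise ValueError("reads must not be empty")
    return 1.0 - len(set(reads)) / len(reads)


def downsample_curve(reads, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Saturation at several read-depth fractions (random subsampling)."""
    rng = random.Random(seed)
    curve = []
    for frac in fractions:
        n = max(1, round(frac * len(reads)))
        subset = rng.sample(reads, n)  # sample without replacement
        curve.append((frac, sequencing_saturation(subset)))
    return curve


# Toy reads, invented for illustration: one molecule seen 3 times, one seen once.
toy_reads = [("cell1", "AAC", "geneA")] * 3 + [("cell1", "GGT", "geneB")]
full_saturation = sequencing_saturation(toy_reads)
curve = downsample_curve(toy_reads)
print(full_saturation, curve)
```

      A curve that flattens as the sampled fraction approaches 1.0 indicates that additional sequencing would recover few new molecules.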

      122: Please provide more details about the biofilm setup, including the media used. I did not find them in the methods. 

      We appreciate your attention to detail, and we agree that this information is crucial for the reproducibility of our experiments. We propose to add the following information to our methods section (lines 311-318):

      "For the biofilm setup, bacterial cultures were grown overnight. The next day, we diluted the culture 1:100 in a petri dish containing 2 ml of LB medium. If the bacteria carried a plasmid, the appropriate antibiotic was added to the LB. The petri dish was then incubated statically in a growth chamber for 24 hours. After incubation, we performed imaging directly under the microscope. The petri dishes used were glass-bottom dishes from Biosharp (catalog number BS-20-GJM), allowing for direct microscopic imaging without the need for cover slips or slides. This setup allowed us to grow and image the biofilms in situ, providing a more accurate representation of their natural structure and composition."

      125: "sequenced 1,563 reads" missing "with" 

      Thank you for correcting our grammar. We have revised the phrase as “sequenced with 1,563 reads”.

      126: "283/239 UMIs per cell" unclear. 283 and 239 UMIs per cell per replicate, respectively? 

      Thank you for correcting our grammar. We have revised the phrase as “283 and 239 UMIs per cell per replicate, respectively” (line 184).

      Figure 1D: Please indicate where the comparison datasets are from. 

      We appreciate your question regarding the source of the comparison datasets in Figure 1D. All data presented in Figure 1D are from our own sequencing experiments. We did not use data from other publications for this comparison. Specifically, we performed sequencing on E. coli cells in the exponential growth phase using three different library preparation methods: RiboD-PETRI, PETRI-seq, and RNA-seq. The data shown in Figure 1D represent a comparison of UMIs and/or reads correlations obtained from these three methods. All sequencing results have been uploaded to the Gene Expression Omnibus (GEO) database. The accession number is GSE260458. We have updated the figure legend for Figure 1D to clearly state that all datasets are from our own experiments, specifying the different methods used.

      Figure 1I, 2D: Unable to interpret the color block in the data. 

      We apologize for any confusion regarding the interpretation of the color blocks in Figures 1I and 2D (which are Figures 2E and 3E now). The color blocks in these figures represent the p-values of the data points. The color scale ranges from red to blue. Red colors indicate smaller p-values, suggesting higher statistical significance and more reliable results. Blue colors indicate larger p-values, suggesting lower statistical significance and less reliable results. We have updated the figure legends for both Figure 2E and Figure 3E to include this explanation of the color scale. Additionally, we have added a color legend to each figure to make the interpretation more intuitive for readers.

      Figure1H and 2C: Gene names should be provided where possible. The locus tags are highly annotation-dependent and hard to interpret. Also, a larger size figure should be helpful. The clusters 2 and 3 in 2C are the most important, yet because they have few cells, very hard to see in this panel. 

      We appreciate your suggestions for improving the clarity and interpretability of Figures 1H and 2C (which are Figures 2D and 3D now). We have replaced the locus tags with gene names where possible in both figures. We have increased the size of both figures to improve visibility and readability. We have also made Clusters 2 and 3 in Figure 3D more prominent in the revised figure. Despite their smaller cell count, we recognize their importance and have adjusted the visualization to ensure they are clearly visible. We believe these modifications will significantly enhance the clarity and informativeness of Figures 2D and 3D.

      (3) Questions to consider further expanding on, by more analyses or experiments and in the discussion: 

      What are the explanations for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels? How could a phosphodiesterase lead to increased c-di-GMP levels? 

      We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis, in which PdeI contains an intact EAL domain known for degrading c-di-GMP. However, it is noteworthy that PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis (Fig. S8). This dual-domain architecture suggests that PdeI may engage in complex regulatory roles. Previous studies have shown that the knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Further, a point mutation in PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain resulted in decreased c-di-GMP levels (2), implying that the wild-type GGDEF domain in PdeI contributes to the maintenance or increase of c-di-GMP levels in the cell. Importantly, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Response Fig. 9B). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite increasing BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Fig. 4D). This experimental evidence, along with domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, rebutting the notion that it solely functions as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that PdeI overexpression, induced by arabinose, led to an upregulation of c-di-GMP levels (Fig. 4E). These results strongly suggest that PdeI plays a significant role in upregulating c-di-GMP levels. Our further analysis revealed that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we hypothesize that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We have also included this explanation (lines 193-217) and the supporting experimental data (Fig. 4D & 4J) in our manuscript to clarify this important point. Thank you for highlighting this apparent contradiction, as it has allowed us to provide a more comprehensive explanation of our findings.

      What about the rest of the genes in cluster 2 of the biofilm? They should be used to help interpret the association between PdeI and c-di-GMP. 

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. After careful analysis, we have determined that the other marker genes in this cluster do not have a significant impact on biofilm formation. Furthermore, we have not found any direct relationship between these genes and c-di-GMP or PdeI. Our focus on PdeI in this cluster is due to its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While the other genes in this cluster may be co-expressed, their functions appear to be unrelated to the PdeI and c-di-GMP pathway we are investigating. We chose not to elaborate on these genes in our main discussion as they do not contribute directly to our understanding of the PdeI and c-di-GMP association. Instead, we could include a brief mention of these genes in the manuscript, noting that they were found to be unrelated to the PdeI-c-di-GMP pathway. This would provide a more comprehensive view of the cluster composition while maintaining focus on the key findings related to PdeI and c-di-GMP.

      Author response image 2.

      Protein-protein interactions of marker genes in cluster 2 of the 24-hour static E. coli biofilm data.

      A verification is needed that the protein fusion to PdeI functional/membrane localization is not due to protein interactions with fluorescent protein fusion. 

      We appreciate your concern regarding the potential impact of the fluorescent protein fusion on the functionality and membrane localization of PdeI. It is crucial to verify that the observed effects are attributable to PdeI itself and not an artifact of its fusion with the fluorescent protein. To address this matter, we have incorporated a control group expressing only the fluorescent protein BFP (without the PdeI fusion) under the same promoter. This experimental design allows us to differentiate between effects caused by PdeI and those potentially arising from the fluorescent protein alone.

      Our results revealed the following key observations:

      (1) Cellular Localization: The GFP alone exhibited a uniform distribution in the cytoplasm of bacterial cells, whereas the PdeI-GFP fusion protein was specifically localized to the membrane (Fig. 4C).

      (2) Localization in the Biofilm Matrix: BFP-positive cells were distributed throughout the entire biofilm community. In contrast, PdeI-BFP-positive cells localized at the bottom of the biofilm, where cell-surface adhesion occurs (Fig. 4F).

      (3) c-di-GMP Levels: Cells with high levels of BFP displayed no increase in c-di-GMP levels. Conversely, cells with high levels of PdeI-BFP exhibited a significant increase in c-di-GMP levels (Fig. 4D).

      (4) Persister Cell Ratio: Cells expressing high levels of BFP showed no increase in persister ratios, while cells with elevated levels of PdeI-BFP demonstrated a marked increase in persister ratios (Fig. 4J).

      These findings from the control experiments have been included in our manuscript (lines 193-244, Fig. 4C, 4D, 4F, 4G and 4J), providing robust validation of our results concerning the PdeI fusion protein. They confirm that the observed effects are indeed due to PdeI and not merely artifacts of the fluorescent protein fusion.

      (1) Vrabioiu, A. M. & Berg, H. C. Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proceedings of the National Academy of Sciences of the United States of America 119 (2022). https://doi.org/10.1073/pnas.2116830119

      (2) Reinders, A. et al. Expression and Genetic Activation of Cyclic Di-GMP-Specific Phosphodiesterases in Escherichia coli. J Bacteriol 198, 448-462 (2016). https://doi.org/10.1128/JB.00604-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study attempts to resolve an apparent paradox of rapid evolutionary rates of multi-copy gene systems by using a theoretical model that integrates two classic population models. While the conceptual framework is intuitive and thus useful, the specific model is perplexing and difficult to penetrate for non-specialists. The data analysis of rRNA genes provides inadequate support for the conclusions due to a lack of consideration of technical challenges, mutation rate variation, and the relationship between molecular processes and model parameters.

      Overall Responses:

      Since the eLife assessment succinctly captures the key points of the reviews, the reply here can be seen as the overall responses to the summed criticisms. We believe that the overview should be sufficient to address the main concerns, but further details can be found in the point-by-point responses below. The overview covers the same grounds as the provisional responses (see the end of this rebuttal) but is organized more systematically in response to the reviews. The criticisms together fall into four broad areas. 

      First, the lack of engagement with the literature, particularly concerning Cannings models and non-diffusive limits. This is the main rebuttal of the companion paper (eLife-RP-RA-2024-99990). The literature in question is all in the WF framework and with modifications, in particular, with the introduction of V(K). Nevertheless, all WF models are based on population sampling. The Haldane model is an entirely different model of genetic drift, based on gene transmission. Most importantly, the WF models and the Haldane model differ in the ability to handle the four paradoxes presented in the two papers. These paradoxes are all incompatible with the WF models.

      Second, the poor presentation of the model that makes the analyses and results difficult to interpret. In retrospect, we fully agree and thank all the reviewers for pointing this out. Indeed, we have unnecessarily complicated the model. Even the key concept that defines the paradox, which is the effective copy number of rRNA genes, is difficult to comprehend. We have streamlined the presentation now. Briefly, the complexity arose from the general formulation permitting V(K) ≠ E(K) even for single copy genes. (It would serve the same purpose if we simply let V(K) = E(K) for single copy genes.) The sentences below, copied from the new abstract, should clarify the issue. The full text in the Results section has all the details.

      “On average, rDNAs have C ~ 150 - 300 copies per haploid in humans. While a neutral mutation of a single-copy gene would take 4N generations (N being the population size of an ideal population) to become fixed, the time should be 4NC* generations for rRNA genes (C* being the effective copy number). Note that C* >> 1, but C* < (or >) C would depend on the drift strength. Surprisingly, the observed fixation time in mouse and human is < 4N, implying the paradox of C* < 1.”
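
      The logic of the paradox stated in the passage above can be written out in one line. The symbols $T_{\text{rDNA}}$ and $T_{\text{obs}}$ are shorthand introduced only for this sketch, denoting the expected and the observed fixation time (in generations) of a neutral mutation in rRNA genes; the sketch assumes the observed time can be equated with the expectation.

```latex
T_{\text{rDNA}} = 4NC^{*}, \qquad
T_{\text{obs}} < 4N
\;\Longrightarrow\; 4NC^{*} < 4N
\;\Longrightarrow\; C^{*} < 1 .
```

      Since each haploid carries C ~ 150 - 300 copies and C* is expected to be well above 1, the inferred C* < 1 is what constitutes the paradox.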

      Third, the confusion about which rRNA gene is being compared with which homology, as there are hundreds of them. We should note that the effective copy number C* indicates that the rRNA gene arrays do not correspond with the “gene locus” concept. This is at the heart of the confusion we failed to remove clearly. We now use the term “pseudo-population” to clarify the nature of rDNA variation and evolution. The relevant passage is reproduced from the main text shown below.

      “The pseudo-population of ribosomal DNA copies within each individual

      While a human haploid with 200 rRNA genes may appear to have 200 loci, the concept of "gene loci" cannot be applied to the rRNA gene clusters. This is because DNA sequences can spread from one copy to others on the same chromosome via replication slippage. They can also spread among copies on different chromosomes via gene conversion and unequal crossovers (Nagylaki 1983; Ohta and Dover 1983; Stults, et al. 2008; Smirnov, et al. 2021). Replication slippage and unequal crossovers would also alter the copy number of rRNA genes. These mechanisms will be referred to collectively as the homogenization process. Copies of the cluster on the same chromosome are known to be nearly identical in sequences (Hori, et al. 2021; Nurk, et al. 2022). Previous research has also provided extensive evidence for genetic exchanges between chromosomes (Krystal, et al. 1981; Arnheim, et al. 1982; van Sluis, et al. 2019).

      In short, rRNA gene copies in an individual can be treated as a pseudo-population of gene copies. Such a pseudo-population is not Mendelian but its genetic drift can be analyzed using the branching process (see below). The pseudo-population corresponds to the "chromosome community" proposed recently (Guarracino, et al. 2023). As seen in Fig. 1C, the five short arms harbor a shared pool of rRNA genes that can be exchanged among them. Fig. 1D presents the possible molecular mechanisms of genetic drift within individuals whereby mutations may spread, segregate or disappear among copies. Hence, rRNA gene diversity or polymorphism refers to the variation across all rRNA copies, as these genes exist as paralogs rather than orthologs. This diversity can be assessed at both individual and population levels according to the multi-copy nature of rRNA genes.”

      Fourth, the lack of consideration of many technical challenges. We have responded to the criticisms point-by-point below. One of the main criticisms is about mutation rate differences between single-copy and rRNA genes. We did in fact allude to the parity in mutation rate between them in the original text but should have presented this property more prominently, as is done now. The following is copied from the revised text:

      “We now consider the evolution of rRNA genes between species by analyzing the rate of fixation (or near fixation) of mutations. Polymorphic variants are filtered out in the calculation. Note that Eq. (3) shows that the mutation rate, μ, determines the long-term evolutionary rate, λ. Since we will compare the λ values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, λ falls in the range of 50-60 (differences per Kb) for single copy genes and 40-70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their λ values will have to be explained by other means.”

      While the overview should address the key issues, we now present the point-by-point response below. 

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Wang et al is, like its companion paper, very unusual in the opinion of this reviewer. It builds off of the companion theory paper's exploration of the "Wright-Fisher Haldane" model but applies it to the specific problem of diversity in ribosomal RNA arrays.

      The authors argue that polymorphism and divergence among rRNA arrays are inconsistent with neutral evolution, primarily stating that the amount of polymorphism suggests a high effective size and thus a slow fixation rate, while we, in fact, observe relatively fast fixation between species, even in putatively non-functional regions.

      They frame this as a paradox in need of solving, and invoke the WFH model.

      The same critiques apply to this paper as to the presentation of the WFH model and the lack of engagement with the literature, particularly concerning Cannings models and non-diffusive limits. However, I have additional concerns about this manuscript, which I found particularly difficult to follow.

      Response 1: We would like to emphasize that, despite the many modified WF models, there has not been a model for quantifying genetic drift in multi-copy gene systems, due to the complexity of two levels of genetic drift – within individuals as well as between individuals of the population. We will address this question in the revised manuscript (Ruan, et al. 2024) and have included a mention of it in the text as follows:

      “In the WF model, gene frequency is governed by 1/N (or 1/2N in diploids) because K would follow the Poisson distribution whereby V(K) = E(K). As E(K) is generally ~1, V(K) would also be ~ 1. In this backdrop, many "modified WF" models have been developed (Der, et al. 2011), most of them permitting V(K) ≠ E(K) (Karlin and McGregor 1964; Chia and Watterson 1969; Cannings 1974). Nevertheless, paradoxes encountered by the standard WF model apply to these modified WF models as well because all WF models share the key feature of gene sampling (see below and (Ruan, et al. 2024)).”

      My first, and most major, concern is that I can never tell when the authors are referring to diversity in a single copy of an rRNA gene compared to when they are discussing diversity across the entire array of rRNA genes. I admit that I am not at all an expert in studies of rRNA diversity, so perhaps this is a standard understanding in the field, but in order for this manuscript to be read and understood by a larger number of people, these issues must be clarified.

      Response 2: We appreciate the reviewer’s feedback and acknowledge that the distinction between the diversity of individual rRNA gene copies and the diversity across the entire array of rRNA genes may not have been clearly defined in the original manuscript. Diversity in our manuscript refers to the genetic diversity of the population of rRNA genes in the cell. To address this concern, we have revised the relevant paragraph in the text:

      “Hence, rRNA gene diversity or polymorphism refers to the variation across all rRNA copies, as these genes exist as paralogs rather than orthologs. This diversity can be assessed at both individual and population levels according to the multi-copy nature of rRNA genes.”

      Additionally, we have updated the Methods section to include a detailed description of how diversity is measured as follows:

      “All mapping and analysis are performed among individual copies of rRNA genes.

      Each individual was considered as a pseudo-population of rRNA genes and the diversity of rRNA genes was calculated using this pseudo-population of rRNA genes.”

      The authors frame the number of rRNA genes as roughly equivalent to expanding the population size, but this seems to be wrong: the way that a mutation can spread among rRNA gene copies is fundamentally different than how mutations spread within a single copy gene. In particular, a mutation in a single copy gene can spread through vertical transmission, but a mutation spreading from one copy to another is fundamentally horizontal: it has to occur because some molecular mechanism, such as slippage, gene conversion, or recombination resulted in its spread to another copy. Moreover, by collapsing diversity across genes in an rRNA array, the authors are massively increasing the mutational target size.   

      For example, it's difficult for me to tell if the discussion of heterozygosity at rRNA genes in mice starting on line 277 is collapsed or not. The authors point out that Hs per kb is ~5x larger in rRNA than the rest of the genome, but I can't tell based on the authors' description if this is diversity per single copy locus or after collapsing loci together. If it's the first one, I have concerns about diversity estimation in highly repetitive regions that would need to be addressed, and if it's the second one, an elevated rate of polymorphism is not surprising, because the mutational target size is in fact significantly larger.

      Response 3: As addressed in Response 2 above, the diversity or heterozygosity of rRNA genes is consistently measured by combining copies, as there is no concept of a single gene locus for rDNAs. We agree that by combining the diversity across multiple rRNA gene copies into one measurement, the mutational target size is effectively increased, leading to higher observed levels of diversity than for a single gene. This is in line with our text:

      “If we use the polymorphism data, it is as if rDNA array has a population size 5.2 times larger than single-copy genes. Although the actual copy number on each haploid is ~ 110, these copies do not segregate like single-copy genes and we should not expect N* to be 100 times larger than N. The HS results confirm the prediction that rRNA genes should be more polymorphic than single-copy genes.”

      Given this agreement, the reviewer points out that having a large number of rRNA genes is not equivalent to having a larger population size, because the spreading of mutations among rDNA copies within a species involves two stages: within individuals (horizontal transmission) and between individuals (vertical transmission). Let us examine how the mutation spreading mechanisms influence the population size of rRNA genes.

      First, an increase in the copy number of rRNA genes does increase the actual population size (CN) of rRNA genes. If the reviewer is referring to the effective population size of rRNA genes in the context of diversity (N* = CN/V*(K)), then an increase in C would also increase N*. In addition, the linkage among copies would reduce the drift effect, leading to increased diversity. Conversely, homogenization mechanisms, such as gene conversion and unequal crossing-over, would reduce genetic variation between copies and increase V*(K), leading to lower diversity. Therefore, C* = C/V*(K) in mice is about 5 times larger for rRNA genes than for the rest of the genome (which is mainly single-copy genes), even though the actual copy number is about 110, indicating a high homogenization rate.
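
      As a rough illustration of the arithmetic in this paragraph, the sketch below inverts C* = C/V*(K) using the two mouse figures quoted in this response (C of about 110 and a polymorphism-based C* of about 5.2). The function names are hypothetical, and the implied V*(K) is a back-calculation under that definition, not a value reported in the manuscript.

```python
def effective_copy_number(copies: float, var_k: float) -> float:
    """Effective copy number, C* = C / V*(K)."""
    if var_k <= 0:
        raise ValueError("V*(K) must be positive")
    return copies / var_k


def implied_variance(copies: float, c_star: float) -> float:
    """Invert the definition to get V*(K) = C / C*."""
    if c_star <= 0:
        raise ValueError("C* must be positive")
    return copies / c_star


# Figures quoted in the response for mice (both approximate).
C_MOUSE = 110        # actual rRNA gene copies per haploid
C_STAR_MOUSE = 5.2   # effective copy number inferred from polymorphism

v_star = implied_variance(C_MOUSE, C_STAR_MOUSE)
print(f"implied V*(K) is roughly {v_star:.1f}")
```

      Under these assumptions, V*(K) would be on the order of 20, which is one way to read the statement that the homogenization rate is high.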

      Even if these issues were sorted out, I'm not sure that the authors framing, in terms of variance in reproductive success is a useful way to understand what is going on in rRNA arrays. The authors explicitly highlight homogenizing forces such as gene conversion and replication slippage but then seem to just want to incorporate those as accounting for variance in reproductive success. However, don't we usually want to dissect these things in terms of their underlying mechanism? Why build a model based on variance in reproductive success when you could instead explicitly model these homogenizing processes? That seems more informative about the mechanism, and it would also serve significantly better as a null model, since the parameters would be able to be related to in vitro or in vivo measurements of the rates of slippage, gene conversion, etc.

      In the end, I find the paper in its current state somewhat difficult to review in more detail, because I have a hard time understanding some of the more technical aspects of the manuscript while so confused about high-level features of the manuscript. I think that a revision would need to be substantially clarified in the ways I highlighted above.

      Response 4: We appreciate your perspective on modeling the homogenizing processes of rRNA gene arrays.

      We employ the WFH model to track the drift effect of the multi-copy gene system. In the context of the Haldane model, the term K is often referred to as reproductive success, but it might be more accurate to interpret it as “transmission rate” in this study. As stated in the caption of Figure 1D, two new mutations can have very large differences in individual output (K) when transmitted to the next generation through the homogenization process.

      Regarding why we did not explicitly model different mechanisms of homogenization, previous elegant models of multigene families have involved mechanisms like unequal crossing over (Smith 1974a; Ohta 1976; Smith 1976) or gene conversion (Nagylaki 1983; Ohta 1985) for concerted evolution, or have used conversion to approximate the joint effect of conversion and crossing over (Ohta and Dover 1984). However, even when the gene conversion mechanism is simplified, modeling remains challenging due to controversial assumptions, such as a uniform homogenization rate across all gene members (Dover 1982; Ohta and Dover 1984). No model can fully capture the extreme complexity of these factors. Moreover, these unbiased mechanisms are all forces of genetic drift that contribute to changes in mutant transmission. Therefore, we opted for a simpler, collective approach, using V*(K) to capture the overall strength of genetic drift.

      We have discussed the reason for using V*(K) to collectively represent the homogenization effect in the Discussion. As stated in our manuscript:

      “There have been many rigorous analyses that confront the homogenizing mechanisms directly. These studies (Smith 1974b; Ohta 1976; Dover 1982; Nagylaki 1983; Ohta and Dover 1983) modeled gene conversion and unequal cross-over head on. Unfortunately, on top of the complexities of such models, the key parameter values are rarely obtainable. In the branching process, all these complexities are wrapped into V*(K) for formulating the evolutionary rate. In such a formulation, the collective strength of these various forces may indeed be measurable, as shown in this study.”

      Reviewer #2 (Public Review):

      Summary:

      Multi-copy gene systems are expected to evolve slower than single-copy gene systems because it takes longer for genetic variants to fix in the large number of gene copies in the entire population. Paradoxically, their evolution is often observed to be surprisingly fast. To explain this paradox, the authors hypothesize that the rapid evolution of multi-copy gene systems arises from stronger genetic drift driven by homogenizing forces within individuals, such as gene conversion, unequal crossover, and replication slippage. They formulate this idea by combining the advantages of two classic population genetic models -- adding the V(k) term (which is the variance in reproductive success) in the Haldane model to the Wright-Fisher model. Using this model, the authors derived the strength of genetic drift (i.e., reciprocal of the effective population size, Ne) for the multi-copy gene system and compared it to that of the single-copy system. The theory was then applied to empirical genetic polymorphism and divergence data in rodents and great apes, relying on comparison between rRNA genes and genome-wide patterns (which mostly are single-copy genes). Based on this analysis, the authors concluded that neutral genetic drift could explain the rRNA diversity and evolution patterns in mice but not in humans and chimpanzees, pointing to a positive selection of rRNA variants in great apes.

      Strengths:

      Overall, the new WFH model is an interesting idea. It is intuitive, efficient, and versatile in various scenarios, including the multi-copy gene system and other cases discussed in the companion paper by Ruan et al.

      Weaknesses:

      Despite being intuitive at a high level, the model is a little unclear, as several terms in the main text were not clearly defined and connections between model parameters and biological mechanisms are missing. Most importantly, the data analysis of rRNA genes is extremely over-simplified and does not adequately consider biological and technical factors that are not discussed in the model. Even if these factors are ignored, the authors' interpretation of several observations is unconvincing, as alternative scenarios can lead to similar patterns. Consequently, the conclusions regarding rRNA genes are poorly supported. Overall, I think this paper shines more in the model than the data analysis, and the modeling part would be better presented as a section of the companion theory paper rather than a stand-alone paper. My specific concerns are outlined below.

      Response 5: We appreciate the reviewer’s feedback and recognize the need for clearer definitions of key terms. We have made revisions to ensure that each term is properly defined upon its first use.

      Regarding the model’s simplicity, as in Response 4, our intention was to create a framework that captures the essence of how mutant copies spread by chance within a population, relying on the variance in transmission rates for each copy (V(K)). By doing so, we aimed to incorporate the various homogenization mechanisms that do not affect single-copy genes, highlighting the substantially stronger genetic drift observed in multi-copy systems compared to single-copy genes. We believe that simplifying the model was necessary to make it more accessible and practical for real-world data analysis, and that it provides a useful approximation that can be applied broadly. It is clearly an underestimate of the actual rate, as some forces with canceling effects might not have been accounted for.

      (1) Unclear definition of terms

      Many of the terms in the model or the main text were not clearly defined the first time they occurred, which hindered understanding of the model and observations reported. To name a few:

      (i) In Eq(1), although C* is defined as the "effective copy number", it is unclear what it means in an empirical sense. For example, Ne could be interpreted as "an ideal WF population with this size would have the same level of genetic diversity as the population of interest" or "the reciprocal of strength of allele frequency change in a unit of time". A few factors were provided that could affect C*, but specifically, how do these factors impact C*? For example, does increased replication slippage increase or decrease C*? How about gene conversion or unequal cross-over? If we don't even have a qualitative understanding of how these processes influence C*, it is very hard to make interpretations based on inferred C*. How to interpret the claim on lines 240-241 (If the homogenization is powerful enough, rRNA genes would have C*<1)? Please also clarify what C* would be, in a single-copy gene system in diploid species.

      Response 6: We apologize for the confusion caused by the lack of clear definitions in the initial manuscript. We recognize that this has led to misunderstandings regarding the concept we presented. Our aim was to demonstrate the concerted evolution in multi-copy gene systems, involving two levels of “effective copy number” relative to single-copy genes: first, homogenization within populations, then divergence between species. We used C* and Ne* to try to designate the two levels driven by the same homogenization force, which complicated the evolutionary pattern.

      To address these issues, we have simplified the model and revised the abstract to prevent any misunderstandings:

      “On average, rDNAs have C ~ 150 - 300 copies per haploid in humans. While a neutral mutation of a single-copy gene would take 4N (N being the population size) generations to become fixed, the time should be 4NC* generations for rRNA genes where 1 << C* (C* being the effective copy number; C* < C or C* > C would depend on the drift strength). However, the observed fixation time in mouse and human is < 4N, implying the paradox of C* < 1. Genetic drift that encompasses all random neutral evolutionary forces appears as much as 100 times stronger for rRNA genes as for single-copy genes, thus reducing C* to < 1.”

      Thus, it should be clear that the fixation time as well as the level of polymorphism represent the empirical measures of C*. We have also revised the relevant paragraph in the text to define C* and V*(K) and removed Eq. 2 for clarity:

      “Below, we compare the strength of genetic drift in rRNA genes vs. that of single-copy genes using the Haldane model (Ruan, et al. 2024). We shall use * to designate the equivalent symbols for rRNA genes; for example, E(K) vs. E*(K). Both are set to 1, such that the total number of copies in the long run remains constant.

      For simplicity, we let V(K) = 1 for single-copy genes. (If we permit V(K) ≠ 1, the analyses will involve the ratio of V*(K) and V(K) to reach the same conclusion but with unnecessary complexities.) For rRNA genes, V*(K) ≥ 1 may generally be true because K for rDNA mutations is affected by a host of homogenization factors including replication slippage, unequal cross-over, gene conversion and other related mechanisms not operating on single copy genes. Hence,

      N* = NC/V*(K)

      where C is the average number of rRNA genes in an individual and V*(K) reflects the homogenization process on rRNA genes (Fig. 1D). Thus,

      C* = C/V*(K)

      represents the effective copy number of rRNA genes in the population, determining the level of genetic diversity relative to single-copy genes. Since C is in the hundreds and V*(K) is expected to be > 1, the relationship of 1 << C* ≤ C is hypothesized. Fig. 1D is a simple illustration that the homogenizing process may enhance V*(K) substantially over the WF model.

      In short, genetic drift of rRNA genes would be equivalent to single copy genes in a population of size NC* (or N*). Since C* >> 1 is hypothesized, genetic drift for rRNA genes is expected to be slower than for single copy genes.”
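
      The relationship between fixation time and C* described in the quoted passages can be sketched numerically. This is a minimal illustration, assuming the neutral expectations stated above (4N generations for a single-copy gene, 4NC* for rRNA genes). The population size used here is a hypothetical placeholder, and the function names are not from the manuscript.

```python
def neutral_fixation_generations(pop_size: float, c_star: float = 1.0) -> float:
    """Expected neutral fixation time: 4N for single-copy genes, 4NC* for rRNA genes."""
    return 4.0 * pop_size * c_star


def implied_c_star(observed_generations: float, pop_size: float) -> float:
    """Back out C* from an observed fixation time, C* = T / (4N)."""
    return observed_generations / (4.0 * pop_size)


N = 10_000  # hypothetical population size, for illustration only

t_single = neutral_fixation_generations(N)          # single-copy gene
t_rrna = neutral_fixation_generations(N, c_star=5.2)  # C* from mouse polymorphism

# An observed fixation time shorter than 4N implies C* < 1,
# which is the paradox described in the abstract.
observed = 0.5 * t_single  # hypothetical observation
print(implied_c_star(observed, N))
```

      If polymorphism suggests C* well above 1 while the observed fixation time gives C* below 1, the two empirical measures of C* disagree, which is the inconsistency the authors frame as a paradox.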

      (ii) In Eq(1), what exactly is V*(K)? Variance in reproductive success across all gene copies in the population? What factors affect V*(K)? For the same population, what is the possible range of V*(K)/V(K)? Is it somewhat bounded because of biological constraints? Are V*(K) and C*(K) independent parameters, or does one affect the other, or are both affected by an overlapping set of factors?

      Response 7: - In Eq(1), what exactly is V*(K)?  In Eq(1), V*(K) refers to the variance in the number of progeny to whom the gene copy of interest is transmitted (K) over a specific time interval. When considering evolutionary divergence between species, V*(K) may correspond to the divergence time.

      - What factors affect V*(K)? For the same population, what is the possible range of V*(K)/V(K)? Is it somewhat bounded because of biological constraints? “V*(K) for rRNA genes is likely to be much larger than V(K) for single-copy genes, because K for rRNA mutations may be affected by a host of homogenization factors including replication slippage, unequal cross-over, gene conversion and other related mechanisms not operating on single-copy genes. For simplicity, we let V(K) = 1 (as in a WF population) and V*(K) ≥ 1.” Thus, V*(K)/V(K) = V*(K) can potentially reach values in the hundreds, and may even exceed C, resulting in C* (= C/V*(K)) values less than 1. Biological constraints that could limit this variance include the minimum copy number within individuals, sequence constraints in functional regions, and the susceptibility of chromosomes with large arrays to intrachromosomal crossover (which may lead to a reduction in copy number) (Eickbush and Eickbush 2007), potentially reducing the variability of K.

      - Are V*(K) and C*(K) independent parameters, or does one affect the other, or are both affected by an overlapping set of factors? There is no C*(K); C* is defined as follows in the text:

      “C* = C/V*(K) represents the effective copy number of rRNA genes, reflecting the level of genetic diversity relative to single-copy genes. Since C is in the hundreds and V*(K) is expected to be > 1, the relationship of 1 << C* ≤ C is hypothesized.” The factors influencing V*(K) directly affect C* due to this relationship.

      (iii) In the multi-copy gene system, how is fixation defined? A variant found at the same position in all copies of the rRNA genes in the entire population?

      Response 8: We appreciate the reviewer's suggestion and have now provided a clear definition of fixation in the context of multi-copy genes within the manuscript.

      “For rDNA mutations, fixation must occur in two stages – fixation within individuals and among individuals in the population. (Note that a new mutation can be fixed via homogenization, thus making rRNA gene copies in an individual a pseudo-population.)”

      The evolutionary dynamics of multi-copy genes differ from those of single-copy (Mendelian) genes, which mutate, segregate and evolve independently in the population. Fixation in multi-copy genes, such as rRNA genes, is influenced by their ability to transfer genetic information among their copies through nonreciprocal exchange mechanisms, like gene conversion and unequal crossover (Ohta and Dover 1984). These processes can cause fluctuations in the number of mutant copies within an individual's lifetime and facilitate the spread of a mutant allele across all copies, even on non-homologous chromosomes. Over time, this can result in the mutant allele replacing all preexisting alleles throughout the population, leading to fixation (Ohta 1976). In other words, the same variant will eventually be present at the corresponding position in all copies of the rRNA genes across the entire population. Without such homogenization processes, fixation would be unlikely to occur in multi-copy genes.

      (iv) Lines 199-201, HI, Hs, and HT are not defined in the context of a multi-copy gene system. What are the empirical estimators?

      Response 9: We appreciate the reviewer's comment and would like to clarify the definitions and empirical estimators for HI, HS and HT within the context of a multi-copy gene system in the text:

      “A standard measure of genetic drift is the level of heterozygosity (H). At the mutation-selection equilibrium

      where μ is the mutation rate of the entire gene and Ne is the effective population size. In this study, Ne = N for single-copy gene and Ne = C*N for rRNA genes. The empirical measure of nucleotide diversity H is given by

      where L is the gene length (for each copy of rRNA gene, L ~ 43kb) and p_i is the variant frequency at the i-th site.

      We calculate H of rRNA genes at three levels – within-individual, within-species and then, within total samples (HI, HS and HT, respectively). HS and HT are standard population genetic measures (Hartl, et al. 1997; Crow and Kimura 2009). In calculating HS, all sequences in the species are used, regardless of the source individuals. A similar procedure is applied to HT. The HI statistic is adopted for multi-copy gene systems for measuring within-individual polymorphism. Note that copies within each individual are treated as a pseudo-population (see Fig. 1 and text above). With multiple individuals, HI is averaged over them.”
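
      The three-level calculation described in the quoted passage can be sketched as follows. Because the displayed equations are not reproduced in this response, the per-site form used here, H = (1/L) Σ 2 p_i (1 − p_i), is an assumption (a common estimator of nucleotide diversity for biallelic sites), as is the standard definition F_IS = (H_S − H_I)/H_S. Pooling across individuals is approximated by an unweighted mean of individual frequencies. All names and the toy data are hypothetical.

```python
from statistics import mean


def heterozygosity(variant_freqs, gene_length):
    """Assumed estimator: H = (1/L) * sum over sites of 2 * p_i * (1 - p_i)."""
    return sum(2.0 * p * (1.0 - p) for p in variant_freqs) / gene_length


def h_within_individuals(freqs_by_individual, gene_length):
    """H_I: compute H inside each individual's pseudo-population, then average."""
    return mean(heterozygosity(f, gene_length) for f in freqs_by_individual)


def h_within_species(freqs_by_individual, gene_length):
    """H_S: compute H from per-site frequencies pooled over individuals (unweighted)."""
    pooled = [mean(site) for site in zip(*freqs_by_individual)]
    return heterozygosity(pooled, gene_length)


# Toy data: two individuals, two variable sites, gene length of 4 sites.
freqs = [
    [0.5, 0.0],  # individual 1: variant frequency at each variable site
    [0.5, 1.0],  # individual 2
]
L = 4

h_i = h_within_individuals(freqs, L)
h_s = h_within_species(freqs, L)
f_is = (h_s - h_i) / h_s  # assumed standard definition of F_IS
print(h_i, h_s, f_is)
```

      In this toy case the second site is fixed for different alleles in the two individuals, so it adds to H_S but not to H_I, and F_IS is large. A small F_IS, as reported for humans in this response, would correspond to individuals that each carry most of the species-level diversity.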

      (v) Line 392-393, f and g are not clearly defined. What does "the proportion of AT-to-GC conversion" mean? What are the numerator and denominator of the fraction, respectively?

      Response 10: We appreciate the reviewer's comment and have revised the relevant text for clarity as well as improved the specific calculation methods for f and g in the Methods section.

      “We first designate the proportion of AT-to-GC conversion as f and the reciprocal, GC-to-AT, as g. Specifically, f represents the proportion of fixed mutations where an A or T nucleotide has been converted to a G or C nucleotide (see Methods). Given f ≠ g, this bias is true at the site level.”

      Methods:

      “Specifically, f represents the proportion of fixed mutations where an A or T nucleotide has been converted to a G or C nucleotide. The numerator for f is the number of fixed mutations from A-to-G, T-to-C, T-to-G, or A-to-C. The denominator is the total number of A or T sites in the rDNA sequence of the species lineage.

      Similarly, g is defined as the proportion of fixed mutations where a G or C nucleotide has been converted to an A or T nucleotide. The numerator for g is the number of fixed mutations from G-to-A, C-to-T, C-to-A, or G-to-T. The denominator is the total number of G or C sites in the rDNA sequence of the species lineage.

      The consensus rDNA sequences for the species lineage were generated by Samtools consensus (Danecek, et al. 2021) from the bam file after alignment. The following command was used:

      ‘samtools consensus -@ 20 -a -d 10 --show-ins no --show-del yes input_sorted.bam output.fa’.”
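
      The definitions of f and g in the Methods excerpt above can be sketched in a few lines. This is an illustration only: the function name, the input representation (a consensus sequence string plus a list of fixed substitutions as base pairs), and the toy data are assumptions, not the authors' pipeline.

```python
AT = {"A", "T"}
GC = {"G", "C"}


def conversion_proportions(consensus_seq, fixed_mutations):
    """Return (f, g) as defined in the Methods excerpt.

    f = (# fixed AT-to-GC mutations) / (# A or T sites in the lineage sequence)
    g = (# fixed GC-to-AT mutations) / (# G or C sites in the lineage sequence)

    fixed_mutations is an iterable of (from_base, to_base) pairs.
    """
    seq = consensus_seq.upper()
    at_sites = sum(base in AT for base in seq)
    gc_sites = sum(base in GC for base in seq)

    at_to_gc = sum(1 for a, b in fixed_mutations if a in AT and b in GC)
    gc_to_at = sum(1 for a, b in fixed_mutations if a in GC and b in AT)

    f = at_to_gc / at_sites if at_sites else float("nan")
    g = gc_to_at / gc_sites if gc_sites else float("nan")
    return f, g


# Toy example: 4 A/T sites and 6 G/C sites.
toy_seq = "AATTGGCCGC"
toy_fixed = [("A", "G"), ("T", "C"), ("G", "A"), ("A", "T")]  # last one is neither class

f, g = conversion_proportions(toy_seq, toy_fixed)
print(f, g)
```

      Substitutions that stay within a class (A-to-T or G-to-C) count toward neither numerator, which matches the four mutation types listed for each of f and g.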

      (2) Technical concerns with rRNA gene data quality

      Given the highly repetitive nature and rapid evolution of rRNA genes, myriads of things could go wrong with read alignment and variant calling, raising great concerns regarding the data quality. The data source and methods used for calling variants were insufficiently described at places, further exacerbating the concern.

      (i) What are the accession numbers or sample IDs of the high-coverage WGS data of humans, chimpanzees, and gorillas from NCBI? How many individuals are in each species? These details are necessary to ensure reproducibility and correct interpretation of the results.

      Response 11: We apologize for not including the specific details of the sample information in the main text. All accession numbers and sample IDs for the WGS data used in this study, including mice, humans, chimpanzee, and gorilla, are already listed in Supplementary Tables S4-S5. We have revised the table captions and referenced them at the appropriate points in the Methods to ensure clarity.

      “The genome sequences of human (n = 8), chimpanzee (n = 1) and gorilla (n = 1) were sourced from National Center for Biotechnology Information (NCBI) (Supplementary Table 4). … Genomic sequences of mice (n = 13) were sourced from the Wellcome Sanger Institute’s Mouse Genome Project (MGP) (Keane, et al. 2011).”

      The concern regarding the number of individuals needed to support the results will be addressed in Response 13.

      (ii) Sequencing reads from great apes and mice were mapped against the human and mouse rDNA reference sequences, respectively (lines 485-486). Given the rapid evolution of rRNA genes, even individuals within the same species differ in copy number and sequences of these genes. Alignment to a single reference genome would likely lead to incorrect and even failed alignment for some reads, resulting in genotyping errors. Differences in rDNA sequence, copy number, and structure are even greater between species, potentially leading to higher error rates in the called variants. Yet the authors provided no justification for the practice of aligning reads from multiple species to a single reference genome nor evidence that misalignment and incorrect variant calling are not major concerns for the downstream analysis.

      Response 12: While the copy number of rDNA varies among individuals, the sequence identity among copies is typically very high (median identity of 98.7% (Nurk, et al. 2022)). Therefore, all rRNA genes were aligned against the species-specific reference sequences, where the consensus nucleotide accounts for >90% of the gene copies in the population. To minimize genotyping errors, our analysis focused exclusively on single nucleotide variants (SNVs) with only two alleles, discarding other mutation types.

      Regarding sequence divergence between species, where sequence variation may be greater, we excluded regions that were unmapped or had high-quality read coverage below 10. In calculating the substitution rate, we accounted for the mapping length (L), as shown in column 3 of Tables 3-5.

      We appreciate the reviewer’s comments and have provide details in the Methods.

      (vi) It is unclear how variant frequency within an individual was defined conceptually or computed from data (lines 499-501). The population-level variant frequency was calculated by averaging across individuals, but why was the averaging not weighted by the copy number of rRNA genes each individual carries? How many individuals are sampled for each species? Are the sample sizes sufficient to provide an accurate estimate of population frequencies?

      Response 13: Each individual was considered as a pseudo-population of rRNA genes; the variant frequency within an individual was the proportion of the mutant allele in this pseudo-population. The variant frequency was calculated from the number of supporting reads in each individual.

      The population-level variant frequency is calculated by averaging across individuals because of how FIS and FST are calculated. In calculating FST, the standard practice is to weigh each population equally. So, when we show FST in humans, we do not consider whether there are more Africans, Caucasians or Asians. There is a reason for not weighing them even though the population sizes could be orders of magnitude different, say, in the comparison between an ethnic minority and the main population. In the case of FIS, the issue is moot. Although copy number may range from 150 to 400 per haploid genome, most people have 300 – 500 copies across their two haploid genomes.
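
The calculation described in Response 13 can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the read counts are hypothetical, and the sketch assumes a single biallelic site with one (mutant reads, total reads) pair per individual. It contrasts the unweighted mean across individuals with a read-weighted pooled estimate.

```python
from statistics import mean


def within_individual_frequency(alt_reads: int, total_reads: int) -> float:
    """Proportion of reads supporting the mutant allele in one individual."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def population_frequency(read_counts) -> float:
    """Unweighted mean of per-individual frequencies (each individual counts once)."""
    return mean(within_individual_frequency(a, n) for a, n in read_counts)


def pooled_frequency(read_counts) -> float:
    """Read-weighted alternative, shown only for contrast."""
    total_alt = sum(a for a, _ in read_counts)
    total = sum(n for _, n in read_counts)
    return total_alt / total


# Hypothetical (mutant reads, total reads) at one site in three individuals.
counts = [(1200, 20000), (300, 30000), (0, 10000)]
unweighted = population_frequency(counts)  # mean of 0.06, 0.01 and 0.0
pooled = pooled_frequency(counts)          # 1500 / 60000
print(f"unweighted={unweighted:.4f} pooled={pooled:.4f}")
```

The two estimates differ whenever sequencing depth (or copy number) varies among individuals, which is the distinction the reviewer's question turns on.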

      As for the concern regarding the number of individuals needed to support the results:

      Considering the nature of multi-copy genes, where gene members undergo continuous exchanges at a much slower rate compared to the rapid rate of random distribution of chromosomes at each generation of sexual reproduction, even a few variant copies that arise during an individual's lifetime would disperse into the gene pool in the next generation (Ohta and Dover 1984). Thus, there is minimal difference between individuals. Our analysis also aligns with this theory, particularly in the human population (FIS = 0.059), where each individual carries the majority of the population's genetic diversity. Therefore, even a single chimpanzee or gorilla individual carries sufficient diversity with its hundreds of gene copies to calculate divergence from humans.

      (vii) Fixed variants are operationally defined as those with a frequency>0.8 in one species. What is the justification for this choice of threshold? Without knowing the exact sample size of the various species, it's difficult to assess whether this threshold is appropriate.

      Response 14: First, the mutation frequency distribution is strongly bimodal (see Figure below), with one peak at zero and the other at 1. The high-frequency peak starts to rise slowly at 0.8, similar to the FST distribution in Figure 4C. That is why we use it as the cutoff, although we would get similar results at a cutoff of 0.90 (see Table below). Second, the sample size for the calculation of mutant frequency is based on the number of reads, which is usually in the tens of thousands. Third, it does not matter whether the mutation frequency calculation is based on one individual or multiple individuals because 95% of the genetic diversity of the population is captured by the gene pool within each individual.

      Author response image 1.

      Author response table 1.

      The A/T to G/C and G/C to A/T changes in apes and mouse.

      New mutants with a frequency >0.9 within an individual are considered (nearly) fixed, except for humans, where the frequency was averaged over 8 individuals in Table 2.

      The X-squared values for each species are as follows: 58.303 for human, 7.9292 for chimpanzee, and 0.85385 for M. m. domesticus.

      (viii) It is not explained exactly how FIS, FST, and divergence levels of rRNA genes were calculated from variant frequency at individual and species levels. Formulae need to be provided to explain the computation.

      Response 15: After we clearly defined HI, HS, and HT in Response 9, understanding FIS and FST becomes straightforward.

      “Given the three levels of heterozygosity, there are two levels of differentiation. First, FIS is the differentiation among individuals within the species, defined by

      FIS = [HS - HI]/HS  

      FIS is hence the proportion of genetic diversity in the species that is found only between individuals. We will later show FIS ~ 0.05 in human rDNA (Table 2), meaning 95% of rDNA diversity is found within individuals.

      Second, FST is the differentiation between species within the total species complex, defined as

      FST = [HT – HS]/HT 

      FST is the proportion of genetic diversity in the total data that is found only between species.”
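
The two formulae quoted above translate directly into code. The sketch below assumes only that HI, HS and HT are the heterozygosities at the individual, species and total levels; the function name and the numerical inputs are hypothetical and are not values from the manuscript's tables.

```python
def f_statistics(h_i: float, h_s: float, h_t: float):
    """Return (FIS, FST) from heterozygosity at three levels.

    FIS = (HS - HI) / HS : share of within-species diversity found only
                           between individuals.
    FST = (HT - HS) / HT : share of total diversity found only between species.
    """
    if h_s <= 0 or h_t <= 0:
        raise ValueError("HS and HT must be positive")
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    return f_is, f_st


# Hypothetical heterozygosities chosen so that FIS is near 0.05.
f_is, f_st = f_statistics(h_i=0.095, h_s=0.10, h_t=0.20)
print(f"FIS={f_is:.3f} FST={f_st:.3f}")
```

With FIS near 0.05, about 95% of the within-species diversity sits inside individuals, which is the reading the response gives for human rDNA.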

      (3) Complete ignorance of the difference in mutation rate between rRNA genes and the genome-wide average

      Nearly all data analysis in this paper relied on comparison between rRNA genes with the rest (presumably single-copy part) of the genome. However, mutation rate, a key parameter determining the diversity and divergence levels, was completely ignored in the comparison. It is well known that mutation rate differs tremendously along the genome, with both fine and large-scale variation. If the mutation rate of rRNA genes differs substantially from the genome average, it would invalidate almost all of the analysis results. Yet no discussion or justification was provided.

      Response 16: We appreciate the reviewer's observation regarding the potential impact of varying mutation rates across the genome. To address this concern, we compared the long-term substitution rates on rDNA and single-copy genes between human and rhesus macaque, which diverged approximately 25 million years ago. Our analysis (see Table S1 below) indicates that the substitution rate in rDNA is actually slower than the genome-wide average. This finding suggests that rRNA genes do not experience a higher mutation rate compared to single-copy genes, as stated in the text:

      “Note that Eq. (3) shows that the mutation rate, μ, determines the long-term evolutionary rate, λ. Since we will compare the λ values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, λ falls in the range of 50-60 (differences per Kb) for single copy genes and 40 – 70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their λ values will have to be explained by other means.”

      However, given the divergence time (Td) being equal to or smaller than Tf, even if the mutation rate per nucleotide is substantially higher in rRNA genes, these variants would not become fixed after the divergence of humans and chimpanzees without the help of strong homogenization forces. Thus, the presence of divergence sites (Table 5) still supports the conclusion that rRNA genes undergo much stronger genetic drift compared to single-copy genes.

      Related to mutation rate: given the hypermutability of CpG sites, it is surprising that the evolution/fixation rate of rRNA estimated with or without CpG sites is so close (2.24% vs 2.27%). Given the 10 - 20-fold higher mutation rate at CpG sites in the human genome, and 2% CpG density (which is probably an under-estimate for rDNA), we expect the former to be at least 20% higher than the latter.

      Response 17: While it is true that CpG sites exhibit a 10-20-fold higher mutation rate, the close evolution/fixation rates of rDNA with and without CpG sites (2.24% vs 2.27%) may be attributed to the fact that fixation rates during short-term evolutionary processes are less influenced by mutation rates alone. As observed in the Human-Macaque comparison in the table above, the substitution rate of rDNA in non-functional regions with CpG sites is 4.18%, while it is 3.35% without CpG sites, aligning with your expectation of 25% higher rates where CpG sites are involved.

      This discrepancy between the expected and observed fixation rates may be due to strong homogenization forces, which can rapidly fix or eliminate variants, thereby reducing the overall impact of higher mutation rates at CpG sites on the observed fixation rate. This suggests that the homogenization mechanisms play a more dominant role in the fixation process over short evolutionary timescales, mitigating the expected increase in fixation rates due to CpG hypermutability.
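
The reviewer's expectation and the figures quoted in Response 17 can be checked with a short calculation. This is a back-of-the-envelope sketch under a stated assumption: the substitution rate is proportional to the mutation rate, so a fraction d of CpG sites mutating k-fold faster raises the all-site rate by a factor of 1 + d(k - 1) relative to non-CpG sites. The function name is hypothetical; the inputs (2% density, 10- to 20-fold, and the quoted percentages) come from the text above.

```python
def expected_rate_ratio(cpg_density: float, fold: float) -> float:
    """All-site rate divided by non-CpG rate when CpG sites mutate `fold` times faster."""
    return 1.0 + cpg_density * (fold - 1.0)


# Expectation from the reviewer's inputs: 2% CpG density, 10- to 20-fold rate.
expected_low = expected_rate_ratio(0.02, 10.0)
expected_high = expected_rate_ratio(0.02, 20.0)

# Observed ratios from the percentages quoted in the text (with CpG / without CpG).
observed_human_chimp = 2.24 / 2.27
observed_human_macaque = 4.18 / 3.35

print(f"expected: {expected_low:.2f} to {expected_high:.2f}")
print(f"human-chimp: {observed_human_chimp:.3f}")
print(f"human-macaque: {observed_human_macaque:.3f}")
```

Under this simple model the human-macaque ratio falls inside the expected range, while the human-chimpanzee ratio does not, which is the discrepancy the response attributes to homogenization.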

      Among the weaknesses above, concern (1) can be addressed with clarification, but concerns (2) and (3) invalidate almost all findings from the data analysis and cannot be easily alleviated without a complete revamp of the work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Both reviewers found the manuscript confusing and raised serious concerns. They pointed out a lack of engagement with previous literature on modeling and the presence of ill-defined terms within the model, which obscure understanding. They also noted a significant disconnection between the modeling approach and the biological processes involved. Additionally, the data analysis was deemed problematic due to the failure to consider essential biological and technical factors. One reviewer suggested that the modeling component would be more suitable as a section of the companion theory paper rather than a standalone paper. Please see their individual reviews for their overall assessment.

      Reviewer #2 (Recommendations For The Authors):

      Beyond my major concerns, I have numerous questions about the interpretation of various findings:

      Lines 62-63: Please explain under what circumstance Ne=N/V(K) is biologically nonsensical and why.

      Response 18: “Biologically non-sensical” is the term used in (Chen, et al. 2017). We now use the term “biologically untenable”, but the message is the same. How does one get V(K) ≠ E(K) in the WF sampling? It is untenable under the WF structure. Kimura may have been the first to introduce V(K) ≠ E(K) into the WF model, and subsequent papers use the same sort of modifications, which are mathematically valid but biologically dubious. As explained extensively in the companion paper, the modifications add complexities but do not give the WF models the power to explain the paradoxes.

      Lines 231-234: The claim about a lower molecular evolution rate (lambda) is inaccurate - under neutrality, the molecular evolution rate is always the same as the mutation rate. It is true that when the species divergence Td is not much greater than fixation time Tf, the observed number of fixed differences would be substantially smaller than 2*mu*Td, but the lower divergence level does not mean that the molecular evolution is slower. In other words, in calculating the divergence level, it is the time term that needs to be adjusted rather than the molecular evolution rate.

      Response 19: Thanks, we agree that the original wording was not accurate. It is indeed the substitution rate rather than the molecular evolution rate that is affected when species divergence time Td is not much greater than the fixation time Tf. We have revised the relevant text in the manuscript to correct this and ensure clarity.

      Lines 277-279: Hs for rRNA is 5.2-fold that of the genome average. This could be roughly translated as Ne*/Ne=5.2. According to Eq 2: (1/Ne*)/(1/Ne)= Vh/C*, it can be derived that Ne*/Ne=C*/Vh. Then why do the authors conclude "C*=N*/N~5.2" in line 278? Wouldn't it mean that C*/Vh is roughly 5.2?

      Response 20: We apologize for the confusion. To prevent misunderstandings, we have revised Equation 1 and deleted Equation 2 from the manuscript. Please refer to Response 6 for further details.

      Lines 291-292: What does "a major role of stage I evolution" mean? How does it lead to lower FIS?

      Response 21: We apologize for the lack of clarity in our original description, and we have revised the relevant content to make it more direct.

      “In this study, we focus on multi-copy gene systems, where the evolution takes place in two stages: both within (stage I) and between individuals (stage II).”

      “FIS for rDNA among 8 human individuals is 0.059 (Table 2), much smaller than 0.142 in M. m. domesticus mice, indicating minimal genetic differences across human individuals and a high level of genetic identity in rDNAs between homologous chromosomes in the human population. … Correlation of polymorphic sites in the IGS region is shown in Supplementary Fig. 1. The results suggest that the genetic drift due to the sampling of chromosomes during sexual reproduction (e.g., segregation and assortment) is augmented substantially by the effects of the homogenization process within individuals. Like those in mice, the pattern indicates that intra-species polymorphism is mainly preserved within individuals.”

      Lines 297-300: Why does the concentration at very low allele frequency indicate rapid homogenization across copies? Suppose there is no inter-copy homogenization, and each copy evolves independently, wouldn't we still expect the SFS to be strongly skewed towards rare variants? It is completely unclear how homogenization processes are expected to affect the SFS.

      Response 22: We appreciate the reviewer’s insightful comments and apologize for any confusion in our original explanation. To clarify:

      If there is no inter-copy homogenization and each copy evolves independently, it would effectively result in an equivalent population size that is C times larger than that of single-copy genes. However, given the copies are distributed on five chromosomes, if the copies within a chromosome were fully linked, there would be no fixation at any sites. Considering the data presented in Table 4, where the substitution rate in rDNA is higher than in single-copy genes, this suggests that additional forces must be acting to homogenize the copies, even across non-homologous chromosomes.

      Regarding the specific data presented in Figure 3, the allele frequency spectrum is based on human polymorphic sites and is a folded spectrum, as the ancestral state of the alleles was not determined. High levels of homogenization would typically push variants toward the extremes of the SFS, leading to fewer intermediate-frequency alleles and reduced heterozygosity. The statement that "allele frequency spectrum is highly concentrated at very low frequency within individuals" was intended to emphasize the localized distribution of variants and the high identity at each site. However, we recognize that it does not accurately reflect the role of homogenization and that this conclusion cannot be directly inferred from the figure as presented. Therefore, we have removed the sentence from the text.

      The evidence of gBGC in rRNA genes in great apes does not help explain the observed accelerated evolution of rDNA relative to the rest of the genome. Evidence of gBGC has been clearly demonstrated in a variety of species, including mice. It affects not only rRNA genes but also most parts of the genome, particularly regions with high recombination rates. In addition, gBGC increases the fixation probability of W>S mutations but suppresses the fixation of S>W mutations, so it is not obvious how gBGC will increase or decrease the molecular evolution rate overall.

      Response 23: We have thoroughly rewritten the last section of Results. The earlier writing has misplaced the emphasis, raising many questions (as stated above). To answer them, we would have to present a new set of equations thus adding unnecessary complexities to the paper. Here is the streamlined and more logical flow of the new section.

      First, Tables 4 and 5 have shown the accelerated evolution of the rRNA genes. We have now shown that rRNA genes do not have higher mutation rates. Below is copied from the revised text:

      “We now consider the evolution of rRNA genes between species by analyzing the rate of fixation (or near fixation) of mutations. Polymorphic variants are filtered out in the calculation. Note that Eq. (3) shows that the mutation rate, μ, determines the long-term evolutionary rate, λ. Since we will compare the λ values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, λ falls in the range of 50-60 (differences per Kb) for single copy genes and 40 – 70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their λ values will have to be explained by other means.”

      Second, we have shown that the accelerated evolution in mice is likely due to genetic drift, resulting in faster fixation of neutral variants. We also show that this is unlikely to be true in humans and chimpanzees; hence selection is the only possible explanation. The section below is copied from the revised text. It shows the different patterns of gene conversions between mice and apes, in agreement with the results of Tables 4 and 5. In essence, it shows that the GC ratio in apes is shifting to a new equilibrium, which is equivalent to a new adaptive peak. Selection is driving the rDNA genes to move to the new adaptive peak.

      Revision - “Thus, the much accelerated evolution of rRNA genes between humans and chimpanzees cannot be entirely attributed to genetic drift. In the next and last section, we will test if selection is operating on rRNA genes by examining the pattern of gene conversion. 

      3) Positive selection for rRNA mutations in apes, but not in mice – Evidence from gene conversion patterns

      For gene conversion, we examine the patterns of AT-to-GC vs. GC-to-AT changes. While it has been reported that gene conversion would favor AT-to-GC over GC-to-AT conversion (Jeffreys and Neumann 2002; Meunier and Duret 2004) at the site level, we are interested at the gene level by summing up all conversions across sites. We designate the proportion of AT-to-GC conversion as f and the reciprocal, GC-to-AT, as g. Both f and g represent the proportion of fixed mutations between species (see Methods). So defined, f and g are influenced by the molecular mechanisms as well as natural selection. The latter may favor a higher or lower GC ratio at the genic level between species. As the selective pressure is distributed over the length of the gene, each site may experience rather weak pressure.

      Let p be the proportion of AT sites and q be the proportion of GC sites in the gene. The flux of AT-to-GC would be pf and the flux in reverse, GC-to-AT, would be qg. At equilibrium, pf = qg. Given f and g, the ratio of p and q would eventually reach p/q = g/f. We now determine if the fluxes are in equilibrium (pf = qg). If they are not, the genic GC ratio is likely under selection and is moving to a different equilibrium.

      In these genic analyses, we first analyze the human lineage (Brown and Jiricny 1989; Galtier and Duret 2007). Using chimpanzees and gorillas as the outgroups, we identified the derived variants that became nearly fixed in humans with frequency > 0.8 (Table 6). The chi-square test shows that the GC variants had a significantly higher fixation probability compared to AT. In addition, this pattern is also found in chimpanzees (p < 0.001). In M. m. domesticus (Table 6), the chi-square test reveals no difference in the fixation probability between GC and AT (p = 0.957). Further details can be found in Supplementary Figure 2. Overall, a higher fixation probability of the GC variants is found in human and chimpanzee, whereas this bias is not observed in mice.

      Tables 6-7 here

      Based on Table 6, we could calculate the value of p, q, f and g (see Table 7). Shown in the last row of Table 7, the (pf)/(qg) ratio is much larger than 1 in both the human and chimpanzee lineages. Notably, the ratio in mouse is not significantly different from 1. Combining Tables 4 and 7, we conclude that the slight acceleration of fixation in mice can be accounted for by genetic drift, due to gene conversion among rRNA gene copies. In contrast, the different fluxes corroborate the interpretations of Table 5 that selection is operating in both humans and chimpanzees.”
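
The flux argument in the quoted revision can be made concrete with a small sketch. The symbols p, q, f and g follow the definitions in the quoted text; the function names and the numerical values are hypothetical and are not taken from Tables 6 or 7.

```python
def flux_ratio(p: float, f: float, g: float) -> float:
    """(pf)/(qg): AT-to-GC flux divided by GC-to-AT flux, with q = 1 - p."""
    q = 1.0 - p
    return (p * f) / (q * g)


def equilibrium_at_fraction(f: float, g: float) -> float:
    """AT fraction at which pf = qg, i.e. p/q = g/f with p + q = 1."""
    return g / (f + g)


# Hypothetical values: 40% AT sites, AT-to-GC fixed three times as often as GC-to-AT.
p, f, g = 0.40, 0.03, 0.01
ratio = flux_ratio(p, f, g)
p_eq = equilibrium_at_fraction(f, g)
print(f"(pf)/(qg) = {ratio:.2f}; equilibrium AT fraction = {p_eq:.2f}")
```

A ratio above 1 means the net flux is toward GC, so the GC fraction would keep rising until p reaches the equilibrium value, where the ratio returns to 1.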

      References

      Arnheim N, Treco D, Taylor B, Eicher EM. 1982. Distribution of ribosomal gene length variants among mouse chromosomes. Proc Natl Acad Sci U S A 79:4677-4680.

      Brown T, Jiricny J. 1989. Repair of base-base mismatches in simian and human cells. Genome / National Research Council Canada = Génome / Conseil national de recherches Canada 31:578-583.

      Cannings C. 1974. The latent roots of certain Markov chains arising in genetics: A new approach, I. Haploid models. Advances in Applied Probability 6:260-290.

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol Biol Evol 34:2057-2064.

      Chia AB, Watterson GA. 1969. Demographic effects on the rate of genetic evolution I. constant size populations with two genotypes. Journal of Applied Probability 6:231-248.

      Crow JF, Kimura M. 2009. An Introduction to Population Genetics Theory: Blackburn Press.

      Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. Gigascience 10.

      Datson NA, Morsink MC, Atanasova S, Armstrong VW, Zischler H, Schlumbohm C, Dutilh BE, Huynen MA, Waegele B, Ruepp A, et al. 2007. Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate. Bmc Genomics 8:190.

      Der R, Epstein CL, Plotkin JB. 2011. Generalized population models and the nature of genetic drift. Theoretical Population Biology 80:80-99.

      Dover G. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299:111-117.

      Eickbush TH, Eickbush DG. 2007. Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics 175:477-485.

      Galtier N, Duret L. 2007. Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends in Genetics 23:273-277.

      Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, et al. 2007. Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316:222-234.

      Guarracino A, Buonaiuto S, de Lima LG, Potapova T, Rhie A, Koren S, Rubinstein B, Fischer C, Abel HJ, Antonacci-Fulton LL, et al. 2023. Recombination between heterologous human acrocentric chromosomes. Nature 617:335-343.

      Hartl DL, Clark AG. 1997. Principles of Population Genetics: Sinauer Associates, Sunderland.

      Hori Y, Shimamoto A, Kobayashi T. 2021. The human ribosomal DNA array is composed of highly homogenized tandem clusters. Genome Res 31:1971-1982.

      Jeffreys AJ, Neumann R. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31:267-271.

      Karlin S, McGregor J. 1964. Direct Product Branching Processes and Related Markov Chains. Proceedings of the National Academy of Sciences 51:598-602.

      Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, et al. 2011. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477:289-294.

      Krystal M, D'Eustachio P, Ruddle FH, Arnheim N. 1981. Human nucleolus organizers on nonhomologous chromosomes can share the same ribosomal gene variants. Proceedings of the National Academy of Sciences of the United States of America 78:5744-5748.

      Meunier J, Duret L. 2004. Recombination drives the evolution of GC-content in the human genome. Molecular Biology and Evolution 21:984-990.

      Nagylaki T. 1983. Evolution of a large population under gene conversion. Proc Natl Acad Sci U S A 80:5941-5945.

      Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, et al. 2022. The complete sequence of a human genome. Science 376:44-53.

      Ohta T. 1985. A model of duplicative transposition and gene conversion for repetitive DNA families. Genetics 110:513-524.

      Ohta T. 1976. Simple model for treating evolution of multigene families. Nature 263:74-76.

      Ohta T, Dover GA. 1984. The Cohesive Population Genetics of Molecular Drive. Genetics 108:501-521.

      Ohta T, Dover GA. 1983. Population genetics of multigene families that are dispersed into two or more chromosomes. Proc Natl Acad Sci U S A 80:4079-4083.

      Ruan Y, Wang X, Hou M, Diao W, Xu S, Wen H, Wu C-I. 2024. Resolving Paradoxes in Molecular Evolution: The Integrated WF-Haldane (WFH) Model of Genetic Drift. bioRxiv:2024.02.19.581083.

      Smirnov E, Chmúrčiaková N, Liška F, Bažantová P, Cmarko D. 2021. Variability of Human rDNA. Cells 10.

      Smith GP. 1976. Evolution of Repeated DNA Sequences by Unequal Crossover. Science 191:528-535.

      Smith GP. 1974a. Unequal crossover and the evolution of multigene families. Cold Spring Harbor symposia on quantitative biology 38:507-513.

      Smith GP. 1974b. Unequal Crossover and the Evolution of Multigene Families.  38:507-513.

      Stults DM, Killen MW, Pierce HH, Pierce AJ. 2008. Genomic architecture and inheritance of human ribosomal RNA gene clusters. Genome Res 18:13-18.

      van Sluis M, Gailín M, McCarter JGW, Mangan H, Grob A, McStay B. 2019. Human NORs, comprising rDNA arrays and functionally conserved distal elements, are located within dynamic chromosomal regions. Genes Dev 33:1688-1701.

      Wall JD, Frisse LA, Hudson RR, Di Rienzo A. 2003. Comparative linkage-disequilibrium analysis of the beta-globin hotspot in primates. Am J Hum Genet 73:1330-1340.

    2. eLife Assessment

      This study presents a useful theoretical model of molecular evolution and attempts to use it to resolve the paradox of rapid evolution of ribosomal RNA genes. While intuitive, the model's underlying issue is grouping many factors under "variance in reproductive success" without explicitly modeling the molecular processes. This limitation, along with insufficient consideration of technical challenges in alignment and variant calling, provides incomplete support for the authors' claim that the observed paradoxical patterns in rRNA genes can largely be explained by homogenizing processes, such as gene conversion, unequal crossover and replication slippage.

    3. Reviewer #1 (Public review):

      The revision by Wang et al is a much more clear and readable manuscript than the original version, which I think was a bit too terse and hard to parse. In this version, I think I basically understand all the analyses that the authors undertake and how they argue that those analyses support their conclusions.

      The fundamental claim of the manuscript is that rRNA genes experience substitutions much too quickly, given that they are a multi-copy gene system. As clarified by the authors in their response, and as I think is relatively clear in the manuscript, they are collapsing all copies of the rRNA array down. They first quantify polymorphism (in this expanded definition, where polymorphism means variable at a given site across any copy). The authors find elevated levels of heterozygosity in rRNA genes compared to single copy genes, which isn't surprising, given that there is a substantially higher target size; that being said, the increase in polymorphism is smaller than the increase in target size. They then look at substitutions between mouse species and also between human and chimp, and argue that the substitution rate is too fast compared to single copy genes in many cases.

      I think that this is an interesting problem and one that obviously occupies some space in the literature. As the authors point out, one possibility for explaining the elevated fixation rate is that there is some kind of positive selection in these putatively non-functional regions. The authors, instead, argue that the elevated rate of evolution is due to neutral homogenizing processes. I'm sympathetic to this argument, I'm a neutralist myself :)

      That being said, I find the whole analysis and the connection with the WFH model very strange. As I stated in my previous review, it feels very odd to chalk everything up to variance in reproductive success, rather than explicitly modeling the molecular processes that may lead to the homogenization. For example, the authors bring up gene conversion, and even do a small test of gene conversion. But a force like biased gene conversion is perhaps better modeled as a deterministic force, rather than a stochastic force. Indeed, I think that explicit modeling of mutation dynamics has been very helpful in understanding the role of replicative vs damage-related mutation in humans, as seen in Gao et al (2016) and Spisak et al (2024). I realize, as the authors say in their cover letter, that this is hard! But a major concern with this manuscript is that it's about whether drift can plausibly explain the pattern, but then it's basically impossible to know if it really can, because we have no way to compare the estimated parameters with biophysical or biochemical measurements of the rates of homogenizing forces, because the homogenizing forces are just wrapped up under "variance in reproductive success". I think a much more interesting manuscript would have a more explicit model of homogenizing forces.

      I also have some concerns about the data analysis, echoing some concerns of the other reviewer. The biggest issue is that traditional read mapping and SNP calling pipelines for highly duplicated loci don't really make sense. I don't fully understand the variant calling pipeline. The authors state that "All mapping and analysis are performed among individual copies of rRNA genes," which makes it sound like the reads mapping to different copies were somehow deconvolved, which is what you'd need to do to use "normal" variant calling approaches that look for homozygotes and heterozygotes. But I don't know enough about this literature to understand how they did that and if it makes any sense. If, instead, they called variants against collapsed rRNA copies, then using a standard variant calling approach does not make sense. If you have a variant in 2 out of 100 copies, a standard variant calling algorithm would very likely call that a homozygous ancestral site. Conditional on the variant calls being reasonable, however, I'm basically okay with their use of read counts to estimate "allele frequencies" within individuals.
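
The reviewer's point about a variant present in 2 of 100 copies can be illustrated with a toy diploid genotype model. This is a simplified sketch, not the behaviour of any particular variant caller: it assumes binomial read sampling, a hypothetical 1% per-read error rate, and expected mutant-read fractions of error, 0.5 and 1 - error for the three diploid genotypes. All names are illustrative.

```python
from math import lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log probability of k successes in n Bernoulli(p) trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1.0 - p)


def diploid_genotype_log_likelihoods(alt_reads: int, depth: int, error_rate: float = 0.01):
    """Log-likelihood of the read counts under each diploid genotype."""
    expected_alt_fraction = {
        "0/0": error_rate,        # homozygous reference: mutant reads are errors
        "0/1": 0.5,               # heterozygote: half the reads carry the mutant
        "1/1": 1.0 - error_rate,  # homozygous mutant
    }
    return {
        genotype: log_binom_pmf(alt_reads, depth, fraction)
        for genotype, fraction in expected_alt_fraction.items()
    }


# A variant carried by 2% of rDNA copies, sequenced to a collapsed depth of 1000.
log_liks = diploid_genotype_log_likelihoods(alt_reads=20, depth=1000)
best_genotype = max(log_liks, key=log_liks.get)
read_fraction = 20 / 1000  # the read-count estimate retains the 2% signal
print(best_genotype, read_fraction)
```

Under these assumptions the homozygous-reference genotype is by far the most likely call, whereas the raw read fraction still reports the variant at 2%, which is why read counts rather than genotype calls are the relevant quantity for collapsed multi-copy loci.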

      I have some more minor comments:

      (1) In the paragraph starting line 61, the authors say that WF models are unable to handle things like viral epidemics and transposons. I don't think that's really fair: the issue here isn't WF dynamics or not, it's that there is fundamentally evolution on two levels (which is also the case in the rRNA case considered in this manuscript). I certainly agree with the authors that you can't just naively apply standard pop gen theory in these systems, but I think the arrow at the WF model is misaimed, as the real issue is drift and selection on multiple levels.

      (2) Line 268-269: The authors argue that the long term rate of evolution in rRNA genes is roughly similar to single copy genes, suggesting not a big influence of increased mutation rate. I'm not sure I understand where this number comes from, as opposed to the divergence numbers they look at in Table 3. These seem to be two different conclusions from roughly the same measurement? Surely I am misunderstanding something.

      References:

      Gao, Z., Wyman, M. J., Sella, G., & Przeworski, M. (2016). Interpreting the dependence of mutation rates on age and time. PLoS biology, 14(1), e1002355.

      Spisak, N., de Manuel, M., Milligan, W., Sella, G., & Przeworski, M. (2024). The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair. PLoS biology, 22(6), e3002678.

    4. Reviewer #2 (Public review):

      I appreciate the authors' efforts in addressing previous feedback by correcting typos, clarifying terms, and expanding the methodological descriptions. The revisions have notably improved the manuscript's clarity and readability. However, despite these positive changes, I still have several significant concerns, both conceptual and technical, that need to be addressed to strengthen the conclusions of the paper.

      The key idea of this paper is to treat the rDNA copies in an individual as a pseudo-population and to model their sequence evolution with the WFH framework by introducing the parameter V*(K). With this modeling framework, the authors claim that the molecular evolution rate of rDNA relative to that of single-copy genes can be expressed as a simple function of V*(K) and C (the copy number per individual). Moreover, when V*(K) is sufficiently large, the neutral molecular evolution of rDNA can be faster than expected under a naïve model that does not consider horizontal, homogenizing processes, and thus be potentially compatible with empirical data. However, several issues persist in the definition, assumptions, and derivation of the model:

      (1) Several terms in the model remain undefined. While Ne is clearly defined in the standard single-copy gene model as the reciprocal of genetic drift (i.e., the decay in heterozygosity), its meaning for multiple-copy genes is unclear. Based on the context, it appears that the authors define Ne as the parameter that fits the population polymorphism level (Hs) using the equation in line 165. This definition is reasonable, but it should be explicitly clarified in the text.

      (2) Another key parameter, V*(K), is still not defined within the paper. In response 9, the authors explained that V*(K) refers to "the number of progeny to whom the gene copy of interest is transmitted (K) over a specific time interval". However, the meaning of "progeny" remains unclear. Are the authors referring to the descendant copies of a gene copy, or to the offspring individuals (i.e., the living organisms)? For example, if a variant spreads horizontally through homogenizing processes and is transmitted vertically to multiple offspring individuals, the number of descendant gene copies could differ substantially from the number of descendant individuals to whom a gene copy is transmitted. This distinction needs to be clarified and clearly stated in the paper.

      (3) The authors state that V*(K)>=1 for rDNA genes because of the homogenizing processes (lines 139-141) without providing justification. It is unclear, at least to me, whether homogenizing processes are expected to increase or decrease the variance in "reproductive success" across gene copies. Moreover, the authors claim that V*(K) "can potentially reach values in the hundreds and may even exceed C, resulting in C*=C/V*(K)<1" (Response 7). This claim is unlikely to be true, as the minimum value of K is bounded by zero and E(K) is assumed to be 1. Even in the extreme case that 1% of gene copies leave large numbers of descendants while the others leave none, V*(K) would still be less than 100. Such an extreme case seems highly improbable, given realistic rates of the homogenizing processes.

      (4) Regardless of how the authors define V*(K), it is not immediately clear why Equation 1 (N*=NC/V*(K)) holds. Both sides of the equation have their independent meanings, so the authors need to provide a step-by-step derivation demonstrating that they are equal. Only by doing this will the implicit underlying assumptions become clearer. I also strongly recommend that the authors conduct forward-in-time simulations with fixed N, C, V*(K) (however they define it) and μ to confirm that the right side of Equation 1 actually predicts the N* as calculated from the polymorphism level using the equation in line 165.

      (5) Without providing justification, the authors assumed that a certain number N* exists for rRNA such that it fits both the polymorphism level (line 156) in recent timescales and the divergence level in longer timescales (i.e., in the comparison between Tf and Td). However, if N, C or any other relevant parameters have varied substantially throughout evolution, N* is expected to vary with time, and the same value may not fit both polymorphism and divergence data simultaneously.
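
      The bound discussed in point (3) can be checked numerically. The following is a minimal sketch, assuming a two-class offspring distribution in which a fraction f of gene copies each leave 1/f descendants and all other copies leave none, so that E(K) = 1. The function name and the chosen fractions are illustrative, and the manuscript's V*(K) may be defined differently.

```python
def two_class_offspring_variance(fraction_reproducing):
    """Mean and variance of K when a fraction f of copies leave 1/f
    descendants each and the remaining copies leave none (hypothetical model)."""
    f = fraction_reproducing
    k_successful = 1.0 / f
    mean_k = f * k_successful  # equals 1 by construction
    variance_k = f * (k_successful - mean_k) ** 2 + (1 - f) * (0.0 - mean_k) ** 2
    return mean_k, variance_k


results = {}
for f in (0.5, 0.1, 0.01):
    mean_k, var_k = two_class_offspring_variance(f)
    results[f] = (mean_k, var_k)
    print(f"f = {f}: E(K) = {mean_k:.3f}, V(K) = {var_k:.3f}")
```

      Under this assumed distribution V(K) = 1/f - 1, so V(K) would exceed a copy number C only if fewer than a fraction 1/(C+1) of the copies left all of the descendants.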

      The authors also provided a more detailed description of their data analysis methods, but some of my major concerns remain:

      (1) A significant issue with aligning reads to a single reference genome is reference bias, the phenomenon whereby reads carrying the reference alleles tend to align more easily than those with one or more non-reference alleles, thus creating a bias in genotype calling or variant allele frequency quantification. As a result, there may be an underrepresentation of non-reference alleles in called variants or an underestimate of non-reference allele frequency, particularly in regions with high genetic diversity. Simply focusing on bi-allelic SNVs is insufficient to minimize reference bias. Given the fourfold increase in diversity within rDNA, the authors must either provide evidence that reference bias is not a significant concern or adopt graph-based reference genomes or more sophisticated alignment algorithms to address this issue.

      (2) The potential for reference bias also renders the analysis of divergence sites unreliable, as aligning reads from one species (e.g., chimpanzee) to the reference of another species (e.g., human) is likely to introduce biases in variant calling between the two. One commonly adopted approach to address this imbalance is to align reads from both species to a third reference genome that is expected to be equidistantly related to both.

      (3) Although it is somewhat reassuring that the estimated divergence rate of rDNA between human and macaque is comparable to that of the rest of the genome, there remains a concern that divergence in rDNA regions is underestimated because of reference bias. Note that while the "third genome" approach reduces the imbalance between the two genomes being compared, it may still underestimate the overall divergence level due to under-calling of non-reference variants.

      (4) In response to my question about the similarity in rDNA substitution rates estimated with or without CpG sites, the authors suggest that this "may be due to strong homogenizing forces, which can rapidly fix or eliminate variants" (response 17). However, this explanation is insufficient, because the observed substitution rate depends on the mutation rate multiplied by the fixation probability, and accelerated fixation or loss does not alter either. Unless the authors can provide a more convincing explanation, technical errors in the calling of fixed sites remain a concern.

      Minor points

      Line 157: The statement "where μ is the mutation rate of the entire gene" must be wrong, as the heterozygosity calculated with such μ would correspond to the chance of seeing two different haplotypes at gene level, which is incompatible with the empirical calculation specified in Equation 2. Instead, μ must represent the mutation rate per site averaged over the entire gene.

      In response 22, the authors explained that the allele frequency spectrum shown in Fig 3 is folded, because the ancestral allele was not determined. However, this is inconsistent with the x-axis of Fig 3 ranging between 0 and 1. I suspect the x-axis represents the frequency of the alternative (i.e., non-reference) allele. If so, the reported correlation is inflated, as the choice of reference allele is somewhat random, and a variant at joint ALT allele frequencies of (0.9, 0.9) is no different from a variant at (0.1, 0.1). The proper way to calculate this correlation is to first determine the minor allele across individuals and then calculate the correlation between the minor allele frequencies.
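
      The inflation described above can be illustrated with a minimal simulated sketch. It assumes that the minor allele frequencies in two individuals are independent and that the reference allele is the minor allele at half of the sites; all values are simulated for illustration and none come from the manuscript.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fold_by_pooled_minor_allele(freq_a, freq_b):
    """Express both frequencies in terms of the allele that is minor overall."""
    if (freq_a + freq_b) / 2 > 0.5:
        return 1 - freq_a, 1 - freq_b
    return freq_a, freq_b


random.seed(0)
alt_a, alt_b = [], []
for _ in range(2000):
    # Independent minor allele frequencies in individuals A and B (assumption).
    fa = random.uniform(0, 0.2)
    fb = random.uniform(0, 0.2)
    # At half of the sites the reference happens to be the minor allele,
    # so the ALT frequency is one minus the minor allele frequency.
    if random.random() < 0.5:
        fa, fb = 1 - fa, 1 - fb
    alt_a.append(fa)
    alt_b.append(fb)

folded = [fold_by_pooled_minor_allele(a, b) for a, b in zip(alt_a, alt_b)]
maf_a = [pair[0] for pair in folded]
maf_b = [pair[1] for pair in folded]

r_alt = pearson(alt_a, alt_b)
r_maf = pearson(maf_a, maf_b)
print("correlation of ALT allele frequencies:", round(r_alt, 3))
print("correlation of minor allele frequencies:", round(r_maf, 3))
```

      In this toy setting the ALT-based correlation is high only because sites cluster near (0, 0) and (1, 1), while the correlation between minor allele frequencies stays close to zero.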

      Similarly, in response 14, it is unclear what the x-axis represents. Is it the ALT allele frequency or the derived allele frequency? If the former, why are only variants with AF>0.8 defined as fixed variants, while those with AF<0.2 are excluded? If the latter, please describe how the ancestral state is determined.

    1. eLife Assessment

      This is a valuable study on the diffusion rates of drug molecules in human-derived cells, presenting convincing data indicating that their diffusion behavior depends on their charged state. It proposes that blocking drug protonation enhances diffusion and fractional recovery, suggesting improved intracellular availability of weakly basic drugs. The findings are significant for drug design and understanding the biophysical behavior of small molecules in cells.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to measure the diffusion of small drug molecules inside live cells. To do this, they selected a range of fluorescent drugs, as well as some commonly used dyes, and used FRAP to quantify their diffusion. The authors find that drugs diffuse and localize within the cell in a way that is weakly correlated with their charge, with positively charged molecules displaying dramatically slower diffusion and a high degree of subcellular localization.

      The study is important because it highlights how drugs behave inside cells beyond the simple "IC50" metric (a decidedly mesoscopic/systemic value). The authors conclude, and I agree, that their results point to nuanced effects governed by drug chemistry, which could be optimized to make drugs more effective.

      Strengths:

      (1) The work examines an understudied aspect of drug delivery.

      (2) The work uses well-established methodologies to measure diffusion in cells.

      (3) The work provides an extensive dataset, covering a range of chemistries that are common in small molecule drug design.

      (4) The authors consider several explanations as to the origin of changes in cellular diffusion.

      Comments on revised version:

      In general, my comments were addressed, new discussions were added, and the paper has been improved significantly, which is great.

      However, despite my providing very clear instructions, many of my comments regarding statistical treatment were disregarded. Bar charts still do not show the repeats as individual points. Error bars still represent SEM, which gives a wrong idea about the spread of the data. FRAP lines are still averages, and still do not show the spread of the data.

      Significance assignments are done based on average and SEMs, as opposed to the full dataset. There is nothing technically wrong with this, but it generally creates an impression that things are more reproducible/rigorous/significant than they would be if the data was shown completely.
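
      The difference between SEM and the spread of the data can be shown with a minimal sketch. The measurements below are hypothetical values made up for illustration and are not taken from the manuscript.

```python
import math
import statistics

# Hypothetical diffusion coefficients (arbitrary units) from repeated measurements.
measurements = [4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 3.9, 4.4, 5.8, 4.9, 5.1, 4.2]

n = len(measurements)
mean_value = statistics.mean(measurements)
sd = statistics.stdev(measurements)  # sample standard deviation: spread of the data
sem = sd / math.sqrt(n)              # standard error: uncertainty of the mean

print(f"n = {n}, mean = {mean_value:.2f}")
print(f"SD  = {sd:.2f}")
print(f"SEM = {sem:.2f}")
```

      Because SEM equals SD divided by the square root of n, SEM error bars shrink as repeats are added even when the scatter of individual measurements is unchanged.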

    3. Reviewer #2 (Public Review):

      Summary:

      Blocking a weak base compound's protonation increased intracellular diffusion and fractional recovery in the cytoplasm, which may improve the intracellular availability and distribution of weakly basic, small molecule drugs and be impactful in future drug development.

      Strengths:

      (1) Data on the intracellular distribution of drugs and the chemical properties that drive their distribution are much needed in the literature. Thus, the idea behind this paper is of relevance.

      (2) The study used common compounds that were relevant to others.

      (3) Altering a compound's pKa value and measuring cytosolic diffusion rates certainly is insightful as to how weak base drugs and their relatively high pKa values affect distribution and pharmacokinetics. This particular experiment demonstrated relevance to drug targeting and drug development.

      (4) The manuscript was fairly well written.

      Comments on revised version:

      After reviewing the authors' responses to my questions and concerns, I find that they have adequately corrected the errors and added new information and data based on the reviewers' suggestions, which improved the manuscript. The manuscript in its current form would add quality information to a part of the literature that is lacking much needed information.